Artificial intelligence learns to play a platformer
Daan de Groot – 500780875
Table of contents
1. NeuroEvolution of Augmenting Topologies
2. Individual view
3. Saving data
4. Training the algorithm
4.1 Final result
5. Future work
I have always been fascinated by the rise of artificial intelligence and the technology behind it. In earlier projects I always thought that I was making artificial intelligence-like features, but the program never learned something. Most of the time it was just a switch case that would define what a creature’s actions would be. Later on, I started to learn about machine learning algorithms, I was amazed by what some of these algorithms could achieve. I always understood the concept of a machine learning algorithm, but not how to build and train one. Now that I had the opportunity to do some research, I finally took the time to learn how to create one.
1. NeuroEvolution of Augmenting Topologies
A NeuroEvolution of Augmenting Topologies or neat for short, is an evolutionary algorithm that creates an artificial neural network. What is so special about an algorithm like neat is that it not only changes the weights when it is learning, but it will also change the neural network structure itself. For example a algorithm like Feed Forward would have a set amount of Input, hidden and output nodes (see figure 1) and it could just change the values of the connections. The connections are the objects that connect one node to another so that they can communicate with each other. In this case my neat algorithm has a set amount of input and output nodes. But It can create new hidden nodes and new connections between different nodes.
When the application starts a new neat class will be created(see figure 2). If the generation is higher than 0 the individuals will evolve. In the evolve class there are a variety of methods(see figure 3). Further on in this article I will explain what each method does. For now, it is good to know that this is one loop for a generation of individuals.
The genome of an individual is a collection of all the node-genes and the connection-genes. The node-genes represent input, hidden and output nodes and the connection-genes represent the connections between these nodes. You could say that the genome contains the structure of the neural network from an individual.
A node-gene has an x value, later on this will indicate if it is an input-node, hidden-node or output-node. It also has an innovation number, which is a unique number so that you can identify the node-gene.
A connection- gene has a from and to node-gene so that you know which nodes it connects. It also has a weight that is a value that will be used to calculate an output. It has a boolean that indicates if the connection is active and it will also have an innovation number to identify the connection-gene.
To create a new genome, I make use of the crossover method. The crossover method takes two genomes and creates a new genome. it will then iterate through the lists. If the connection-gene is similar it will choose a random one of the two parents(see figure 5, connection-gene 1). When there is a disjoint connection-gene it will take that gene from that parent(see figure 5, connection-gene 6). A disjoint connection-gene is a gene that is only present in one parent, but the other parent still has some genes left. At last it will get the excess connection-genes if one of the parents has any excess genes(see figure 5, connection-gene 10). Excess genes are only present in one of the two parents and the other parent doesn’t have any genes left. After all the connection-genes have been added, the system checks if the from and to nodes-genes already exist in the nodes list of the new genome. If they don’t it will add the node-genes to the list.
After the Crossover stage the genomes will mutate. There are 5 possible mutations: mutating a link, mutating a node, mutating de weight with a small percentage, mutating the weight with a new random number and mutating the state of the link. All mutations have a probability that they will occur, these are all editable in the neat class.
The first two mutations ensure that the structure of the neural network is being changed. When a link is being mutated it will create a new connection-gene between to nodes. When a node is mutated it will create a new node-gene and add two connection-gene between the newly created node-gene and two other node-gene.
The two mutations for the weights don’t change anything for the structure but change the value of the weights in a connection-gene. This influences output calculations since the weights are getting used to calculate the output, I will give a further explanation on calculating the output in 2. Individual view.
The last mutation is a bit in between since when a connection is disabled it can be enabled later. So it doesn’t have to be a permanent change in the structure of the neural network. When it is disabled it will not be used in the calculations of the output, so it does influences the output just like the mutations of the weights.
In the selection stage the individuals are placed in species. Species are groups of individuals that have some-what the same structure. This is determined by the value from the distance formula(see figure 7) is lower than cp of the representative of a species. cp is an editable variable in the neat class to regulate the amount of species being made. So if it is close enough the individual will be added to the species otherwise a new species will be created and this individual will be the representative of that species.
After all individuals are placed in a species the individuals will be ordered by fitness. fitness is the score that an individual got in one run. Then a certain percentage in this case 20% of the individuals will be killed. These individuals will be deleted, these are the individuals with the lowest score in a species. But some species might have more individuals with a low fitness value, but only 20% are killed. So some individuals with a low fitness value will be kept alive. This is because althoughthe individual did quite badly at this turn. The structure of his neural network could be beneficial later on. This is why it is good to have different species so that certain structures are not thrown away, but get the opportunity to evolve. After the individuals are killed. The species with one or no survivor are going extinct since, so there is more space for new genomes to be created.
2. Individual view
At this point the algorithm works with random inputs, so I needed to find a way to get the right input through the algorithm. Since I used a tilemap I could easily get all the tiles. Then I could pass them to a function in the Manager class. There they will check for each tile in the two tilemaps if there is an object at a certain index. If there is an object it will be a one if there isn’t one it will be zero. Each time the algorithm needs input it will call GetInput(see figure 8) with the two double two-dimensional arrays. Only the input needs to be a one-dimensional double array so that is why the GetInput is called. When everything was ready, I tested the algorithm. And it did nothing. No errors but I wouldn’t get any output either.
The output is being calculated at every hidden and output node. For each of these nodes the calculate function is called(see figure 9) If the connection of a node is enabled it adds the weight of a connection times the output from the “from-node” of the connection to a sum. The sum will be passed on to the ActivationFunction(see figure 10). This function uses the sum in a sigmoid function. So it will always give back an output between zero and one.
After I didn’t get any output back, I did some research and it seemed quite hard to get the algorithm to learn anything with the amount of inputs that I had. Besides that, it was also pretty difficult to debug with over 500 inputs. So I had to come with a new plan for the input. In one of the videos where an artificial intelligence learns to play Mario they used a grid system so that the artificial intelligence had some sort of “eyes”. So I tried remaking something like that for unity. I made a grid and called UpdateInpunt(see figure 11) when I needed to get the input for the algorithm. I just used one tilemap for now. The grid was 5 by 6 so I only had an input size of 30. That was way easier to keep track of what was happening. See video 1 for the result of the test.
3. Saving data
So now that everything seems to work it is time to save the neural networks, so that when I quit the application the progress is saved. The easiest way was to make a new class full of structs. In these structs I could save all the data that I needed to be saved so all the data in individuals like fitness and the genome. When the application is closed the data in the class will be set to Json and saved as a string in playerprefs(see figure 12). When the Application is opened the data will be put back in the savadata variable. When I start a new round, It will call a method which needs an empty list of individuals and returns the list of individuals with all the data from the last round.
4. Training the algorithm
So now it was finally time to train the neural networks. But then I got a lot of bugs. There were some problems with adding node-genes. sometimes it would add one that already existed in the list. First i Check if the item was already in the list with list.contains. But apparently that didn’t work. So I tried to check for the innovationNumber(see figure 14). This seemed to work. But the bot didn’t do much for 11 generations. But then a bot finally finished(see video 2).
But since it took really long to do at least something. I tweaked the algorithm a little bit. It would only make one species so the cp value went down to make it possible to create more species and I made the probability for mutations a bit higher so it would mutate faster. I created a new map so that the bot learned to jump. The problem here was that the collision detection in unity gave up. The bot could just rush through walls. I tried manually setting the collision of the tilemap but it didn’t seem to work. Then I remembered that I can just learn it not to go through walls. I made a dead zone in the walls and when the bot hit it, it would set its fitness value to zero. So after a few generations some bots tried a slower approach and actually jumped over the walls instead of going straight through(see video 3).
4.1 Final result
I wanted it to learn that spikes were bad, but since I didn’t have that much time left and this would completely change the way the bots looked at their environment. Since I would give spikes a certain value and let them be picked up by the UpdateInput method just like empty tiles and ground tiles. I decided that the bots needed to learn to jump a greater distance and jump over a pitfall. It would “die“ just like when it would if it went straight through the walls. It took some training and some more tweaking in the probability values for the mutations. But after 94 generations they all finished the level with the pitfall(see video 4).
It took longer to get the algorithm to work than expected. Especially debugging the algorithm could take several days. But at the end I got all the bots to finish the level. I’m quite proud of that. I also think that I learned a lot about planning on how to build such a big program. Next time I should think more in advance and test smaller parts of the system. If I had done that I probably wouldn’t have to debug so much at the end.
Although it is less than I expected to create in the first place I think it will still look good on my portfolio since this is my first machine learning algorithm.
5. Future Work
I’m definitely going through with this project. I want to make the levels more difficult with traps/ enemy’s so that the bots learn the difference between empty, save and danger tiles.
I also want to make the visuals just a bit better so it looks more polished when I place it on my portfolio.
Last but not least I want to do some more research on debugging methods for machine learning algorithms. Not only would this help for any bug that pops-up when I work on this project, but it will be of big help with on next semester since I will be following the minor: applied artificial intelligence
Chrispresso. (2020, 19 augustus). AI Learns to Play Super Mario Bros! [Video]. YouTube. https://www.youtube.com/watch?v=CI3FRsSAa_U&t=60s
Eggers, F. [Finn Eggers]. (2019, 31 december). AI – NEAT [Video]. YouTube. https://www.youtube.com/playlist?list=PLgomWLYGNl1fcL0o4exBShNeCC5tc6s9C
Heidenreich, H. (2019, 4 januari). Crossover [Foto]. NEAT: An Awesome Approach to NeuroEvolution. https://towardsdatascience.com/neat-an-awesome-approach-to-neuroevolution-3eca5cc7930f
HeidenReich, H. (2019, 4 januari). Genome [Foto]. NEAT: An Awesome Approach to NeuroEvolution. https://towardsdatascience.com/neat-an-awesome-approach-to-neuroevolution-3eca5cc7930f
Imam, A. (2020, 16 juni). Feed-Forward Neural Network (FFNN) [Foto]. Neural Networks (Part 1). https://medium.com/swlh/neural-networks-4b6f719f9d75
Jason Weimann. (2018, 5 februari). Saving & Loading Complex Data in Unity3D [Video]. YouTube. https://www.youtube.com/watch?v=eUSpGUeqYn8&t=419s
Presso, C. (2020, 14 maart). AI Learns To Play Super Mario Bros Using A Genetic Algorithm And Neural Network. Chrispresso – All Things Programming, All Things AI. https://chrispresso.io/AI_Learns_To_Play_SMB_Using_GA_And_NN