# How to make virtual organisms learn using neural networks?

## Question:

I’m making a simple learning simulation, where there are multiple organisms on screen. They’re supposed to learn how to eat, using their simple neural networks. They have 4 neurons, and each neuron activates movement in one direction (it’s a 2D plane seen from a bird’s-eye view, so there are only four directions and four outputs are required). Their only inputs are four “eyes”. Only one eye can be active at a time, and it basically serves as a pointer to the nearest object (either a green food block or another organism).

Thus, the network can be imagined like this:

And an organism looks like this (both in theory and the actual simulation, where they really are red blocks with their eyes around them):

And this is how it all looks (this is an old version, where eyes still didn’t work, but it’s similar):

Now that I have described my general idea, let me get to the heart of the problem…

1. Initialization
First, I create some organisms and food. Then, all the 16 weights in their neural networks are set to random values, like this: `weight = random.random()*threshold*2`. Threshold is a global value that describes how much input each neuron needs to get in order to activate (“fire”). It is usually set to 1.

2. Learning
By default, the weights in the neural networks are lowered by 1% each step. But, if some organism actually manages to eat something, the connection between the last active input and output is strengthened.
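For reference, the scheme described so far could be sketched roughly as below. This is only an illustration of the description, not the asker's actual code: the names `init_weights`, `fired_outputs` and `step` are hypothetical, and the `REWARD` amount is an assumption because the question does not say how much a connection is strengthened.

```python
import random

THRESHOLD = 1.0  # input a neuron needs before it fires (from the question)
DECAY = 0.99     # weights are lowered by 1% each step (from the question)
REWARD = 0.5     # ASSUMPTION: strengthening amount, not given in the question


def init_weights(n_inputs=4, n_outputs=4, threshold=THRESHOLD):
    """weights[i][o] connects eye i to movement neuron o (16 weights)."""
    return [[random.random() * threshold * 2 for _ in range(n_outputs)]
            for _ in range(n_inputs)]


def fired_outputs(weights, active_eye, threshold=THRESHOLD):
    """Only one eye is active, so each output receives a single weight."""
    return [o for o, w in enumerate(weights[active_eye]) if w >= threshold]


def step(weights, ate, last_eye=None, last_output=None):
    """Decay every weight, then reinforce the last used connection on eating."""
    for row in weights:
        for o in range(len(row)):
            row[o] *= DECAY
    if ate and last_eye is not None and last_output is not None:
        weights[last_eye][last_output] += REWARD
```

Written this way, the problem is visible: `step` only ever rewards a connection that has already fired and led to food, so an organism whose weights start below the threshold never gets reinforced.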

But there is a big problem. I think that this isn’t a good approach, because they don’t actually learn anything! Only those whose initial weights happened to be beneficial will get a chance of eating something, and then only they will have their weights strengthened! What about those that had their connections set up badly? They’ll just die, not learn.

How do I avoid this? The only solution that comes to mind is to randomly increase/decrease the weights, so that eventually, someone will get the right configuration, and eat something by chance. But I find this solution to be very crude and ugly. Do you have any ideas?

EDIT:
Thank you for your answers! Every single one of them was very useful, some were just more relevant. I have decided to use the following approach:

1. Set all the weights to random numbers.
2. Decrease the weights over time.
3. Sometimes randomly increase or decrease a weight. The more successful the unit is, the less its weights will get changed. (NEW)
4. When an organism eats something, increase the weight between the corresponding input and the output.
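Step 3 is the new part, and one way it might look is sketched below. The function name `mutate`, the `base_rate` and `scale` parameters, and the `1 / (1 + meals_eaten)` schedule are all assumptions chosen for illustration; the edit above only says that more successful units should be changed less.

```python
import random


def mutate(weights, meals_eaten, base_rate=0.1, scale=0.2, rng=random):
    """Randomly nudge weights in place; successful organisms are nudged less.

    ASSUMPTION: success is measured by meals eaten, and the per-weight
    mutation probability falls off as base_rate / (1 + meals_eaten).
    Returns the probability that was used.
    """
    rate = base_rate / (1 + meals_eaten)
    for row in weights:
        for o in range(len(row)):
            if rng.random() < rate:
                row[o] += rng.uniform(-scale, scale)
    return rate
```

Calling `mutate(weights, meals_eaten)` once per simulation step, alongside the decay and the eating reward, would cover steps 2 to 4 of the list.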

As mentioned by Mika Fischer, this sounds similar to artificial life problems, so that’s one avenue you could look at.

It also sounds a bit like you’re trying to reinvent Reinforcement Learning. I would recommend reading through *Reinforcement Learning: An Introduction* by Sutton and Barto, which is freely available in HTML form on the book’s website or purchasable in dead tree format. Example code and solutions are also provided there.

Use of neural networks (and other function approximators) and planning techniques is discussed later in the book, so don’t get discouraged if the initial stuff seems too basic or non-applicable to your problem.

This is similar to issues with trying to find a global minimum, where it’s easy to get stuck in a local minimum. Consider trying to find the global minimum for the profile below: you place the ball in different places and follow it as it rolls down the hill to the minimum, but depending on where you place it, you may get stuck in a local dip.

That is, in complicated situations, you can’t always get to the best solution from all starting points using small optimizing increments. The general solutions to this are to fluctuate the parameters (i.e., weights, in this case) more vigorously (and usually reduce the size of the fluctuations as you progress the simulation — like in simulated annealing), or just realize that a bunch of the starting points aren’t going to go anywhere interesting.
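As a rough illustration of the "fluctuate vigorously, then calm down" idea, here is a small simulated-annealing sketch on a one-dimensional bumpy curve. The `profile` function, the cooling schedule and all parameter values are made up for the example; in the simulation the thing being perturbed would be the weight vector rather than a single number.

```python
import math
import random


def profile(x):
    """Hypothetical bumpy landscape with several local dips."""
    return 0.1 * x * x + math.sin(3 * x)


def anneal(f, x0, steps=5000, t_start=2.0, t_end=1e-3, seed=0):
    """Minimise f starting from x0, returning the best (x, f(x)) seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for i in range(steps):
        # Geometric cooling from t_start towards t_end.
        t = t_start * (t_end / t_start) ** (i / steps)
        cand = x + rng.gauss(0, t)  # fluctuation size shrinks with t
        fc = f(cand)
        # Always accept improvements; accept worse moves with
        # probability exp(-(fc - fx) / t), which shrinks as t cools.
        if fc < fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f
```

Early on, the large temperature lets the "ball" jump out of local dips; late in the run it behaves almost like plain downhill rolling.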

How do you want it to learn? You don’t like the fact that randomly seeded organisms either die off or prosper, but the only time you provide feedback into your organism is if they randomly get food.

Let’s model this as hot and cold. Currently, everything feeds back “cold” except when the organism is right on top of food, so the only opportunity to learn is accidentally running over food. You can tighten this loop to provide more continuous feedback if you desire: feed back “warmer” if the organism moves toward food and “colder” if it moves away.
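A minimal sketch of such a warmer/colder signal, assuming positions are 2D coordinates and that eating happens within some small radius of the food. The function name `shaping_reward` and the `eat_bonus` and `eat_radius` parameters are hypothetical.

```python
import math


def shaping_reward(prev_pos, new_pos, food_pos, eat_bonus=1.0, eat_radius=0.5):
    """Positive when the move brought the organism closer to the food,
    negative when it moved away, and eat_bonus when it reached the food."""
    d_prev = math.dist(prev_pos, food_pos)
    d_new = math.dist(new_pos, food_pos)
    if d_new <= eat_radius:
        return eat_bonus
    return d_prev - d_new
```

The returned value could then scale the weight update for the connection that produced the move, instead of updating only on eating.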

Now, the downside of this is that there is no input for anything else. You’ve only got a food-seeker learning technique. If you want your organisms to find a balance between hunger and something else (say, overcrowding avoidance, mating, etc), the whole mechanism probably needs to be re-thought.

There are several algorithms that can be used to optimize the weights in a neural network, the most common of which is the backpropagation algorithm.

From reading your question I gather that you are trying to build neural network bots that will search for food. The way to achieve this with backpropagation would be to have an initial learning period, where the weights are initially randomly set (as you’re doing) and gradually refined using the backpropagation algorithm until they reach a performance level you’re happy with. At that point you can stop them learning and allow them to frolic freely in flatland.
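To make the "initial learning period" concrete, here is a small supervised sketch. With no hidden layer, backpropagation reduces to the delta rule, which is what the code uses. It also needs a target for every input, and the question does not provide one, so the code assumes a teacher that says "move toward whichever eye is active"; that teacher, the learning rate and the function names are all assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(epochs=2000, lr=0.5, seed=0):
    """Train a 4-input, 4-output single-layer net with the delta rule."""
    rng = random.Random(seed)
    # weights[o][i] connects eye i to output o, randomly initialised.
    weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
    for _ in range(epochs):
        eye = rng.randrange(4)
        x = [1.0 if i == eye else 0.0 for i in range(4)]
        target = x  # ASSUMED teacher: move toward the active eye
        for o in range(4):
            y = sigmoid(sum(w * xi for w, xi in zip(weights[o], x)))
            # Gradient of the squared error through the sigmoid.
            delta = (y - target[o]) * y * (1 - y)
            for i in range(4):
                weights[o][i] -= lr * delta * x[i]
    return weights


def act(weights, eye):
    """After training, pick the output neuron with the highest activation."""
    x = [1.0 if i == eye else 0.0 for i in range(4)]
    outs = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return outs.index(max(outs))
```

Once `train` has run, the weights would be frozen and the organisms released, with `act` choosing their movement.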

However, I think there might be a few issues with your network design.

Firstly, if only one eye is active at any time, it would make more sense to have just one input node and keep track of orientation some other way (if I’m understanding that correctly). Simply put, if there is only one active eye and four possible actions (forward, back, left, right), then the inputs from the inactive eyes (presumably zero) have no bearing on the output decision. In fact, I suspect the weights from each input to all outputs would converge, essentially duplicating the same function. The extra inputs also needlessly increase the complexity of the network and the learning time.

Secondly, you don’t need that many output neurons to represent all possible actions. As you have it described, your output would be {1,0,0,0} = right, {0,1,0,0} = left, and so on. Depending on the type of neuron modeled, this can be done with two or even one output neuron. If using binary neurons (each output is either 1 or 0), do something like {0,0} = back, {1,1} = forward, {1,0} = left, {0,1} = right. Using a sigmoidal neuron (the output can be a real number in the range 0..1), you could do {0} = back, {0.33} = left, {0.66} = right, {1} = forward.
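The two compact output encodings could be decoded along these lines. The function names are hypothetical, and mapping a sigmoid output to the nearest of the four levels is an assumption about how the in-between values would be handled.

```python
# Two binary output neurons -> one of four actions (mapping from the answer).
BINARY_CODE = {
    (0, 0): "back",
    (1, 1): "forward",
    (1, 0): "left",
    (0, 1): "right",
}


def decode_binary(out_a, out_b):
    return BINARY_CODE[(out_a, out_b)]


def decode_sigmoid(y):
    """Map one real-valued output in [0, 1] to the nearest action level."""
    levels = [(0.0, "back"), (1 / 3, "left"), (2 / 3, "right"), (1.0, "forward")]
    return min(levels, key=lambda level: abs(level[0] - y))[1]
```

For example, `decode_binary(1, 0)` gives `"left"` and `decode_sigmoid(0.9)` gives `"forward"`.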

I think a more complex example of what you’re doing is presented by Polyworld.

You can also see the Google Tech Talks presentation from 2007: http://www.youtube.com/watch?v=_m97_kL4ox0

However, the fundamental idea is to take an evolutionary approach within your system: use small random mutations combined with genetic cross-over (as the main form of diversification) and select individuals which are “better” suited to survive the environment.
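A bare-bones version of that loop might look like the sketch below. It is not how Polyworld does it; the genome is assumed to be a flat list of weights, fitness is assumed to be any number you can compute per organism (for example, food eaten), and the names and rates are placeholders.

```python
import random


def crossover(a, b, rng):
    """Uniform crossover: each weight is taken from one parent at random."""
    return [rng.choice(pair) for pair in zip(a, b)]


def mutate(genome, rng, rate=0.1, scale=0.2):
    """Add small Gaussian noise to a random subset of the weights."""
    return [w + rng.gauss(0, scale) if rng.random() < rate else w
            for w in genome]


def next_generation(population, fitness, rng, keep=0.5):
    """population: list of flat weight lists; fitness: genome -> number.

    The fittest fraction survives unchanged and the rest of the population
    is refilled with mutated children of randomly paired survivors.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[:max(2, int(len(ranked) * keep))]
    children = []
    while len(survivors) + len(children) < len(population):
        p1, p2 = rng.sample(survivors, 2)
        children.append(mutate(crossover(p1, p2, rng), rng))
    return survivors + children
```

Because survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.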

I can see a bunch of potential problems.

First and foremost, I’m not clear about the algorithm that updates your weights. I like the 1% decrease as a concept (it looks like you’re trying to discount distant memories, which is good in principle), but the rest probably isn’t sufficient. You need to look at some of the standard update algorithms like backpropagation, but that’s just a start, because….

…You’re only giving your network credit for the last stage of eating the food. It doesn’t seem like there’s any direct mechanism for getting your network incrementally closer to the food, or to clumps of food. Even taking the directionality of the eyes at face value, your eyes are very simple, and there’s not much long term memory.

Also, if your network diagram is accurate, it’s probably not sufficient. You really want to have a hidden layer (at least one) between the sensors and the actuators, if you use something related to backpropagation. There are detailed mathematics behind that statement, but it boils down to, “The hidden layers will allow good solutions of more problems.”
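For comparison with the direct eye-to-movement wiring, a forward pass through a network with one hidden layer could be sketched like this. The hidden layer size of 6, the sigmoid activation and the function names are arbitrary choices for the example, not something the question specifies.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: weights[j] holds the inputs to neuron j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def make_network(n_in=4, n_hidden=6, n_out=4, seed=0):
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": rand_matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
        "w2": rand_matrix(n_out, n_hidden), "b2": [0.0] * n_out,
    }


def forward(net, eyes):
    """eyes: four sensor values; returns four movement activations."""
    hidden = layer(eyes, net["w1"], net["b1"])
    return layer(hidden, net["w2"], net["b2"])
```

Here `forward(make_network(), [0, 1, 0, 0])` returns four activations between 0 and 1, one per movement direction.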

Now, notice that a lot of my comments are talking about the architecture of the network, but only in general terms without saying concretely, “This will work,” or “that will work.” That’s because I don’t know either (although I think Kwatford’s suggestion of reinforcement learning is a very good one.) Sometimes, you can evolve the network parameters as well as the network instances. One such technique is Neuroevolution of Augmenting Topologies, or “NEAT”. Might be worth a look.
