Why do giraffes have long necks? Why are cheetahs so fast? Without knowing it, these animals have been playing out a type of improvement process. Or rather, some clever people have spotted what was going on in the natural world and come up with an algorithmic approach that works in a similar way.
Evolutionary algorithms are a class of heuristic optimization techniques, suitable for problems without an analytical or ‘easy’ solution. Loosely inspired by the biological survival-of-the-fittest mechanisms that result in long giraffe necks and fast cheetahs, populations of potential solutions are created, selected for reproduction and used to generate new populations of solutions. The idea is that the selection process favours individuals with promising characteristics, and that these individuals will lead to a population stronger on average than the previous one.
But first, how do we know how ‘good’ an individual is? There needs to be a way to assign an individual a number quantifying how well they accomplish a problem. This might be the height of a giraffe – a taller giraffe will in general have a better chance of survival by being able to reach higher branches of leaves. It could be the speed of a cheetah; faster cheetahs are more likely to have successful hunts. Whatever the feature might be, there needs to be a function that takes an individual and returns a number summarizing how good that individual is. This is what is known as a fitness function.
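As a concrete sketch – the target word and scoring scheme here are made up for illustration, in the spirit of the password example teased at the end of this post – a fitness function for guessing a secret word might simply count matching characters:

```python
# Hypothetical example: score a guess against a secret target word.
TARGET = "giraffe"

def fitness(individual: str) -> int:
    """Count the positions where the individual matches the target."""
    return sum(1 for a, b in zip(individual, TARGET) if a == b)

print(fitness("giraffe"))  # → 7 (a perfect match)
print(fitness("gazelle"))  # → 2 (only the 'g' and the final 'e' line up)
```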
Once we know the fitness function, it may impact the way we encode an individual. This might sound a bit strange. What does encoding mean in this context? We need a way of representing an individual that allows us to compute its fitness and – in short – describe that individual in full. For something like a solution path, instead of storing the path as it would appear on a screen, we would need to find a way of storing it that we can manipulate, such as a string of directions.
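As a toy illustration (the grid and the move alphabet are invented here, not part of the original), a path could be encoded as a string of moves and decoded back into coordinates whenever we need them:

```python
# Toy example: a path encoded as a string of moves (U/D/L/R on a grid).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def decode(path: str, start=(0, 0)) -> list:
    """Turn a direction string back into the list of visited coordinates."""
    x, y = start
    points = [start]
    for step in path:
        dx, dy = MOVES[step]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

print(decode("RRU"))  # → [(0, 0), (1, 0), (2, 0), (2, 1)]
```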
At larger scales, the representation of individuals also raises efficiency concerns, and the question becomes a trade-off: which representation uses the least memory, or allows the quickest computation of fitness, or keeps the computed fitness closest to its true value?
The choice of fitness function influences how individuals are represented: it must be possible to evaluate the fitness function from the representation alone.
The idea is that instead of having one solution at a time, we have many. This set of potential solutions forms the population. The population is then selected and combined in such a way that a new population built using the previous is stronger.
The main steps to create and iteratively improve this population are:

1. Initialisation: create a random starting population
2. Selection: choose parents, favouring fitter individuals
3. Crossover: combine pairs of parents to create new child individuals
4. Elitism: copy the fittest few individuals unchanged into the next population
5. Mutation: randomly modify some individuals in the new population

with steps 2 – 5 repeated to keep producing new generations.
First of all, there needs to be a population. The starting population is usually created randomly. This first population is unsightly; a bunch of oddballs and misfits that, while possible solutions, have not been beautifully optimized and crafted to solve the problem at hand. But, by chance, there will be individuals with at least some desirable features.
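A random initialisation might look like this (the alphabet, individual length and population size are arbitrary choices for illustration):

```python
import random
import string

def random_individual(length: int) -> str:
    """One individual: a random string of lowercase letters."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

def initial_population(size: int, length: int) -> list:
    """The starting population, created entirely at random."""
    return [random_individual(length) for _ in range(size)]

population = initial_population(size=20, length=7)
print(population[:3])  # a few of our randomly generated oddballs
```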
Now that there is a population, certain individuals will be chosen as parents to reproduce individuals for the next generation. But how to choose?
In general, fitter individuals should have a higher probability of being chosen. This mimics natural selection, whereby individuals better suited to the problem at hand are more likely to survive and reproduce.
Common ways of selecting a parent include:
- Fitness Proportional (‘Roulette’): A parent is selected with probability proportional to its fitness. Think of a roulette wheel with larger wedges dedicated to the fitter individuals.
- Rank: Similar to roulette, but the individuals are first ranked by fitness and their rank is used for the probability proportions. This better preserves diversity in cases where the fitness values are heavily skewed.
- Tournament: A random subset of individuals is selected (the participants of the tournament) and the fittest individual in the subset is chosen.
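Two of these strategies can be sketched as follows, assuming a `fitness` callable as described above (note that roulette selection needs at least one individual with positive fitness for the weighted draw to work with):

```python
import random

def roulette_select(population, fitness):
    """Fitness-proportional ('roulette') selection.

    Assumes at least one individual has positive fitness, otherwise
    the weighted draw has nothing to work with.
    """
    weights = [fitness(ind) for ind in population]
    return random.choices(population, weights=weights, k=1)[0]

def tournament_select(population, fitness, k=3):
    """Pick k random participants and return the fittest of them."""
    participants = random.sample(population, k)
    return max(participants, key=fitness)
```

Rank selection follows the same pattern as roulette, just with each individual’s sorted position used as its weight instead of the raw fitness.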
Once 2 parents are selected, they reproduce to create a new ‘child’ individual. This child takes characteristics from both parents. Exactly how this crossover happens depends on the representation of individuals, but it usually involves cutting and sticking different parts of the parents together.
- Single point crossover: Pick a random point within the representation. The child takes the first parent’s representation up to that point and the second parent’s from that point onwards.
- N point crossover: Extension of single point crossover to N points selected within the representation, with sections of each parent alternating between the points.
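With string representations, both crossovers can be sketched like this (the slicing is arranged so the child always has the same length as its parents):

```python
import random

def single_point_crossover(parent_a: str, parent_b: str) -> str:
    """Cut at one random point: first parent's start, second parent's end."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def n_point_crossover(parent_a: str, parent_b: str, n: int) -> str:
    """Alternate between the parents across n randomly chosen cut points."""
    points = sorted(random.sample(range(1, len(parent_a)), n))
    parents = (parent_a, parent_b)
    child, start = "", 0
    for i, point in enumerate(points + [len(parent_a)]):
        child += parents[i % 2][start:point]
        start = point
    return child

print(single_point_crossover("aaaa", "bbbb"))  # e.g. 'abbb', 'aabb' or 'aaab'
```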
It would be a shame to lose that really tall blue giraffe. The selection process only makes it likely that the fittest individuals are chosen as parents and pass on their attributes. So, as a fail-safe, to make sure that the best of the bunch are always retained, the fittest few individuals are copied without alteration into the next population. It’s a nice safety net to make sure the next population keeps at least the elites from the previous one.
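Elitism is just a copy of the top few individuals (the cut-off `k` is an arbitrary parameter here):

```python
def elites(population, fitness, k=2):
    """The k fittest individuals, copied unchanged into the next population."""
    return sorted(population, key=fitness, reverse=True)[:k]

print(elites([4, 9, 1, 7], fitness=lambda x: x))  # → [9, 7]
```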
Can you see a problem with the steps so far? Sure, the population is getting better on average. But it’s not making anything new. So far we would be capped at the best possible combination of the starting individuals. That is where mutation comes in. It throws a curveball. It dares to try the different. Random (usually small) modifications are made to some individuals in the new population, in the hope that some of these modifications will, by chance, make an individual fitter. We can choose to allow mutation of the elites or to keep them as they were.
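For a string representation, mutation might replace each character independently with some small probability (the alphabet and the default rate are arbitrary choices here):

```python
import random
import string

def mutate(individual: str, rate: float = 0.1) -> str:
    """Replace each character with a random letter, with probability `rate`."""
    return "".join(
        random.choice(string.ascii_lowercase) if random.random() < rate else ch
        for ch in individual
    )
```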
And all together…
And those are all the steps. Initialisation is done once, before steps 2 – 5 are repeated iteratively, making new populations to select from once again.
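The whole loop can be sketched end to end. This is a minimal, self-contained version built around a hypothetical string-matching fitness (guessing the made-up word ‘giraffe’); the parameter values – population size, mutation rate, tournament size – are arbitrary picks, not tuned recommendations:

```python
import random
import string

TARGET = "giraffe"  # hypothetical secret word, invented for this sketch

def fitness(individual):
    """Count the positions where the individual matches the target."""
    return sum(1 for a, b in zip(individual, TARGET) if a == b)

def random_individual():
    return "".join(random.choice(string.ascii_lowercase) for _ in range(len(TARGET)))

def tournament_select(population, k=3):
    return max(random.sample(population, k), key=fitness)

def single_point_crossover(parent_a, parent_b):
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(individual, rate=0.1):
    return "".join(
        random.choice(string.ascii_lowercase) if random.random() < rate else ch
        for ch in individual
    )

def evolve(pop_size=100, generations=200, elite_count=2):
    # Initialisation: done once, before the generation loop
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best few survive unchanged
        next_population = sorted(population, key=fitness, reverse=True)[:elite_count]
        # Selection, crossover and mutation fill the rest of the new population
        while len(next_population) < pop_size:
            parent_a = tournament_select(population)
            parent_b = tournament_select(population)
            next_population.append(mutate(single_point_crossover(parent_a, parent_b)))
        population = next_population
        if fitness(max(population, key=fitness)) == len(TARGET):
            break  # a perfect individual has been found
    return max(population, key=fitness)

print(evolve())
```

Notice how initialisation happens once, while selection, crossover, elitism and mutation all live inside the generation loop – exactly the repeated steps 2 – 5.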
Now we’ve got the general idea, it’s time to tackle evolutionary algorithms from a coding perspective. Proceed to the next blog, where we’ll tackle password cracking by creating our very own evolutionary algorithm framework in Python.