This HTML version of Think Complexity, 2nd Edition is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Chapter 11 Evolution

The most important idea in biology, and possibly all of science, is the theory of evolution by natural selection, which claims that new species are created and existing species change due to natural selection. Natural selection is a process in which inherited variations between individuals cause differences in survival and reproduction.

Among people who know something about biology, the theory of evolution is widely regarded as a fact, which is to say that it is consistent with all current observations; it is highly unlikely to be contradicted by future observations; and, if it is revised in the future, the changes will almost certainly leave the central ideas substantially intact.

Nevertheless, many people do not believe in evolution. In a survey run by the Pew Research Center, survey respondents were asked which of the following claims is closer to their view:

Humans and other living things have evolved over time.
Humans and other living things have existed in their present form since the beginning of time.

About 34% of Americans chose the second (see http://thinkcomplex.com/arda).

Even among the ones who believe that living things have evolved, barely more than half believe that the cause of evolution is natural selection. In other words, only a third of Americans believe that the theory of evolution is true.

How is this possible? In my opinion, contributing factors include:

Some people think that there is a conflict between evolution and their religious beliefs. Feeling like they have to reject one, they reject evolution.
Others have been actively misinformed, often by members of the first group, so that much of what they know about evolution is misleading or false. For example, many people think that evolution means humans evolved from monkeys. It doesn’t, and we didn’t.
And many people simply don’t know anything about evolution.

There’s probably not much I can do about the first group, but I think I can help the others. Empirically, the theory of evolution is hard for people to understand. At the same time, it is profoundly simple: for many people, once they understand it, it seems both obvious and irrefutable.

To help people make this transition from confusion to clarity, the most powerful tool I have found is computation. Ideas that are hard to understand in theory can be easy to understand when we see them happening in simulation. That is the goal of this chapter.

The code for this chapter is in chap11.ipynb, which is a Jupyter notebook in the repository for this book. For more information about working with this code, see Section ??.

11.1 Simulating evolution

I start with a simple model that demonstrates a basic form of evolution. According to the theory, the following features are sufficient to produce evolution:

Replicators: We need a population of agents that can reproduce in some way. We’ll start with replicators that make perfect copies of themselves. Later we’ll add imperfect copying, that is, mutation.
Variation: We need variability in the population, that is, differences between individuals.
Differential survival or reproduction: The differences between individuals have to affect their ability to survive or reproduce.

To simulate these features, we’ll define a population of agents that represent individual organisms. Each agent has genetic information, called its genotype, which is the information that gets copied when the agent replicates. In our model¹, a genotype is represented by a sequence of N binary digits (zeros and ones), where N is a parameter we choose.

To generate variation, we create a population with a variety of genotypes; later we will explore mechanisms that create or increase variation.

Finally, to generate differential survival and reproduction, we define a function that maps from each genotype to a fitness, where fitness is a quantity related to the ability of an agent to survive or reproduce.

11.2 Fitness landscape

The function that maps from genotype to fitness is called a fitness landscape. In the landscape metaphor, each genotype corresponds to a location in an N-dimensional space, and fitness corresponds to the “height" of the landscape at that location. For visualizations that might clarify this metaphor, see http://thinkcomplex.com/fit.

In biological terms, the fitness landscape represents information about how the genotype of an organism is related to its physical form and capabilities, called its phenotype, and how the phenotype interacts with its environment.

In the real world, fitness landscapes are complicated, but we don’t need to build a realistic model. To induce evolution, we need some relationship between genotype and fitness, but it turns out that it can be any relationship. To demonstrate this point, we’ll use a totally random fitness landscape.

Here is the definition for a class that represents a fitness landscape:

class FitnessLandscape: def __init__(self, N): self.N = N self.one_values = np.random.random(N) self.zero_values = np.random.random(N) def fitness(self, loc): fs = np.where(loc, self.one_values, self.zero_values) return fs.mean()

The genotype of an agent, which corresponds to its location in the fitness landscape, is represented by a NumPy array of zeros and ones called loc. The fitness of a given genotype is the mean of N fitness contributions, one for each element of loc.

To compute the fitness of a genotype, FitnessLandscape uses two arrays: one_values, which contains the fitness contributions of having a 1 in each element of loc, and zero_values, which contains the fitness contributions of having a 0.

The fitness method uses np.where to select a value from one_values where loc has a 1, and a value from zero_values where loc has a 0.

As an example, suppose N=3 and

one_values = [0.1, 0.2, 0.3] zero_values = [0.4, 0.7, 0.9]

In that case, the fitness of loc = [0, 1, 0] would be the mean of [0.4, 0.2, 0.9], which is 0.5.

11.3 Agents

Next we need agents. Here’s the class definition:

class Agent: def __init__(self, loc, fit_land): self.loc = loc self.fit_land = fit_land self.fitness = fit_land.fitness(self.loc) def copy(self): return Agent(self.loc, self.fit_land)

The attributes of an Agent are:

loc: The location of the Agent in the fitness landscape.
fit_land: A reference to a FitnessLandscape object.
fitness: The fitness of this Agent in the FitnessLandscape, represented as a number between 0 and 1.

Agent provides copy, which copies the genotype exactly. Later, we will see a version that copies with mutation, but mutation is not necessary for evolution.

11.4 Simulation

Now that we have agents and a fitness landscape, I’ll define a class called Simulation that simulates the creation, reproduction, and death of the agents. To avoid getting bogged down, I’ll present a simplified version of the code here; you can see the details in the notebook for this chapter.

Here’s the definition of Simulation:

class Simulation: def __init__(self, fit_land, agents): self.fit_land = fit_land self.agents = agents

The attributes of a Simulation are:

fit_land: A reference to a FitnessLandscape object.
agents: An array of Agent objects.

The most important function in Simulation is step, which simulates one time step:

# class Simulation: def step(self): n = len(self.agents) fits = self.get_fitnesses() # see who dies index_dead = self.choose_dead(fits) num_dead = len(index_dead) # replace the dead with copies of the living replacements = self.choose_replacements(num_dead, fits) self.agents[index_dead] = replacements

step uses three other methods:

get_fitnesses returns an array containing the fitness of each agent.
choose_dead decides which agents die during this time step, and returns an array that contains the indices of the dead agents.
choose_replacements decides which agents reproduce during this time step, invokes copy on each one, and returns an array of new Agent objects.

In this version of the simulation, the number of new agents during each time step equals the number of dead agents, so the number of live agents is constant.

11.5 No differentiation

Before we run the simulation, we have to specify the behavior of choose_dead and choose_replacements. We’ll start with simple versions of these functions that don’t depend on fitness:

# class Simulation def choose_dead(self, fits): n = len(self.agents) is_dead = np.random.random(n) < 0.1 index_dead = np.nonzero(is_dead)[0] return index_dead

In choose_dead, n is the number of agents and is_dead is a boolean array that contains True for the agents who die during this time step. In this version, every agent has the same probability of dying: 0.1. choose_dead uses np.nonzero to find the indices of the non-zero elements of is_dead (True is considered non-zero).

# class Simulation def choose_replacements(self, n, fits): agents = np.random.choice(self.agents, size=n, replace=True) replacements = [agent.copy() for agent in agents] return replacements

In choose_replacements, n is the number of agents who reproduce during this time step. It uses np.random.choice to choose n agents with replacement. Then it invokes copy on each one and returns a list of new Agent objects.

These methods don’t depend on fitness, so this simulation does not have differential survival or reproduction. As a result, we should not expect to see evolution. But how can we tell?

11.6 Evidence of evolution

The most inclusive definition of evolution is a change in the distribution of genotypes in a population. Evolution is an aggregate effect: in other words, individuals don’t evolve; populations do.

In this simulation, genotypes are locations in a high-dimensional space, so it is hard to visualize changes in their distribution. However, if the genotypes change, we expect their fitness to change as well. So we will use changes in the distribution of fitness as evidence of evolution. In particular, we’ll look at the mean and standard deviation of fitness over time.

Before we run the simulation, we have to add an Instrument, which is an object that gets updated after each time step, computes a statistic of interest, or “metric", and stores the result in a sequence we can plot later.

Here is the parent class for all instruments:

class Instrument: def __init__(self): self.metrics = []

And here’s the definition for MeanFitness, an instrument that computes the mean fitness of the population at each time step:

class MeanFitness(Instrument): def update(self, sim): mean = np.nanmean(sim.get_fitnesses()) self.metrics.append(mean)

Now we’re ready to run the simulation. To avoid the effect of random changes in the starting population, we start every simulation with the same set of agents. And to make sure we explore the entire fitness landscape, we start with one agent at every location. Here’s the code that creates the Simulation:

N = 8 fit_land = FitnessLandscape(N) agents = make_all_agents(fit_land, Agent) sim = Simulation(fit_land, agents)

make_all_agents creates one Agent for every location; the implementation is in the notebook for this chapter.

Now we can create and add a MeanFitness instrument, run the simulation, and plot the results:

instrument = MeanFitness() sim.add_instrument(instrument) sim.run()

Simulation keeps a list of Instrument objects. After each time step it invokes update on each Instrument in the list.

Figure 11.1: Mean fitness over time for 10 simulations with no differential survival or reproduction.

Figure ?? shows the result of running this simulation 10 times. The mean fitness of the population drifts up or down at random. Since the distribution of fitness changes over time, we infer that the distribution of phenotypes is also changing. By the most inclusive definition, this random walk is a kind of evolution. But it is not a particularly interesting kind.

In particular, this kind of evolution does not explain how biological species change over time, or how new species appear. The theory of evolution is powerful because it explains phenomena we see in the natural world that seem inexplicable:

Adaptation: Species interact with their environments in ways that seem too complex, too intricate, and too clever to happen by chance. Many features of natural systems seem as if they were designed.
Increasing diversity: Over time the number of species on earth has generally increased (despite several periods of mass extinction).
Increasing complexity: The history of life on earth starts with relatively simple life forms, with more complex organisms appearing later in the geological record.

These are the phenomena we want to explain. So far, our model doesn’t do the job.

11.7 Differential survival

Let’s add one more ingredient, differential survival. Here’s a class that extends Simulation and overrides choose_dead:

class SimWithDiffSurvival(Simulation): def choose_dead(self, fits): n = len(self.agents) is_dead = np.random.random(n) > fits index_dead = np.nonzero(is_dead)[0] return index_dead

Now the probability of survival depends on fitness; in fact, in this version, the probability that an agent survives each time step is its fitness.

Since agents with low fitness are more likely to die, agents with high fitness are more likely to survive long enough to reproduce. Over time we expect the number of low-fitness agents to decrease, and the number of high-fitness agents to increase.

Figure 11.2: Mean fitness over time for 10 simulations with differential survival.

Figure ?? shows mean fitness over time for 10 simulations with differential survival. Mean fitness increases quickly at first, but then levels off.

You can probably figure out why it levels off: if there is only one agent at a particular location and it dies, it leaves that location unoccupied. Without mutation, there is no way for it to be occupied again.

With N=8, this simulation starts with 256 agents occupying all possible locations. Over time, the number of occupied locations decreases; if the simulation runs long enough, eventually all agents will occupy the same location.

So this simulation starts to explain adaptation: increasing fitness means that the species is getting better at surviving in its environment. But the number of occupied locations decreases over time, so this model does not explain increasing diversity at all.

In the notebook for this chapter, you will see the effect of differential reproduction. As you might expect, differential reproduction also increases mean fitness. But without mutation, we still don’t see increasing diversity.

11.8 Mutation

In the simulations so far, we start with the maximum possible diversity — one agent at every location in the landscape — and end with the minimum possible diversity, all agents at one location.

That’s almost the opposite of what happened in the natural world, which apparently began with a single species that branched, over time, into the millions, or possibly billions, of species on Earth today (see http://thinkcomplex.com/bio).

With perfect copying in our model, we never see increasing diversity. But if we add mutation, along with differential survival and reproduction, we get a step closer to understanding evolution in nature.

Here is a class definition that extends Agent and overrides copy:

class Mutant(Agent): def copy(self, prob_mutate=0.05):: if np.random.random() > prob_mutate: loc = self.loc.copy() else: direction = np.random.randint(self.fit_land.N) loc = self.mutate(direction) return Mutant(loc, self.fit_land)

In this model of mutation, every time we call copy, there is a 5% chance of mutation. In case of mutation, we choose a random direction from the current location — that is, a random bit in the genotype — and flip it. Here’s mutate:

def mutate(self, direction): new_loc = self.loc.copy() new_loc[direction] ^= 1 return new_loc

The operator ^= computes “exclusive OR"; with the operand 1, it has the effect of flipping a bit (see http://thinkcomplex.com/xor).

Now that we have mutation, we don’t have to start with an agent at every location. Instead, we can start with the minimum variability: all agents at the same location.

Figure 11.3: Mean fitness over time for 10 simulations with mutation and differential survival and reproduction.

Figure ?? shows the results of 10 simulations with mutation and differential survival and reproduction. In every case, the population evolves toward the location with maximum fitness.

Figure 11.4: Number of occupied locations over time for 10 simulations with mutation and differential survival and reproduction.

To measure diversity in the population, we can plot the number of occupied locations after each time step. Figure ?? shows the results. We start with 100 agents at the same location. As mutations occur, the number of occupied locations increases quickly.

When an agent discovers a high-fitness location, it is more likely to survive and reproduce. Agents at lower-fitness locations eventually die out. Over time, the population migrates through the landscape until most agents are at the location with the highest fitness.

At that point, the system reaches an equilibrium where mutation occupies new locations at the same rate that differential survival causes lower-fitness locations to be left empty.

The number of occupied locations in equilibrium depends on the mutation rate and the degree of differential survival. In these simulations the number of unique occupied locations at any point is typically 5–15.

It is important to remember that the agents in this model don’t move, just as the genotype of an organism doesn’t change. When an agent dies, it can leave a location unoccupied. And when a mutation occurs, it can occupy a new location. As agents disappear from some locations and appear in others, the population migrates across the landscape, like a glider in Game of Life. But organisms don’t evolve; populations do.

11.9 Speciation

The theory of evolution says that natural selection changes existing species and creates new ones. In our model, we have seen changes, but we have not seen a new species. It’s not even clear, in the model, what a new species would look like.

Among species that reproduce sexually, two organisms are considered the same species if they can breed and produce fertile offspring. But the agents in the model don’t reproduce sexually, so this definition doesn’t apply.

Among organisms that reproduce asexually, like bacteria, the definition of species is not as clear-cut. Generally, a population is considered a species if their genotypes form a cluster, that is, if the genetic differences within the population are small compared to the differences between populations.

Before we can model new species, we need the ability to identify clusters of agents in the landscape, which means we need a definition of distance between locations. Since locations are represented with arrays of bits, we’ll define distance as the number of bits that differ between locations. FitnessLandscape provides a distance method:

# class FitnessLandscape def distance(self, loc1, loc2): return np.sum(np.logical_xor(loc1, loc2))

Figure 11.5: Mean distance between agents over time.

The logical_xor function computes “exclusive OR", which is True for bits that differ, and False for the bits that are the same.

To quantify the dispersion of a population, we can compute the mean of the distances between pairs of agents. In the notebook for this chapter, you’ll see the MeanDistance instrument, which computes this metric after each time step.

Figure ?? shows mean distance between agents over time. Because we start with identical mutants, the initial distances are 0. As mutations occur, mean distance increases, reaching a maximum while the population migrates across the landscape.

Once the agents discover the optimal location, mean distance decreases until the population reaches an equilibrium where increasing distance due to mutation is balanced by decreasing distance as agents far from the optimal location disappear. In these simulations, the mean distance in equilibrium is near 1; that is, most agents are only one mutation away from optimal.

Now we are ready to look for new species. To model a simple kind of speciation, suppose a population evolves in an unchanging environment until it reaches steady state (like some species we find in nature that seem to have changed very little over long periods of time).

Now suppose we either change the environment or transport the population to a new environment. Some features that increased fitness in the old environment might decrease it in the new environment, and vice versa.

We can model these scenarios by running a simulation until the population reaches steady state, then changing the fitness landscape, and then resuming the simulation until the population reaches steady state again.

Figure 11.6: Mean fitness over time. After 500 time steps, we change the fitness landscape.

Figure ?? shows results from a simulation like that. We start with 100 identical mutants at a random location, and run the simulation for 500 time steps. At that point, many agents are at the optimal location, which has fitness near 0.65 in this example. The genotypes of the agents form a cluster, with the mean distance between agents near 1.

After 500 steps, we run FitnessLandscape.set_values, which changes the fitness landscape; then we resume the simulation. Mean fitness is lower, as we expect because the optimal location in the old landscape is no better than a random location in the new landscape.

After the change, mean fitness increases again as the population migrates across the new landscape, eventually finding the new optimum, which has fitness near 0.75 (which happens to be higher in this example, but needn’t be).

Once the population reaches steady state, it forms a new cluster, with mean distance between agents near 1 again.

Now if we compute the distance between the agents’ locations before and after the change, they differ by more than 6, on average. The distances between clusters are much bigger than the distances between agents in each cluster, so we can interpret these clusters as distinct species.

11.10 Summary

We have seen that mutation, along with differential survival and reproduction, is sufficient to cause increasing fitness, increasing diversity, and a simple form of speciation. This model is not meant to be realistic; evolution in natural systems is much more complicated than this. Rather, it is meant to be a “sufficiency theorem"; that is, a demonstration that the features of the model are sufficient to produce the behavior we are trying to explain (see http://thinkcomplex.com/suff).

Logically, this “theorem" doesn’t prove that evolution in nature is caused by these mechanisms alone. But since these mechanisms do appear, in many forms, in biological systems, it is reasonable to think that they at least contribute to natural evolution.

Likewise, the model does not prove that these mechanisms always cause evolution. But the results we see here turn out to be robust: in almost any model that includes these features — imperfect replicators, variability, and differential reproduction — evolution happens.

I hope this observation helps to demystify evolution. When we look at natural systems, evolution seems complicated. And because we primarily see the results of evolution, with only glimpses of the process, it can be hard to imagine and hard to believe.

But in simulation, we see the whole process, not just the results. And by including the minimal set of features to produce evolution — temporarily ignoring the vast complexity of biological life — we can see evolution as the surprisingly simple, inevitable idea that it is.

11.11 Exercises

The code for this chapter is in the Jupyter notebook chap11.ipynb in the repository for this book. Open the notebook, read the code, and run the cells. You can use the notebook to work on the following exercises. My solutions are in chap11soln.ipynb.

Exercise 1

The notebook shows the effects of differential reproductions and survival separately. What if you have both? Write a class called SimWithBoth that uses the version of choose_dead from SimWithDiffSurvival and the version of choose_replacements from SimWithDiffReproduction. Does mean fitness increase more quickly?

As a Python challenge, can you write this class without copying code?

Exercise 2

When we change the landscape as in Section ??, the number of occupied locations and the mean distance usually increase, but the effect is not always big enough to be obvious. Try out some different random seeds to see how general the effect is.

1: This model is a variant of the NK model developed primarily by Stuart Kauffman (see http://thinkcomplex.com/nk).

Buy this book at Amazon.com

Contribute

If you would like to make a contribution to support my books, you can use the button below and pay with PayPal. Thank you!

Are you using one of our books in a class?

We'd like to know about it. Please consider filling out this short survey.