
This HTML version of Think Complexity, 2nd Edition is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.

You might prefer to read the PDF version.

Chapter 11  Evolution

The most important idea in biology, and possibly all of science, is the theory of evolution by natural selection, which claims that new species are created, and existing species change over time, due to natural selection. Natural selection is a process in which heritable differences between individuals cause differences in survival or reproduction (or both).

Among people who know something about biology, the theory of evolution is widely regarded as a fact, which is to say that it is close enough to the truth that if it is corrected in the future, the amendments will leave the central ideas substantially intact.

Nevertheless, many people do not believe in evolution. In a survey run by the Pew Research Center, respondents were asked which of the following claims is closer to their view:

  1. Humans and other living things have evolved over time.
  2. Humans and other living things have existed in their present form since the beginning of time.

About 34% of Americans chose the second (see http://www.thearda.com/Archive/Files/Codebooks/RELLAND14_CB.asp).

Even among the ones who believe that living things have evolved, barely more than half believe that the cause of evolution is natural selection. In other words, only about 33% of Americans believe that the theory of evolution is true.

How is this possible? In my opinion, contributing factors include:

  • Some people think that there is a conflict between evolution and their religious beliefs and, feeling like they have to reject one, they reject evolution.
  • Others have been actively misinformed, often by members of the first group, so that much of what they know about evolution is false.
  • And many people simply don’t know anything about evolution.

There’s probably not much I can do about the first group, but I think I can help the others. Empirically, the theory of evolution is hard for people to understand. At the same time, it is profoundly simple: for many people, once they understand it, it seems both obvious and irrefutable.

To help people make this transition from confusion to clarity, the most powerful tool I have found is computation. Ideas that are hard to understand in theory can be easy to understand when we see them happening in simulation. That is the goal of this chapter.

11.1  Simulating evolution

I will start with a simple model that demonstrates a basic form of evolution. According to the theory, the following features are sufficient to produce evolution:

  • Replicators: We need a population of agents that can reproduce in some way. We’ll start with replicators that make perfect copies of themselves. Later we’ll add imperfect copying, that is, mutation.
  • Variation: We also need some variability in the population, that is, differences between individuals.
  • Differential survival or reproduction: The differences between individuals have to affect their ability to survive or reproduce.

To simulate these features, we’ll define a population of agents that represent individual organisms. Each agent has genetic information, called its genotype, which is the information that gets copied when the agent replicates. In our model, a genotype is represented by a sequence of N binary digits (zeros and ones), where N is a parameter we can choose.

To generate a population with variation, we choose initial genotypes at random; later we will explore mechanisms that generate or increase variation.
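
As a minimal sketch of this step, the following generates a few random genotypes. The helper name random_genotype and the seeded generator are assumptions made for illustration; they are not part of the chapter's code.

```python
import numpy as np

N = 8                                  # genotype length, a parameter we choose
rng = np.random.default_rng(seed=17)   # seeded only so the run is repeatable

def random_genotype(rng, N):
    """Return an array of N random bits (hypothetical helper)."""
    return rng.integers(0, 2, size=N, dtype=np.int8)

# a small population with variation: each genotype is drawn independently
population = [random_genotype(rng, N) for _ in range(5)]
for genotype in population:
    print(genotype)
```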

Finally, to generate differential survival and reproduction, we define a function that maps from each genotype to a fitness, where fitness is a quantity related to the ability of an agent to survive or reproduce.

11.2  Fitness landscape

The function that maps from genotype to fitness is called a fitness landscape. In the landscape metaphor, each genotype corresponds to a location in an N-dimensional space, and fitness corresponds to the “height” of the landscape at that location. For visualizations that might clarify this metaphor, see https://en.wikipedia.org/wiki/Fitness_landscape.

In biological terms, the fitness landscape represents information about how the genotype of an organism is related to its physical form and capabilities, called its phenotype, and how the phenotype interacts with its environment.

In the real world, fitness landscapes are complicated, but we don’t need to build a realistic model. To induce evolution, we need some relationship between genotype and fitness, but it turns out that it can be any relationship. To demonstrate this point, we’ll generate the fitness function at random.

Here is the definition for a class that represents a fitness landscape:

    import numpy as np

    class FitnessLandscape:
        def __init__(self, N):
            self.N = N
            self.one_values = np.random.random(N)
            self.zero_values = np.random.random(N)

        def fitness(self, loc):
            fs = np.where(loc, self.one_values, self.zero_values)
            return fs.mean()

The genotype of an agent, which corresponds to its location in the fitness landscape, is represented by a NumPy array of zeros and ones, called loc. The fitness of a given genotype is the mean of N fitness contributions, one for each element of loc.

To compute the fitness of a genotype, FitnessLandscape uses two arrays: one_values, which contains the fitness contributions of having a 1 in each element of loc, and zero_values, which contains the fitness contributions of having a 0.

The fitness method uses np.where to select a value from one_values where loc has a 1, and a value from zero_values where loc has a 0.

As an example, suppose N=3 and

    one_values = [0.1, 0.2, 0.3]
    zero_values = [0.4, 0.7, 0.9]

In that case, the fitness of loc = [0, 1, 0] would be the mean of [0.4, 0.2, 0.9], which is 0.5.
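
The worked example can be checked directly. This sketch reuses the values given above and applies np.where the same way the fitness method does.

```python
import numpy as np

one_values = np.array([0.1, 0.2, 0.3])
zero_values = np.array([0.4, 0.7, 0.9])
loc = np.array([0, 1, 0])

# np.where takes from one_values where loc is 1 (truthy)
# and from zero_values where loc is 0
fs = np.where(loc, one_values, zero_values)
fitness = fs.mean()

print(fs)        # the selected contributions: 0.4, 0.2, 0.9
print(fitness)   # approximately 0.5
```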

11.3  Agents

Next we need agents. Here’s the class definition:

    class Agent:
        def __init__(self, loc, fit_land):
            self.loc = loc
            self.fit_land = fit_land
            self.fitness = fit_land.fitness(self.loc)

        def copy(self):
            return Agent(self.loc, self.fit_land)

The attributes of an Agent are:

  • loc: The location of the Agent in the fitness landscape.
  • fit_land: A reference to a FitnessLandscape object.
  • fitness: The fitness of this Agent in the FitnessLandscape, represented as a number between 0 and 1.

This definition of Agent provides a simple copy method that copies the genotype exactly; later, we will see a version that copies with mutation, but mutation is not necessary for evolution.

11.4  Simulation

Now that we have agents and a fitness landscape, I’ll define a class called Simulation that simulates the creation, reproduction, and death of the agents. To avoid getting bogged down, I’ll present a simplified version of the code here; you can see the details in the notebook for this chapter.

Here’s the definition of Simulation:

    class Simulation:
        def __init__(self, fit_land, agents):
            self.fit_land = fit_land
            self.agents = agents

The attributes of a Simulation are:

  • fit_land: A reference to a FitnessLandscape object.
  • agents: An array of Agent objects.

The most important function in Simulation is step, which simulates one time step:

    # class Simulation:

        def step(self):
            n = len(self.agents)
            fits = self.get_fitnesses()

            # see who dies
            index_dead = self.choose_dead(fits)
            num_dead = len(index_dead)

            # replace the dead with copies of the living
            replacements = self.choose_replacements(num_dead, fits)
            self.agents[index_dead] = replacements

step uses three other methods:

  • get_fitnesses returns an array containing the fitness of each agent in the order they appear in the agent array.
  • choose_dead decides which agents die during this timestep, and returns an array that contains the indices of the dead agents.
  • choose_replacements decides which agents reproduce during this timestep, invokes copy on each one, and returns an array of new Agent objects.

In this version of the simulation, the number of new agents during each timestep equals the number of dead agents, so the number of live agents is constant.

11.5  No differentiation

Before we run the simulation, we have to specify the behavior of choose_dead and choose_replacements. We’ll start with simple versions of these functions that don’t depend on fitness:

    # class Simulation

        def choose_dead(self, fits):
            n = len(self.agents)
            is_dead = np.random.random(n) < 0.1
            index_dead = np.nonzero(is_dead)[0]
            return index_dead

        def choose_replacements(self, n, fits):
            agents = np.random.choice(self.agents, size=n, replace=True)
            replacements = [agent.copy() for agent in agents]
            return replacements

In choose_dead, n is the number of agents; is_dead is a boolean array that contains True for the agents who die during this time step. In this version, every agent has the same chance of dying, 10%.

choose_dead uses np.nonzero to find the indices of the non-zero elements of is_dead (True is considered non-zero).

In choose_replacements, n is the number of agents who reproduce during this time step. It uses np.random.choice to choose n agents, with replacement. Then it invokes copy on each one and returns a list of new Agent objects.
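
To see the two NumPy calls in isolation, here is a small sketch that uses plain integers as stand-ins for Agent objects. The population size of 1000 and the seeded generator are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seeded only for repeatability
agents = np.arange(1000)              # integers standing in for Agent objects

# each agent dies independently with probability 0.1
is_dead = rng.random(len(agents)) < 0.1

# np.nonzero returns a tuple of index arrays; [0] selects the first axis
index_dead = np.nonzero(is_dead)[0]

# sample one agent from the whole array, with replacement, per vacancy
replacements = rng.choice(agents, size=len(index_dead), replace=True)
agents[index_dead] = replacements

print(len(index_dead), "agents replaced; population size is still", len(agents))
```

Because every vacancy is filled in the same step, the length of the array never changes, which matches the constant population described earlier.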

These methods don’t depend on fitness, so this simulation does not have differential survival or reproduction. As a result, we should not expect to see evolution. But how can we tell?

11.6  Evidence of evolution

The most inclusive definition of evolution is a change in the distribution of genotypes in a population. So evolution is an aggregate effect: in other words, individuals don’t evolve; populations do.

In this simulation, genotypes are locations in a high-dimensional space, so it is hard to visualize changes in their distribution. However, if the genotypes change, we expect their fitness to change as well. So we will use changes in the distribution of fitness as evidence of evolution. In particular, we’ll look at the mean and standard deviation of fitness across the population.

Before we run the simulation, we have to add an Instrument, which is an object that gets updated after each time step, computes some statistic of interest, and stores the result in a sequence we can plot later.

Here is the parent class for all instruments:

    class Instrument:
        def __init__(self):
            self.metrics = []

And here’s the definition for MeanFitness, an instrument that computes the mean fitness of the population at each time step:

    class MeanFitness(Instrument):
        def update(self, sim):
            mean = np.mean(sim.get_fitnesses())
            self.metrics.append(mean)

Now we’re ready to run the simulation. To minimize the effect of random changes in the starting population, we’ll start every simulation with the same set of agents. And to make sure we explore the entire fitness landscape, we’ll start with one agent at every location. Here’s the code that creates the Simulation:

    N = 8
    fit_land = FitnessLandscape(N)
    agents = make_all_agents(fit_land, Agent)
    sim = Simulation(fit_land, agents)

make_all_agents creates one Agent for every location; the implementation is in the notebook for this chapter.
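
For readers without the notebook at hand, one way to enumerate every location is sketched below. The function make_all_locs is a hypothetical helper, not necessarily how the notebook does it; a make_all_agents function could then create one Agent per row.

```python
import itertools
import numpy as np

def make_all_locs(N):
    """Return a (2**N, N) array holding every bit string of length N.

    Hypothetical helper for illustration.
    """
    return np.array(list(itertools.product([0, 1], repeat=N)), dtype=np.int8)

locs = make_all_locs(3)
print(locs.shape)   # 2**3 rows, 3 columns
print(locs)
```

With N=8 the same function yields 256 rows, one for each possible genotype.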

Now we can create and add a MeanFitness instrument, run the simulation, and plot the results:

    instrument = MeanFitness()
    sim.add_instrument(instrument)
    sim.run()
    sim.plot(0)

The Simulation keeps a list of Instrument objects. After each timestep it invokes update on each Instrument in the list.

After the simulation runs, we plot the results using Simulation.plot, which takes an index as a parameter, uses the index to select an Instrument from the list, and plots the results. In this example, there is only one Instrument, so the index is 0.
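
The instrument mechanism described here can be sketched with a stripped-down stand-in. ToySimulation, PopulationSize, and the num_steps parameter are assumptions made for this example; the real Simulation class in the notebook has more to it.

```python
class Instrument:
    def __init__(self):
        self.metrics = []

class PopulationSize(Instrument):
    """Hypothetical instrument: records the number of agents."""
    def update(self, sim):
        self.metrics.append(len(sim.agents))

class ToySimulation:
    """Stand-in for Simulation, showing only the instrument plumbing."""
    def __init__(self, agents):
        self.agents = agents
        self.instruments = []

    def add_instrument(self, instrument):
        self.instruments.append(instrument)

    def step(self):
        pass  # births and deaths would happen here

    def run(self, num_steps=5):
        for _ in range(num_steps):
            self.step()
            # after each time step, update every instrument in the list
            for instrument in self.instruments:
                instrument.update(self)

sim = ToySimulation(agents=["a", "b", "c"])
counter = PopulationSize()
sim.add_instrument(counter)
sim.run(num_steps=5)
print(counter.metrics)
```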


Figure 11.1: Mean fitness over time for 10 simulations with no differential survival or reproduction.

Figure 11.1 shows the results of running this simulation, with the MeanFitness instrument, 10 times.

The mean fitness of the population drifts up or down, due to chance. Since the distribution of fitness changes over time, we infer that the distribution of genotypes is also changing. By the most inclusive definition, this random walk is a kind of evolution. But it is not a particularly interesting kind.

In particular, this kind of evolution does not explain how biological species change over time, or how new species appear. The theory of evolution is powerful because it explains phenomena we see in the natural world that seem inexplicable:

  • Adaptation: Species interact with their environments in ways that seem too complex, too intricate, and too clever to happen by chance. Many features of natural systems seem as if they were designed.
  • Increasing diversity: Over time the number of species on earth has generally increased (despite several periods of mass extinction).
  • Increasing complexity: The history of life on earth starts with relatively simple life forms, with more complex organisms appearing later in the geological record.

These are the phenomena we want to explain. So far, our model doesn’t do the job.

11.7  Differential survival

So let’s add one more ingredient, differential survival. Here’s the definition for a class that extends Simulation and overrides choose_dead:

    class SimWithDiffSurvival(Simulation):

        def choose_dead(self, fits):
            n = len(self.agents)
            is_dead = np.random.random(n) > fits
            index_dead = np.nonzero(is_dead)[0]
            return index_dead

Now the probability of survival depends on fitness; in fact, in this version, the probability that an agent survives each time step is its fitness.

Since agents with low fitness are more likely to die, agents with high fitness are more likely to survive long enough to reproduce. So we expect the number of low-fitness agents to decrease, and the number of high-fitness agents to increase. If we plot the mean fitness over time, we expect it to increase.
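
The claim that survival probability equals fitness can be checked empirically. In this sketch the three fitness values, the number of trials, and the seed are arbitrary assumptions; the comparison is the same one used in choose_dead.

```python
import numpy as np

rng = np.random.default_rng(seed=3)   # seeded only for repeatability
fits = np.array([0.2, 0.5, 0.9])      # example fitness values
trials = 10_000

# one row of uniform draws per trial; an agent dies when its draw exceeds its fitness
draws = rng.random((trials, len(fits)))
is_dead = draws > fits                # fits broadcasts across the rows

survival_rate = 1 - is_dead.mean(axis=0)
print(survival_rate)                  # each entry should be close to the matching fitness
```

A uniform draw on [0, 1) is at most f with probability f, so an agent with fitness f survives a given step with probability f.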


Figure 11.2: Mean fitness over time for 10 simulations with differential survival.

Figure 11.2 shows the results of 10 simulations with differential survival. Mean fitness increases quickly at first, but then levels off.

You can probably figure out why it levels off: if there is only one agent at a particular location and it dies, it leaves that location unoccupied. And without mutation, there is no way for it to be occupied again.

With N=8, this simulation starts with 256 agents occupying all possible locations. Over time, the number of unique locations decreases; if the simulation runs long enough, eventually all agents will occupy the same location.

So this simulation starts to explain adaptation: increasing fitness means that the species is getting better at surviving in its environment. But with a decreasing number of locations, it does not explain increasing diversity at all.

In the notebook for this chapter, you will see the effect of differential reproduction. As you might expect, differential reproduction also increases mean fitness. But without mutation, we still don’t see increasing diversity.

11.8  Mutation

In the simulations so far, we start with the maximum possible diversity — one agent at every location in the landscape — and end (eventually) with the minimum possible diversity, all agents at one location.

That’s almost the opposite of what happened in the natural world, which apparently began with a single species that branched, over time, into the millions, or possibly billions, of species on Earth today (see https://en.wikipedia.org/wiki/Global_biodiversity).

With perfect copying in our model, we never see increasing diversity. But if we add mutation, along with differential survival and reproduction, we get a step closer to understanding evolution in nature.

Here is a class definition that extends Agent and overrides copy:

    class Mutant(Agent):

        prob_mutate = 0.05

        def copy(self):
            if np.random.random() > self.prob_mutate:
                loc = self.loc.copy()
            else:
                direction = np.random.randint(self.fit_land.N)
                loc = self.mutate(direction)
            return Mutant(loc, self.fit_land)

In this model of mutation, every time we call copy, there is a 5% chance of mutation. In case of mutation, we choose a random direction from the current location — that is, a random bit in the genotype — and flip it. Here’s mutate:

    # class Mutant

        def mutate(self, direction):
            new_loc = self.loc.copy()
            new_loc[direction] ^= 1
            return new_loc
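
As a quick illustration of the bit flip, here is a standalone version of mutate written as a plain function (an adaptation for this example, not the method itself) applied to a four-bit genotype.

```python
import numpy as np

def mutate(loc, direction):
    """Return a copy of loc with the bit at index `direction` flipped."""
    new_loc = loc.copy()
    new_loc[direction] ^= 1   # XOR with 1 turns 0 into 1 and 1 into 0
    return new_loc

loc = np.array([0, 1, 0, 1], dtype=np.int8)
mutant = mutate(loc, 2)

print(loc)      # the parent is unchanged
print(mutant)   # differs from the parent only at index 2
```

Copying before flipping matters: without the copy, parent and offspring would share one array, and the mutation would change both.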

Now that we have mutation, we don’t have to start with an agent at every location. Instead, we can start with the minimum variability: all agents at the same location.


Figure 11.3: Mean fitness over time for 10 simulations with mutation and differential survival and reproduction.

Figure 11.3 shows the results of 10 simulations with mutation and differential survival and reproduction. In every case, the population evolves toward the location with maximum fitness.


Figure 11.4: Number of occupied locations over time for 10 simulations with mutation and differential survival and reproduction.

To measure diversity in the population, we can plot the number of occupied locations after each timestep. Figure 11.4 shows the results. We start with 100 agents, all at the same location. As mutations occur, the number of occupied locations increases quickly.
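
Counting occupied locations amounts to counting distinct genotypes. The function below is a sketch under that assumption; count_occupied is a hypothetical name, and the notebook's instrument may compute this differently.

```python
import numpy as np

def count_occupied(locs):
    """Return the number of distinct genotypes in a sequence of bit arrays."""
    # arrays are not hashable, so convert each one to a tuple first
    return len({tuple(loc.tolist()) for loc in locs})

locs = [
    np.array([0, 1, 0]),
    np.array([0, 1, 0]),   # same location as the first agent
    np.array([1, 1, 0]),
]
print(count_occupied(locs))   # two distinct locations
```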

When an agent discovers a high-fitness location, it is more likely to survive and reproduce. Agents at lower-fitness locations eventually die out. Over time, the population migrates through the landscape until most agents are at the location with the highest fitness.

At that point, the system reaches an equilibrium where mutation occupies new locations at the same rate that differential survival causes lower-fitness locations to be left empty.

The number of occupied locations in equilibrium depends on the mutation rate and the degree of differential survival. In these simulations the number of unique occupied locations at any point is typically 5–15.

It is important to remember that the agents in this model don’t move, just as the genotype of an organism doesn’t change. When an agent dies, it can leave a location unoccupied. And when a mutation occurs, it can occupy a new location.

As agents disappear from some locations and appear in others, the population migrates across the landscape, like a glider in Game of Life. But organisms don’t evolve; populations do.

11.9  Speciation

The theory of evolution says that natural selection changes existing species and creates new ones. In our model, we have seen changes, but we have not seen a new species. It’s not even clear, in the model, what a new species would look like.

Among species that reproduce sexually, two organisms are considered the same species if they can breed and produce fertile offspring. But the agents in the model don’t reproduce sexually, so this definition doesn’t apply.

Among organisms that reproduce asexually, like bacteria, the definition of species is not as clear-cut. Generally, a population is considered a species if its genotypes form a cluster, that is, if the genetic differences within the population are small compared to the differences between populations.


Figure 11.5: Mean distance between agents over time.

So before we can model new species, we need the ability to identify clusters of agents in the landscape, which means we need a definition of distance between locations. Since locations are represented with strings of binary digits, we’ll define distance as the number of bits in the genotype that differ. FitnessLandscape provides a distance method:

    # class FitnessLandscape

        def distance(self, loc1, loc2):
            return np.sum(np.logical_xor(loc1, loc2))

The logical_xor function is True for the elements of the locations that differ, and False for the elements that are the same.
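
For example, with two four-bit genotypes chosen arbitrarily for illustration, the distance works out as follows.

```python
import numpy as np

loc1 = np.array([0, 1, 0, 1])
loc2 = np.array([1, 1, 0, 0])

# True where the two genotypes disagree, False where they agree
differs = np.logical_xor(loc1, loc2)

# summing booleans counts the True entries
distance = np.sum(differs)

print(differs)    # positions 0 and 3 differ
print(distance)   # distance is 2
```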

To quantify the dispersion of a population, we can compute the mean of the distances between every pair of agents. In the notebook for this chapter, you’ll see the MeanDistance instrument, which computes this metric after each time step.
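
A direct way to compute this metric is to loop over all pairs. The sketch below is one possible implementation, with mean_pairwise_distance as a hypothetical name; it is not necessarily how MeanDistance does it in the notebook.

```python
import itertools
import numpy as np

def mean_pairwise_distance(locs):
    """Return the mean number of differing bits over all pairs of locations."""
    dists = [np.sum(np.logical_xor(a, b))
             for a, b in itertools.combinations(locs, 2)]
    return np.mean(dists)

locs = [
    np.array([0, 0, 0]),
    np.array([0, 0, 1]),
    np.array([1, 1, 1]),
]
# the pairwise distances are 1, 3, and 2
print(mean_pairwise_distance(locs))   # mean is 2.0
```

The number of pairs grows quadratically with population size, so for large populations it may be preferable to sample pairs instead.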

Figure 11.5 shows mean distance between agents over time. Because we start with identical mutants, the initial distances are 0. As mutations occur, mean distance increases, reaching a maximum while the population migrates across the landscape.

Once the agents discover the optimal location, mean distance decreases until the population reaches an equilibrium where increasing distance due to mutation is balanced by decreasing distance as agents far from the optimal location are more likely to die. In these simulations, the mean distance in equilibrium is near 1.5; that is, most agents are only 1–2 mutations away from optimal.

Now we are ready to look for new species. To model a simple kind of speciation, suppose a population evolves in an unchanging environment until it reaches steady state (like some species we find in nature that seem to have changed very little over long periods of time).

Now suppose we either change the environment or transport the population to a new environment. Some features that increased fitness in the old environment might decrease it in the new environment, and vice versa.

We can model these scenarios by running a simulation until the population reaches steady state, then changing the fitness landscape, and then resuming the simulation until the population reaches steady state again.


Figure 11.6: Mean fitness over time. After 500 timesteps, we change the fitness landscape.

Figure 11.6 shows results from a simulation like that. Again, we start with 100 identical mutants at a random location, and run the simulation for 500 timesteps. At that point, many agents are at the optimal location, which has fitness near 0.65, in this example. And the genotypes of the agents form a cluster, with the mean distance between agents near 1.

After 500 steps, we run FitnessLandscape.set_values, which changes the mapping from genotype to fitness; then we resume the simulation. Mean fitness drops immediately, because the optimal location and its neighbors in the old landscape are no better than random locations in the new landscape.
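
The simplified FitnessLandscape shown earlier does not include set_values. The sketch below assumes it simply redraws both arrays of fitness contributions; ToyLandscape and the injected random generator are stand-ins for illustration.

```python
import numpy as np

class ToyLandscape:
    """Stand-in landscape with an assumed set_values method."""
    def __init__(self, N, rng):
        self.N = N
        self.rng = rng
        self.set_values()

    def set_values(self):
        # redraw the contribution of a 1 and of a 0 at each position
        self.one_values = self.rng.random(self.N)
        self.zero_values = self.rng.random(self.N)

    def fitness(self, loc):
        return np.where(loc, self.one_values, self.zero_values).mean()

rng = np.random.default_rng(seed=5)   # seeded only for repeatability
land = ToyLandscape(8, rng)
loc = np.ones(8, dtype=np.int8)

before = land.fitness(loc)
land.set_values()                     # change the environment
after = land.fitness(loc)

print(before, after)                  # same genotype, different fitness
```

Under this assumption the new contributions are independent of the old ones, which is why a location that was optimal before the change is no better than a random location afterward.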

However, mean fitness increases quickly as the population migrates across the new landscape, eventually finding the new optimal location, which has fitness near 0.75 (which happens to be higher in this example, but needn’t be).

Once the population reaches steady state, it forms a new cluster, with mean distance between agents near 1 again.

Now if we compute the distance between the agents’ locations before and after the change, they differ by more than 6, on average. The distances between clusters are much bigger than the distances between agents in each cluster, so we can interpret these clusters as distinct species.

11.10  Summary

We have seen that mutation, along with differential survival and reproduction, is sufficient to cause increasing fitness, increasing diversity, and a simple form of speciation. This model is not meant to be realistic; evolution in natural systems is much more complicated than this. Rather, it is meant to be a “sufficiency theorem”; that is, a demonstration that the features of the model are sufficient to produce the behavior we are trying to explain (see https://en.wikipedia.org/wiki/Necessity_and_sufficiency).

Logically, this “theorem” doesn’t prove that evolution in nature is caused by these mechanisms alone. But since these mechanisms do appear, in many forms, in biological systems, it is reasonable to think that they at least contribute to natural evolution.

Likewise, the model does not prove that these mechanisms always cause evolution. But the results we see here turn out to be robust: in almost any model that includes these features (imperfect replicators, variability, and differential reproduction), evolution happens.

I hope this observation helps to demystify evolution. When we look at natural systems, evolution seems complicated. And because we primarily see the results of evolution, with only glimpses of the process, it can be hard to imagine and sometimes hard to believe.

But in simulation, we can see the whole process, not just the results. And by including only the minimal set of features to produce evolution — temporarily ignoring the vast complexity of biological life — we can see evolution as the surprisingly simple, inevitable idea that it is.
