This HTML version of Think Complexity, 2nd Edition is provided for convenience, but it is not the best format of the book. In particular, some of the symbols are not rendered correctly.
Chapter 11 Evolution
The most important idea in biology, and possibly all of science, is the theory of evolution by natural selection, which claims that new species are created and existing species change due to natural selection. Natural selection is a process in which inherited variations between individuals cause differences in survival and reproduction.
Among people who know something about biology, the theory of evolution is widely regarded as a fact, which is to say that it is consistent with all current observations; it is highly unlikely to be contradicted by future observations; and, if it is revised in the future, the changes will almost certainly leave the central ideas substantially intact.
Nevertheless, many people do not believe in evolution. In a survey run by the Pew Research Center, respondents were asked which of the following claims is closer to their view:
About 34% of Americans chose the second (see https://thinkcomplex.com/arda).
Even among those who believe that living things have evolved, barely more than half believe that the cause of evolution is natural selection. In other words, only about a third of Americans believe that the theory of evolution is true.
How is this possible? In my opinion, contributing factors include:
There’s probably not much I can do about the first group, but I think I can help the others. Empirically, the theory of evolution is hard for people to understand. At the same time, it is profoundly simple: for many people, once they understand it, it seems both obvious and irrefutable.
To help people make this transition from confusion to clarity, the most powerful tool I have found is computation. Ideas that are hard to understand in theory can be easy to understand when we see them happening in simulation. That is the goal of this chapter.
The code for this chapter is in chap11.ipynb, a Jupyter notebook in the repository for this book.
11.1 Simulating evolution
I start with a simple model that demonstrates a basic form of evolution. According to the theory, the following features are sufficient to produce evolution:
To simulate these features, we’ll define a population of agents that represent individual organisms. Each agent has genetic information, called its genotype, which is the information that gets copied when the agent replicates. In our model, a genotype is represented by a sequence of binary digits (zeros and ones).
To generate variation, we create a population with a variety of genotypes; later we will explore mechanisms that create or increase variation.
Finally, to generate differential survival and reproduction, we define a function that maps from each genotype to a fitness, where fitness is a quantity related to the ability of an agent to survive or reproduce.
11.2 Fitness landscape
The function that maps from genotype to fitness is called a fitness landscape. In the landscape metaphor, each genotype corresponds to a location in an
In biological terms, the fitness landscape represents information about how the genotype of an organism is related to its physical form and capabilities, called its phenotype, and how the phenotype interacts with its environment.
In the real world, fitness landscapes are complicated, but we don’t need to build a realistic model. To induce evolution, we need some relationship between genotype and fitness, but it turns out that it can be any relationship. To demonstrate this point, we’ll use a totally random fitness landscape.
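As a minimal sketch of what a totally random landscape could look like, assume that each position in the genotype contributes one random value when its bit is 1 and a different random value when its bit is 0, and that fitness is the mean of those contributions. The class name `RandomLandscape`, its attributes, and the use of plain tuples of bits are assumptions made for illustration; they are not necessarily what the notebook uses.

```python
import random


class RandomLandscape:
    """Hypothetical random fitness landscape for genotypes of n_bits bits."""

    def __init__(self, n_bits, seed=None):
        rng = random.Random(seed)
        self.n_bits = n_bits
        # Assumed design: each position contributes one random value when
        # its bit is 1 and a different random value when its bit is 0.
        self.one_values = [rng.random() for _ in range(n_bits)]
        self.zero_values = [rng.random() for _ in range(n_bits)]

    def fitness(self, genotype):
        """Return the mean contribution of the bits, a value between 0 and 1."""
        total = 0.0
        for bit, one, zero in zip(genotype, self.one_values, self.zero_values):
            total += one if bit else zero
        return total / self.n_bits


landscape = RandomLandscape(n_bits=8, seed=17)
genotype = (0, 1, 0, 1, 1, 0, 0, 1)
print(landscape.fitness(genotype))
```

Because the contributions are random, nothing about this landscape is biologically meaningful; it only guarantees that different genotypes usually have different fitness.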
Here is the definition for a class that represents a fitness landscape:
The genotype of an agent, which corresponds to its location in the fitness landscape, is represented by a NumPy array of zeros and ones called
To compute the fitness of a genotype,
As an example, suppose
In that case, the fitness of
Next we need agents. Here’s the class definition:
The attributes of an
Now that we have agents and a fitness landscape, I’ll define a class called
Here’s the definition of
The attributes of a
The most important function in
In this version of the simulation, the number of new agents during each time step equals the number of dead agents, so the number of live agents is constant.
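The constant-population rule can be sketched as a single function. This is only an illustration under stated assumptions: the function name `step`, the parameter `p_death`, and the choice to draw each replacement's parent uniformly from the population at the start of the step are all hypothetical, and genotypes are plain tuples of bits.

```python
import random


def step(population, p_death=0.1, rng=None):
    """Advance one time step and return the new population.

    Every agent that dies is replaced by a copy of a randomly chosen
    parent, so the number of live agents never changes.
    """
    rng = rng or random.Random()
    # Assumption: each agent dies independently with probability p_death.
    is_dead = [rng.random() < p_death for _ in population]
    # Assumption: parents are drawn from the population as it stood at
    # the start of the step; copying is perfect (no mutation).
    return [rng.choice(population) if dead else agent
            for agent, dead in zip(population, is_dead)]


rng = random.Random(1)
population = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
original = set(population)
for _ in range(10):
    population = step(population, rng=rng)
print(len(population))  # still 4: one birth for every death
```

Neither death nor the choice of parent depends on fitness here, so this sketch has no differential survival or reproduction.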
11.5 No differentiation
Before we run the simulation, we have to specify the behavior of
These methods don’t depend on fitness, so this simulation does not have differential survival or reproduction. As a result, we should not expect to see evolution. But how can we tell?
11.6 Evidence of evolution
The most inclusive definition of evolution is a change in the distribution of genotypes in a population. Evolution is an aggregate effect: in other words, individuals don’t evolve; populations do.
In this simulation, genotypes are locations in a high-dimensional space, so it is hard to visualize changes in their distribution. However, if the genotypes change, we expect their fitness to change as well. So we will use changes in the distribution of fitness as evidence of evolution. In particular, we’ll look at the mean and standard deviation of fitness over time.
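Tracking these two statistics takes only a few lines. The sketch below uses made-up fitness values for three time steps and the hypothetical helper name `fitness_summary`; it computes the population standard deviation, which is one reasonable choice among several.

```python
import statistics


def fitness_summary(fitnesses):
    """Return (mean, population standard deviation) of the fitness values."""
    return statistics.mean(fitnesses), statistics.pstdev(fitnesses)


# Made-up fitness values for a population of four agents at three time steps.
generations = [
    [0.2, 0.4, 0.6, 0.8],
    [0.4, 0.4, 0.6, 0.8],
    [0.6, 0.6, 0.6, 0.8],
]

history = [fitness_summary(fitnesses) for fitnesses in generations]
for t, (mean, std) in enumerate(history):
    print(t, round(mean, 3), round(std, 3))
```

If the mean or the spread changes from one step to the next, the distribution of fitness has changed, which is the evidence of evolution we are looking for.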
Before we run the simulation, we have to add an
Here is the parent class for all instruments:
And here’s the definition for
Now we’re ready to run the simulation. To avoid the effect of random changes in the starting population, we start every simulation with the same set of agents. And to make sure we explore the entire fitness landscape, we start with one agent at every location. Here’s the code that creates the
Now we can create and add a
Figure ?? shows the result of running this simulation 10 times. The mean fitness of the population drifts up or down at random. Since the distribution of fitness changes over time, we infer that the distribution of genotypes is also changing. By the most inclusive definition, this random walk is a kind of evolution. But it is not a particularly interesting kind.
In particular, this kind of evolution does not explain how biological species change over time, or how new species appear. The theory of evolution is powerful because it explains phenomena we see in the natural world that seem inexplicable:
These are the phenomena we want to explain. So far, our model doesn’t do the job.
11.7 Differential survival
Let’s add one more ingredient, differential survival. Here’s a class that extends
Now the probability of survival depends on fitness; in fact, in this version, the probability that an agent survives each time step is its fitness.
Since agents with low fitness are more likely to die, agents with high fitness are more likely to survive long enough to reproduce. Over time we expect the number of low-fitness agents to decrease, and the number of high-fitness agents to increase.
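To see why this rule raises mean fitness, here is a minimal sketch of one round of differential survival. The function name `choose_dead` and the made-up population, half with fitness 0.2 and half with fitness 0.8, are assumptions for illustration.

```python
import random


def choose_dead(fitnesses, rng):
    """Return one boolean per agent; True means the agent dies this step.

    Assumes each fitness is a probability in [0, 1] and that an agent
    survives a time step with probability equal to its fitness.
    """
    # rng.random() is uniform on [0, 1), so it is below f with probability f.
    return [rng.random() >= f for f in fitnesses]


rng = random.Random(42)  # fixed seed so the run is repeatable
fitnesses = [0.2] * 5000 + [0.8] * 5000  # made-up population
dead = choose_dead(fitnesses, rng)

survivors = [f for f, d in zip(fitnesses, dead) if not d]
mean_before = sum(fitnesses) / len(fitnesses)
mean_after = sum(survivors) / len(survivors)
print(mean_before, mean_after)
```

On average about 20% of the low-fitness agents and 80% of the high-fitness agents survive, so the survivors' mean fitness should land near 0.68, up from 0.5.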
Figure ?? shows mean fitness over time for 10 simulations with differential survival. Mean fitness increases quickly at first, but then levels off.
You can probably figure out why it levels off: if there is only one agent at a particular location and it dies, it leaves that location unoccupied. Without mutation, there is no way for it to be occupied again.
So this simulation starts to explain adaptation: increasing fitness means that the species is getting better at surviving in its environment. But the number of occupied locations decreases over time, so this model does not explain increasing diversity at all.
In the notebook for this chapter, you will see the effect of differential reproduction. As you might expect, differential reproduction also increases mean fitness. But without mutation, we still don’t see increasing diversity.
In the simulations so far, we start with the maximum possible diversity — one agent at every location in the landscape — and end with the minimum possible diversity, all agents at one location.
That’s almost the opposite of what happened in the natural world, which apparently began with a single species that branched, over time, into the millions, or possibly billions, of species on Earth today (see https://thinkcomplex.com/bio).
With perfect copying in our model, we never see increasing diversity. But if we add mutation, along with differential survival and reproduction, we get a step closer to understanding evolution in nature.
Here is a class definition that extends
In this model of mutation, every time we call
Now that we have mutation, we don’t have to start with an agent at every location. Instead, we can start with the minimum variability: all agents at the same location.
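One simple way to model imperfect copying is to flip a single randomly chosen bit with some small probability each time a genotype is copied. The sketch below assumes that scheme; the function name `copy_with_mutation` and the parameter `p_mutate` are hypothetical, and the mutation rate shown is an arbitrary example value.

```python
import random


def copy_with_mutation(genotype, p_mutate, rng):
    """Return a copy of genotype, flipping one random bit with probability p_mutate."""
    child = list(genotype)
    if rng.random() < p_mutate:
        i = rng.randrange(len(child))  # choose the position to mutate
        child[i] ^= 1                  # flip 0 to 1 or 1 to 0
    return tuple(child)


rng = random.Random(3)
parent = (0, 0, 0, 0, 0, 0, 0, 0)

# Start with the minimum variability: every agent at the same location.
population = [parent] * 100
offspring = [copy_with_mutation(g, p_mutate=0.05, rng=rng) for g in population]
print(len(set(offspring)))  # number of distinct genotypes after one round
```

A mutant child differs from its parent in exactly one bit, so it lands on a neighboring location in the landscape; the parent itself is unchanged.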
Figure ?? shows the results of 10 simulations with mutation and differential survival and reproduction. In every case, the population evolves toward the location with maximum fitness.
To measure diversity in the population, we can plot the number of occupied locations after each time step. Figure ?? shows the results. We start with 100 agents at the same location. As mutations occur, the number of occupied locations increases quickly.
When an agent discovers a high-fitness location, it is more likely to survive and reproduce. Agents at lower-fitness locations eventually die out. Over time, the population migrates through the landscape until most agents are at the location with the highest fitness.
At that point, the system reaches an equilibrium where mutation occupies new locations at the same rate that differential survival causes lower-fitness locations to be left empty.
The number of occupied locations in equilibrium depends on the mutation rate and the degree of differential survival. In these simulations the number of unique occupied locations at any point is typically 5–15.
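Counting occupied locations amounts to counting distinct genotypes. A minimal sketch, assuming genotypes are stored as hashable tuples of bits (the function name `occupied_locations` and the sample population are made up for illustration):

```python
def occupied_locations(population):
    """Return the number of distinct genotypes present in the population."""
    return len(set(population))


population = [
    (0, 0, 1),
    (0, 0, 1),
    (0, 1, 1),
    (1, 1, 1),
    (0, 0, 1),
]
print(occupied_locations(population))  # 3
```

If genotypes are stored as NumPy arrays instead, they would need to be converted to tuples first, because arrays are not hashable.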
It is important to remember that the agents in this model don’t move, just as the genotype of an organism doesn’t change. When an agent dies, it can leave a location unoccupied. And when a mutation occurs, it can occupy a new location. As agents disappear from some locations and appear in others, the population migrates across the landscape, like a glider in Game of Life. But organisms don’t evolve; populations do.
The theory of evolution says that natural selection changes existing species and creates new ones. In our model, we have seen changes, but we have not seen a new species. It’s not even clear, in the model, what a new species would look like.
Among species that reproduce sexually, two organisms are considered the same species if they can breed and produce fertile offspring. But the agents in the model don’t reproduce sexually, so this definition doesn’t apply.
Among organisms that reproduce asexually, like bacteria, the definition of species is not as clear-cut. Generally, a population is considered a species if its genotypes form a cluster, that is, if the genetic differences within the population are small compared to the differences between populations.
Before we can model new species, we need the ability to identify clusters of agents in the landscape, which means we need a definition of distance between locations. Since locations are represented with arrays of bits, we’ll define distance as the number of bits that differ between locations.
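This definition is the Hamming distance between two bit sequences. A minimal sketch, with the function name `distance` chosen for illustration:

```python
def distance(loc1, loc2):
    """Return the number of positions at which two locations differ."""
    if len(loc1) != len(loc2):
        raise ValueError("locations must have the same number of bits")
    return sum(a != b for a, b in zip(loc1, loc2))


print(distance((0, 1, 1, 0), (1, 1, 0, 0)))  # 2: the first and third bits differ
```

Under this definition, a single mutation that flips one bit moves an offspring a distance of exactly 1 from its parent.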
To quantify the dispersion of a population, we can compute the mean of the distances between pairs of agents. In the notebook for this chapter, you’ll see the
Figure ?? shows mean distance between agents over time. Because we start with identical agents, the initial distances are 0. As mutations occur, mean distance increases, reaching a maximum while the population migrates across the landscape.
Once the agents discover the optimal location, mean distance decreases until the population reaches an equilibrium where increasing distance due to mutation is balanced by decreasing distance as agents far from the optimal location disappear. In these simulations, the mean distance in equilibrium is near 1; that is, most agents are only one mutation away from optimal.
Now we are ready to look for new species. To model a simple kind of speciation, suppose a population evolves in an unchanging environment until it reaches steady state (like some species we find in nature that seem to have changed very little over long periods of time).
Now suppose we either change the environment or transport the population to a new environment. Some features that increased fitness in the old environment might decrease it in the new environment, and vice versa.
We can model these scenarios by running a simulation until the population reaches steady state, then changing the fitness landscape, and then resuming the simulation until the population reaches steady state again.
Figure ?? shows results from a simulation like that. We start with 100 identical agents at a random location, and run the simulation for 500 time steps. At that point, many agents are at the optimal location, which has fitness near 0.65 in this example. The genotypes of the agents form a cluster, with the mean distance between agents near 1.
After 500 steps, we run
After the change, mean fitness increases again as the population migrates across the new landscape, eventually finding the new optimum, which has fitness near 0.75 (which happens to be higher in this example, but needn’t be).
Once the population reaches steady state, it forms a new cluster, with mean distance between agents near 1 again.
Now if we compute the distance between the agents’ locations before and after the change, they differ by more than 6, on average. The distances between clusters are much bigger than the distances between agents in each cluster, so we can interpret these clusters as distinct species.
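The comparison between within-cluster and between-cluster distances can be sketched as follows. The two small clusters are made-up data, not output from the simulation, and the function names are hypothetical; distance is the number of differing bits.

```python
from itertools import combinations, product


def hamming(a, b):
    """Return the number of bits that differ between two locations."""
    return sum(x != y for x, y in zip(a, b))


def mean_distance_within(cluster):
    """Mean distance over all unordered pairs of agents in one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # fewer than two agents: no pairs to compare
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_distance_between(cluster1, cluster2):
    """Mean distance over all pairs with one agent from each cluster."""
    pairs = list(product(cluster1, cluster2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Made-up clusters of 8-bit genotypes, standing in for the population
# before and after the landscape changes.
before = [(0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 1), (0, 0, 0, 0, 0, 0, 1, 0)]
after = [(1, 1, 1, 1, 1, 1, 0, 0), (1, 1, 1, 1, 1, 1, 0, 1), (1, 1, 1, 1, 0, 1, 0, 0)]

within_before = mean_distance_within(before)
within_after = mean_distance_within(after)
between = mean_distance_between(before, after)
print(within_before, within_after, between)
```

When the between-cluster mean is several times larger than either within-cluster mean, as it is for this toy data, the two groups are well separated in the sense used above.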
We have seen that mutation, along with differential survival and reproduction, is sufficient to cause increasing fitness, increasing diversity, and a simple form of speciation. This model is not meant to be realistic; evolution in natural systems is much more complicated than this. Rather, it is meant to be a “sufficiency theorem”; that is, a demonstration that the features of the model are sufficient to produce the behavior we are trying to explain (see https://thinkcomplex.com/suff).
Logically, this “theorem” doesn’t prove that evolution in nature is caused by these mechanisms alone. But since these mechanisms do appear, in many forms, in biological systems, it is reasonable to think that they at least contribute to natural evolution.
Likewise, the model does not prove that these mechanisms always cause evolution. But the results we see here turn out to be robust: in almost any model that includes these features — imperfect replicators, variability, and differential reproduction — evolution happens.
I hope this observation helps to demystify evolution. When we look at natural systems, evolution seems complicated. And because we primarily see the results of evolution, with only glimpses of the process, it can be hard to imagine and hard to believe.
But in simulation, we see the whole process, not just the results. And by including the minimal set of features to produce evolution — temporarily ignoring the vast complexity of biological life — we can see evolution as the surprisingly simple, inevitable idea that it is.
The code for this chapter is in the Jupyter notebook chap11.ipynb in the repository for this book. Open the notebook, read the code, and run the cells. You can use the notebook to work on the following exercises. My solutions are in chap11soln.ipynb.
The notebook shows the effects of differential reproduction and survival separately. What if you have both? Write a class called
As a Python challenge, can you write this class without copying code?
When we change the landscape as in Section ??, the number of occupied locations and the mean distance usually increase, but the effect is not always big enough to be obvious. Try out some different random seeds to see how general the effect is.