When a process is too slow or too impractical to test experimentally, simulation is a valuable tool for testing the validity of our models of it.
The evolution of life is essentially such a process: too slow for conclusive direct observation and too vast in time for a practical empirical approach. Can evolution be simulated? If so, with what results?
Computer simulations of processes that are too expensive or difficult to reproduce in reality are now commonplace in science and engineering. In the virtual universe of computers, the dynamic characteristics of an aeroplane are known long before the first physical prototype exists, the resistance of buildings to earthquakes can be determined without endangering life or property, and the major powers can even detonate atomic bombs at will without breaching the global moratorium on real nuclear testing.
With the exponential increase in available computing power, simulations have gained immense modelling and predictive power. As such, the question of whether evolutionary processes can also be simulated becomes legitimate.
Today’s supercomputers should be able to compress millions of years of evolution into days or even hours, at least for models of simple, unicellular, asexually reproducing organisms such as bacteria.
If the evolution of life is a real phenomenon, occurring over vast periods of time through selective pressure on spontaneous mutations at the level of the genetic code, then the general features of this process should be fairly easy to simulate. Indeed, the basic principles of evolution are easy to express: random mutations and selective pressures acting on a self-replicating population of genetic code, which, from a computer scientist’s point of view, is all a population of organisms amounts to.
The prospect of using simulations to provide important confirmation of evolutionary theory has already attracted a significant number of researchers, and the field is booming. New results that seem to confirm and enrich the theory continue to emerge. Below we outline some guidelines for understanding and interpreting these results.
Teleological gradualism
In The Blind Watchmaker, evolutionary biologist and militant atheist Richard Dawkins was one of the first to propose some simple “games” of evolutionary simulations. First, Dawkins presents a computer-implemented word game in which you start with a random string of letters and trace the “evolution” of successive generations of that string to a target sentence from Shakespeare: “Methinks it is like a weasel.”
Within each “generation” of the string, a batch of successor strings is produced, each almost identical to its parent except for small random changes designed to simulate mutation. From these, the computer automatically selects the string that most closely resembles the Shakespearean phrase to become the next generation. By simple arithmetic, a computer program searching for “Methinks it is like a weasel” with purely random combinations of letters would have roughly a 1 in 10⁴⁰ chance of success on any given attempt (the phrase has 28 characters, each drawn from 27 possibilities, so a random string matches with probability 1 in 27²⁸). This is such a tiny number that billions of years would probably not be enough, no matter how powerful the computer.
But after how many generations of sentences does the evolutionary strategy of Dawkins’ program manage to generate “Methinks it is like a weasel” from a bunch of meaningless letters? The answer is both surprising and remarkable: it usually takes only 40-60 generations!
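The word game takes remarkably little machinery. The following is our own minimal reconstruction in Python, not Dawkins’ original code; the mutation rate and brood size are our own guesses, so the generation count will differ from run to run:

```python
import random
import string

TARGET   = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def mutate(text, rate=0.05):
    """Copy the string, changing each character with a small probability."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in text)

def score(text):
    """Selection criterion: how many positions already match the target."""
    return sum(a == b for a, b in zip(text, TARGET))

current = "".join(random.choice(ALPHABET) for _ in TARGET)
generation = 0
while current != TARGET:
    generation += 1
    brood = [current] + [mutate(current) for _ in range(100)]
    current = max(brood, key=score)   # keep the copy that best matches the target
print(f"Reached the target in {generation} generations")
```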
A second example Dawkins gives is a program that simulates the evolution of shapes he calls “biomorphs.” A biomorph is a geometric figure made up of simple segments and governed by well-defined parameters.
As in the previous case, a biomorph of one generation will give rise to several child biomorphs, randomly varying the control parameters a little at a time, simulating random mutations. However, unlike the previous example, it is not a computer that selects the biomorph that will go on to the next generation, but the human user who plays this role.
In this way, the experiment is guided at every step by a human’s “artificial selection.” Using this process, Dawkins was able to produce biomorphs that vaguely resembled trees, frogs or insects. And again, the number of generations required was surprisingly small.
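To make the setup concrete, here is a bare-bones sketch of that selection loop in Python (our own simplification: the gene names are hypothetical, and the drawing step, which is the whole charm of Dawkins’ program, is reduced to printing gene values):

```python
import random

def mutate(genes):
    """Produce a child biomorph: copy the parent and nudge one gene at random."""
    child = dict(genes)
    gene = random.choice(list(child))
    child[gene] += random.choice((-1, 1))
    return child

# Hypothetical control parameters standing in for Dawkins' biomorph genes.
parent = {"depth": 3, "angle": 20, "length": 5, "branches": 2}

for generation in range(10):
    brood = [mutate(parent) for _ in range(8)]
    for i, child in enumerate(brood):
        print(i, child)                      # Dawkins' program draws each child
    pick = int(input("Number of the offspring you like best: "))
    parent = brood[pick]                     # the human plays the role of selection
```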
These results seem to provide impressive confirmation of the power of selective pressure to overcome seemingly insurmountable odds and steer a random process towards a goal. On closer inspection, however, these simple simulations only show that a random process can be herded towards a predetermined goal by powerful, “intelligent” selective pressures. How natural those constraints are is another matter. In fact, the tightly limited mutations and the improbably strong selective pressure are about as unnatural as an evolutionary scenario can get, and they are the key to this easy but very deceptive success. Moreover, evolutionary theory insists that natural evolution has no predetermined goal to work towards, which makes the constraints doubly unnatural. There are two major objections to the validity of such simulations:
1) They are teleological, meaning that they have a predetermined goal, something natural evolution cannot have. Moreover, this has a subtle but decisive influence on how quickly selective pressure reaches the goal: a process with no precise goal would spend far more of its time searching “blindly” among uninteresting forms than a process constrained by one.
2) The selective pressures and mutation rates used are completely implausible compared to any conceivable natural scenario. In fact, it is quite easy to get not only a biomorph with a faint resemblance to an insect, but even a fairly detailed elephant, if random factors and “natural” selection are artificially and a priori constrained along a narrow and gradual path from insect to elephant.
The evolutionary scientific community does not generally denounce these simulations of Dawkins’ as flawed or unrepresentative; rather, they are treated as didactic illustrations of “what evolution might do” and are by no means considered serious simulations. There are, however, simulations that do aspire to that title, and we will discuss them below.
Genetic algorithms
One of the earliest applications of evolutionary principles in computer science was a more indirect kind of simulation, generically known as “genetic algorithms.” For a given problem, the strategy of such an algorithm is to start with an initial “population” of sub-optimal solutions, perhaps even randomly chosen, and to evolve this population towards an optimum, i.e. the highest attainable degree of adaptation.
From one generation to the next, the solutions in the population may mutate, recombine or “die,” all under the control of a simulated natural selection that depends on the degree of adaptation calculated for each. These approaches generated a great deal of excitement in the 1990s, when the field was at its height. With the explosion of computer science, more and more of the real-world problems that computers were being asked to solve turned out to be intractable by classical, analytical methods, and the search shifted to approximate solutions, not necessarily optimal ones, that a computer could obtain in a reasonable amount of time.
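In outline, such an algorithm looks something like the following sketch (a generic skeleton of our own, not any particular library; the fitness function is a trivial stand-in that a real application would replace with its own scoring of candidate solutions):

```python
import random

GENOME_LEN, POP_SIZE, MUT_RATE = 32, 50, 0.02

def fitness(genome):
    """Problem-specific degree of adaptation; here a trivial stand-in (count of 1s)."""
    return sum(genome)

def crossover(a, b):
    """Recombine two parent solutions at a random cut point."""
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def mutate(genome):
    """Flip each bit with a small probability."""
    return [bit ^ 1 if random.random() < MUT_RATE else bit for bit in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(100):
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP_SIZE // 2]            # the fitter half "survives"
    children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                for _ in range(POP_SIZE - len(survivors))]
    population = survivors + children

print("Best fitness found:", max(map(fitness, population)))
```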
Genetic algorithms held great promise: if nature and evolution had been able to find so many effective solutions to the extremely difficult problems of life’s functional complexity, why should the same evolutionary principles not be applied to much simpler, real-world computational problems? All the more so since a computer can run as fast as we like and is not bound to the millions upon millions of years that slow biological processes require.
But where have these hopes led us today, decades later? Although both computer science and raw computing power have advanced enormously since then, genetic algorithms have remained a marginal and insignificant part of the landscape of modern computing. Their promise has not been fulfilled, and the reason is an inescapable trade-off. For a given problem, genetic algorithms were proposed precisely to bypass the search for classical analytical solutions that were difficult or impossible to find.
However, the less constrained the populations of solutions were, the more they tended not to converge on an interesting optimum but to drift endlessly at random. Conversely, the more tightly the mutation parameters, the recombinations and, above all, the fitness function were constrained, the more “useful” the evolution of the population became. Unfortunately, constraining these parameters effectively amounted to specifying, in ever greater detail, how to solve the problem approximately, so the evolutionary approach lost its raison d’être. As with Dawkins’ simulations, it is relatively easy to evolve interesting solutions, but only if you first strongly constrain the space in which mutation and natural selection are allowed to play out.
Digital life
But perhaps we are asking too much if we expect evolutionary principles to do for computing even a fraction of what they are said to have done for biological life. Perhaps, for some as yet unknown reason, evolution simply does not apply well to computational problems and genetic algorithms. The question, then, is whether evolutionary simulations can at least exhibit some general features of the process, such as the incremental complexification of information. This should be observable because, as the renowned atheistic evolutionary biologist Daniel Dennett points out, “evolution will occur whenever and wherever three conditions are met: replication, variation (mutation), and differential fitness (competition).”[1]
And to keep things simple, we will not go into the more difficult issues of simulations, such as population genetics or simulations of the evolution of different organs, biological systems or even ecosystems. We will not even address the problem of simulating the evolution of multicellular life or sexually reproducing organisms. We will be concerned only with the simplest conceivable evolvable information units, i.e. small pieces of computer programs consisting of well-defined instructions that can multiply by simple copying, mutate and perform functions.
This is the field now informally called “digital life” or “artificial life.” There is already a whole ecosystem of software platforms designed to support experiments in the evolution of these programs, called “artificial life entities.” We will focus only on the most widely used and developed of these software packages, Avida.
Avida
Originally created in 1993 and now developed by computer science professor Charles Ofria and his team at Michigan State University, Avida is a software platform dedicated to the simulation of digital organisms: the small programs, made up of simple instructions, that obey Dennett’s three conditions.
An organism in Avida is a set of instructions executed sequentially and cyclically, and each instruction runs only once the organism has accumulated enough units of “energy,” which translate into available processor (CPU) time. The programs in the population compete for this finite resource, processor time, which they receive in proportion to their length. Additional CPU time can be earned in abundance by performing complex operations (built from several primitive instructions) that carry fixed, predefined rewards.
At first, the programs do not know how to perform any such complex operations, but that is the point: the hope is that they will change and evolve spontaneously under the selective pressure of access to processor time. One of the instructions triggers reproduction, which happens by simple self-copying, a process in which a predefined proportion of the instructions are randomly mutated. In this way, an entire population evolves to grab as much processor time as possible and to run its own program, as complex as possible, for as long as possible. Avida is currently the most developed and widely used tool in digital life; it implements the general principles of evolution correctly while leaving plenty of room for human calibration.
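The real Avida virtual machine is considerably more elaborate, but the core loop can be caricatured in a few lines (a toy model of our own, with made-up instruction names and no task evaluation, not Avida’s actual code or configuration):

```python
import random

INSTRUCTIONS = [f"inst_{i:02d}" for i in range(26)]   # stand-ins for the instruction types
COPY_MUTATION_RATE = 0.0075                           # per-instruction copy error

def replicate(genome):
    """Self-copying with occasional copy errors: the source of variation."""
    return [random.choice(INSTRUCTIONS) if random.random() < COPY_MUTATION_RATE
            else instruction for instruction in genome]

def merit(organism):
    """Share of CPU time: grows with genome length plus bonuses for rewarded tasks."""
    return len(organism["genome"]) + organism["task_bonus"]

population = [{"genome": [random.choice(INSTRUCTIONS) for _ in range(50)],
               "task_bonus": 0} for _ in range(100)]

for update in range(10_000):
    # Organisms with more merit are proportionally more likely to receive CPU cycles;
    # a real run would also test each genome for rewarded operations and raise task_bonus.
    parent = random.choices(population, weights=[merit(o) for o in population])[0]
    child = {"genome": replicate(parent["genome"]), "task_bonus": 0}
    population[random.randrange(len(population))] = child   # the child overwrites a random slot
```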
The complexification of information
The famous evolutionary biologist Richard Lenski, whose research into the long-term evolution of E. coli bacteria we featured in a previous article, has shown a particular interest in Avida simulations. In a paper published in Nature in 2003 with Ofria and collaborators,[2] he describes an experiment designed to understand whether Avida could provide examples of functional information complexification through evolution.
The ultimate goal of the experiment was to obtain, through evolution, a program that implements the “equals” operator, labelled “EQU,” on two binary strings. The EQU operator takes two bit strings (sequences of 0s and 1s) as input and reports, bit by bit, whether the two strings agree at each position. Although intuitively a trivial operation, it logically requires a minimum of five applications of the elementary NAND instruction, the only logic primitive available to the digital organisms.
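Since NAND is the organisms’ only logic primitive, EQU sits at the top of the complexity ladder. The sketch below, in plain Python rather than Avida’s instruction set, shows one way the minimum of five NAND operations yields bitwise equality:

```python
MASK = 0xFFFFFFFF  # work with 32-bit values, as the digital organisms do

def nand(a, b):
    """Bitwise NAND, the only logic primitive the organisms have."""
    return ~(a & b) & MASK

def equ(a, b):
    """Bitwise 'equals' (XNOR), composed of exactly five NAND operations."""
    t = nand(a, b)                      # 1
    x = nand(nand(a, t), nand(b, t))    # 2, 3, 4  -> bitwise XOR of a and b
    return nand(x, x)                   # 5        -> negated XOR, i.e. EQU

# The low four bits of EQU(1010, 1100) are 1001: 1 exactly where the inputs agree.
assert equ(0b1010, 0b1100) & 0b1111 == 0b1001
```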
EQU can also be implemented in more roundabout ways, but never in fewer steps. Lenski and his colleagues started with a population of programs, each consisting of a random sequence of instructions drawn from a set of 26 different types. They also defined rewards, in the form of extra processor time, for nine logic operations of increasing complexity that the programs might develop; the reward grew with the complexity of the operation, and EQU, the most complex, carried the highest reward.
The researchers ran the simulation and waited to see what would happen. Would the EQU operator be an evolutionary discovery of the artificial computing organisms or not? A few thousand generations later (which took only a matter of seconds), the answer was clear: EQU had been found, and not by just one of the “avidians” but by a large number of the simulated organisms. Analysing their “evolutionary history,” the researchers further observed that, as evolutionary theory predicts, simple functions with smaller rewards were discovered first, and then, in rather large leaps, complex functions were built from the simpler ones that preceded them, culminating in EQU. Simple functions thus played a fundamental role in the discovery of sophisticated ones, and digital evolution used readily available structures and functions to create new ones, just as biological evolution is thought to do. It was also observed that when the simulation was run without rewards for the simple functions, the complicated functions, EQU in particular, did not evolve either. Finally, the evolved EQU functions were generally implemented differently in different “avidians,” fulfilling another evolutionary expectation: there can be multiple transformation paths leading to the same place, and what matters is not the path taken but the selective pressure to make do with whatever is available at any given time.
Is this rigorous simulation finally conclusive evidence validating the evolutionary model? So far we have seen how previous examples of simulated evolution failed to convince because they were obviously teleological (they had a predetermined goal) and highly constrained. Even in this case, the simulated evolution is teleological: the authors chose a rigid set of nine complex operations to be rewarded, one of which is also the culmination of the whole experiment. In reality, the digital organisms were free only to pursue this set of predefined goals: in the absence of a reward landscape far larger and richer than the narrow goals being pursued, the teleology itself constrains the experiment strongly enough to deliver the desired result in a simple and unrealistic way.
But there is a more subtle, implicit constraint in this experimental model: a mutation is modelled as changing one basic instruction into another of the 26 basic instructions in a program’s code. In every one of the 26 cases, something that made sense in the software’s programming language is turned into something else that still makes sense. In biological reality, things are far from similar: when something biologically “meaningful” mutates, the result is overwhelmingly likely to make no biological sense at all, since most mutations are deleterious or outright destructive, and this is something the Avida simulation completely fails to “simulate.”
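A crude analogy, our own illustration rather than anything from the paper, makes the contrast tangible: mutate real source code at the character level, instead of swapping one valid instruction for another, and the script below simply measures how often a random single-character change still parses (parsing being a far weaker requirement than still computing the right result):

```python
import random
import string

SOURCE = "def add(a, b):\n    return a + b\n"   # a tiny, valid piece of Python
CHARS = string.printable

def mutate(src):
    """Replace one randomly chosen character with a random printable character."""
    i = random.randrange(len(src))
    return src[:i] + random.choice(CHARS) + src[i + 1:]

TRIALS = 10_000
still_parse = 0
for _ in range(TRIALS):
    try:
        compile(mutate(SOURCE), "<mutant>", "exec")   # syntax check only
        still_parse += 1
    except SyntaxError:
        pass

print(f"{still_parse / TRIALS:.1%} of single-character mutants still parse")
```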
And even if all these objections were invalid, and EQU really were a legitimate example of information gaining functional complexity through evolution, we would still be talking about a trivial operator by any standard. Furthermore, if EQU is evolutionarily legitimate, where are the additional examples of increasingly complex, truly impressive functions that Avida has evolved over the past 16 years?[3] In computer science that is an enormous amount of time, and yet not much has happened.
In fact, there is another explanation, one that has to do with probability dynamics and information entropy. Put simply, when an agent acts randomly on an organised system, as random mutations do in real biology or among the avidians, the probability that the system’s level of organisation will decrease (i.e. that it will break down) increases exponentially with its degree of organisation.
In other words, the more complex a mechanism is, the more likely it is that a random action on it will disrupt its functions.
The converse is also true: the simpler a mechanism is, the more likely it is that random action and chance will not only NOT break it down, but will actually make it a little bit better.
Putting these two observations together, it is only to be expected that evolutionary processes based on random mutations can produce some simple mechanisms, but the more we try to make those mechanisms more complex through further random mutations, the more likely we are to hit a wall of impossibility. This is because, as the complexity of the mechanism grows, each further random action becomes ever more likely to break the mechanism and ever less likely to improve it. So it is theoretically possible for simple operators like EQU to be obtained by evolutionary processes, and at the same time it is perfectly explicable why the last sixteen years have produced nothing more complex: producing anything beyond the simplest mechanisms is beyond the power of evolution.
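To make the argument concrete, here is a toy calculation of our own, under the simplifying assumption that a mechanism depends on k specific instructions, each mutated independently with probability p at every replication; the chance of surviving a replication intact is then (1 - p)^k, so the chance of breakage approaches certainty as the complexity k grows:

```python
import random

def p_breaks(k, p=0.01, trials=100_000):
    """Monte Carlo estimate: probability that at least one of k critical
    instructions is hit by a mutation (per-instruction rate p) during one replication."""
    broken = sum(any(random.random() < p for _ in range(k)) for _ in range(trials))
    return broken / trials

for k in (1, 5, 25, 125):
    exact = 1 - (1 - 0.01) ** k
    print(f"complexity k = {k:>3}:  P(breaks) ~ {p_breaks(k):.3f}  (exact: {exact:.3f})")
```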
Conclusions
The simulations discussed are teleological: they are built around an ultimate goal of evolution, which is both inconsistent with evolutionary theory and the source of unrealistic constraints on the simulated process. In addition, the careful calibration of natural or artificial selection and of the random mutations acts as a further layer of constraint, keeping the simulation on what is in reality a very narrow path. Under these conditions, obtaining results that appear to conform to evolutionary expectations is neither surprising nor particularly significant. In fact, the simulations succeed only to the extent that the implicit and explicit constraints under which they operate anticipate and guide evolution from the outset.