It is commonly assumed that at least 96% of the genome of a multicellular organism has no functional role. A junk-dominated genome is the ideal place for evolution to randomly explore new functions and characteristics of the species. After all, who would ever think that a wise and all-knowing Creator would use 96% informational junk to define a human being or a mollusc?

On 14 April 2003, the Human Genome Project announced the first complete sequencing of the human genome. That project focused on the genome of a single person, but now a project called “1,000 Genomes” has achieved the feat of sequencing the genomes of more than 1,000 people. Valuable details of the genetic variability characteristic of the human species have been obtained, and research continues to progress.

These major scientific achievements have enabled an explosion of new technologies and discoveries in molecular biology and have raised great hopes for the diagnosis and treatment of many human diseases. However, it is important to understand that genome sequencing projects, although supported by increasingly powerful and sophisticated technologies, only “read” the letters that make up an individual’s genome. “Transcription”[1] would therefore be an easier word to understand than “sequencing” and much more appropriate than “decoding”, because understanding the processes and information encoded by DNA sequences is a completely different challenge, about which current knowledge is still in its infancy.

A simple analogy would be that a genome is like the technical manual of a passenger aircraft, which gives us a complete description of what the aircraft is made of and how it works, but is unfortunately written in an unknown foreign language. If you do not know the language in which the book is written, it will of course remain largely a mystery. You can make out the structure of the chapters, or even get a general idea of what certain illustrations mean, but a thorough understanding of the book is impossible. As mentioned above, genome sequencing projects merely transcribe the information of an organism’s technical book from the chemical format of DNA molecules into a digital format in the form of a long string of letters[2] that can be easily parsed by humans and computers. But the language of this book remains largely unknown, and the genetics and molecular biology communities are working hard to understand it.

The dilemma of non-functional DNA

Our limited understanding of the language in which genetic information is expressed is nevertheless sufficient to create a dilemma of great importance in the creation-evolution controversy over the origin of life. Let’s look at an example:

What would you say if, in the analogy of the technical book on aeroplanes, we discovered that over 96% of the pages of this book actually contain random letters of the alphabet rather than words, sentences, and phrases with precise meanings in this as yet unknown language? For a book written by clever designers, it would be surprising, to say the least, to find such a large proportion of useless information. On the other hand, if this technical book were not the result of intelligent designers, but the product of natural evolutionary processes, then the informational junk would need no explanation. Moreover, those evolutionary processes would indeed require a dominant proportion of informational junk as raw material for evolution to work optimally, as we will explain later.

This is the crux of the non-functional DNA controversy: according to the current understanding of the human genome, only a maximum of 4% of the genome “does” something, while the remaining 96% is considered by exclusion to be non-functional, like random letters in the human technical book. If this were the case, this observation, which holds true for most complex multicellular organisms, would be a terribly powerful argument in favour of the theory of evolution and against the idea that an intelligent Creator, who did not work through evolution, is at the origin of life.

Functional versus non-functional

To assess this matter properly, we need to understand more about the structure of the genome and what is meant by functional and non-functional. Although the analysis will focus on the human genome—the most intensively studied—the arguments are generally valid for most complex multicellular organisms.

Ever since the Augustinian monk Gregor Mendel discovered the principles of heredity through various experiments on plants in the 1860s, it has been known that certain characteristics are passed from parents to offspring in distinct and indivisible units, which over time have come to be called genes. In the years following the elucidation of the structure of the DNA molecule in 1953, the central role of DNA in heredity became increasingly clear. In 1957, the molecular biologist Francis Crick formulated the “central dogma of molecular biology”, a cornerstone of modern genetics, which explained the relationship between DNA, RNA and proteins in the functioning of a cell. With the discovery of the “genetic code”, for which a Nobel Prize was awarded in 1968,[3] we now have all the elements to understand what “functional” DNA means.

If we imagine the DNA in the human genome as a string of 3 billion letters, genes can be visualised as small[4] compact sub-sequences of letters that do not overlap and are scattered, apparently at random, throughout the genome, making up 3-4% of the whole[5]. A gene encodes certain characteristics of the organism through its product, called its “function”, and this product always consists of one or more types of protein. Proteins are molecular machines that perform most of the cell’s internal functions, but can also be exported for other functions.

For example, the colour of a person’s skin is determined by the type and amount of a pigment called melanin. Melanin is produced by special cells in the skin called melanocytes, with the help of enzymes such as tyrosinase, which are proteins. These cells “read” from their own genome the DNA subsequence of the gene that describes such an enzyme and, through some internal processes, transcribe this information from the DNA format into a similar intermediate format, an RNA sequence. The RNA molecule is in turn processed by other internal machinery to produce a chain of amino acids, which then folds into the finished protein, in our case an enzyme that makes melanin, as the end product of the whole process.

The central dogma mentioned above basically refers to this process of converting information from DNA into RNA and then into protein. The final step, the translation of RNA into protein, is governed by the genetic code mentioned earlier. Since the function of the cell is to make proteins, and proteins are made by genes, the DNA occupied by genes is called “functional” DNA. This functional DNA is complemented by additional DNA sequences that play a detectable role in regulating the expression of a gene, namely the gene control regions. These regions play a crucial role in describing the conditions under which, and the levels at which, a gene should be expressed as its protein. All the other DNA in the genome, which lies outside the genes and their control regions, forms what is known as “non-functional” DNA.
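The two steps described above, DNA to RNA and RNA to protein, can be illustrated with a minimal sketch. It rests on several simplifying assumptions that are not part of the article: the input is treated as the coding strand (so transcription reduces to replacing T with U), the codon table is a tiny hand-picked subset of the standard genetic code, and the function names are purely illustrative.

```python
# Toy illustration of the DNA -> RNA -> protein flow.
# Assumption: CODON_TABLE holds only five of the 64 codons of the
# standard genetic code, enough for the example sequence below.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start codon
    "UUU": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "GCU": "Ala",  # alanine
    "UAA": None,   # stop codon: no amino acid is added
}


def transcribe(dna: str) -> str:
    """Rewrite a coding-strand DNA sequence in RNA letters (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> list[str]:
    """Read the RNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid is None:  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return protein


rna = transcribe("ATGTTTGGCGCTTAA")
print(rna)             # AUGUUUGGCGCUUAA
print(translate(rna))  # ['Met', 'Phe', 'Gly', 'Ala']
```

A real cell does far more at each step (splicing, regulation, folding); the sketch only shows why the genetic code can be thought of as a lookup table from three-letter words to amino acids.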

In summary, the definition of functional DNA is quite strict and shaped by the historical development of genetics, which focused almost exclusively on the mechanisms and chemistry of protein production by genes, with the general idea that these proteins are the ultimate purpose of the cell and the only actors that really matter in cellular processes. While this model is certainly close to reality, we now know that the cell is much more complex than that. However, measured in this way, the functional DNA of a human being is, even by the most generous estimates, less than 4% of the total genome. This percentage threshold applies to most multicellular organisms.

The evolutionary perspective

On the face of it, the ratio of functional to non-functional DNA is perfectly consistent with evolutionary theory. First, large areas of DNA that are thought to play no role in the functioning and development of the organism are seen as a large testing laboratory or “evolutionary playground”. There, random processes of change, which are the raw material of evolution, are free to alter genetic sequences at will. Because these sequences have no role, the changes carry no penalty for the organism, until certain sequences happen to acquire a functional role and are then favoured or eliminated by natural selection.

Secondly, non-functional DNA can also be read as a history book of these evolutionary changes. Unlike functional DNA, which is under constant evolutionary conservation pressure for the simple reason that it has a function, non-functional DNA is freer to change over the long course of evolution. And indeed, it has been observed that whichever two species we consider for comparison, their degree of genetic similarity is much greater on the functional DNA part than on the non-functional DNA part.

Thirdly, non-functional DNA has a certain structure that is not found in the functional sequences that define genes: it is cluttered with transposable elements and with tandem repeats, that is, sequences that repeat consecutively. If we look again at the sequence of letters in the genome with the mind’s eye, we observe that one type of sequence, which can vary in size from a few tens of letters to several tens of thousands of letters, is imperfectly repeated in different places in the genome. These are the transposable elements. Tandem repeats are built from smaller units, usually less than 100 letters long, each repeated back to back a variable number of times. Evolutionary theory sees these structures as the long history of uncontrolled copying and replication of “selfish” or viral DNA sequences. In other words, certain processes that lead to the duplication or insertion of certain pieces of DNA have acted over time without being corrected by anything, because this DNA is non-functional and therefore organisms survive just as well whatever is there.
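To make the idea of consecutively repeated sequences concrete, here is a minimal sketch that locates back-to-back copies of a given unit in a string of letters. The function name, the example sequence, and the choice to search for an exact, known unit are all assumptions made for illustration; real repeat-finding tools must also cope with imperfect copies and unknown units.

```python
import re


def tandem_runs(sequence: str, unit: str) -> list[tuple[int, int]]:
    """Return (start position, number of copies) for every run of two or
    more exact, consecutive copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{2,}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(sequence)
    ]


# Hypothetical example: the unit "CA" appears 4 times in a row, then twice.
print(tandem_runs("GGCACACACATTCACAGG", "CA"))  # [(2, 4), (12, 2)]
```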

Last but not least, from the perspective of the origin of life controversy, evolutionary theory is not bothered by the existence of non-functional DNA, whereas those who support the idea of an intelligent Creator of life may have a big issue with this concept.

These are just some of the reasons why the existence of non-functional DNA is so important to the evolutionary paradigm. From this point of view, it is perhaps not surprising that non-functional DNA was originally pejoratively referred to as “junk DNA”, as a sign of contempt for the idea that this DNA might still be good for something. However, as evidence has accumulated that “junk” is not really useless, this term has gradually been replaced by the more polite “non-functional DNA”.

The information perspective

Let’s try to look at the problem of non-functional DNA from a slightly different angle, that of information. In the previous article we showed how the immense complexity of a human organism is somehow encoded in the 23 pairs of DNA molecules that make up that individual’s genome, the informational equivalent of a digital CD, some 760 megabytes of data. For anyone remotely familiar with computer programming, this size is surprisingly small. As a quick comparison, a fairly old and limited operating system like Windows XP takes up about 1500 megabytes when installed, almost twice the size of the human genome. With all our knowledge of computer science, software design, and complex systems engineering, the human genome would have to be extremely dense in information and interpretation layers to encode the amazing complexity of the human organism in just 760 megabytes. Here are just some of the pieces of information that would have to find their way into the genome:

  • Complete description of every type of protein (over 250,000 estimated types) and the conditions under which these proteins must be produced, used, destroyed, and recycled in every cell.
  • Complete description of each cell type (over 200), intracellular processes, intercellular communication, cell migration conditions, cell life cycle, etc.
  • Complete description of each tissue and organ type, their location, and function.
  • Complete description of the extracellular matrix, organ shape, and body shape.
  • Description of homeostasis (the process of normal functioning of an organism) at every level: molecular, cellular, organ or organism; description of repair and regeneration processes at the same levels.
  • Complete description of the highly complex process of embryonic development, in which the organism builds itself and keeps itself functioning all the time; description of the process of growth and maturation of the organism.
  • Description of all the information with which we are born: instincts, sophisticated ability to interpret images, “programmes” according to which the various organs coordinate themselves, etc.

Using the analogy from the beginning, all this information is written in an unknown language and in unknown places in a human’s technical book, the genome. Although we do not know exactly where and in what form it appears in the genome, it is certain that it must exist in some form, since there is no known storage medium for genetic information other than the DNA molecule.[6] Astonishingly, however, given the challenge of cramming so much information into just 760 megabytes, the current definition of functional DNA implies that only about 4% of it is really important, or just 30 megabytes. This level of information compression seems extremely difficult, if not theoretically impossible.
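The arithmetic behind these figures can be sketched as follows. The sketch assumes a round 3 billion letters and a plain encoding of 2 bits per letter (four possible letters, no compression), which gives 750 megabytes, close to the 760 megabytes quoted in this article; the variable and function names are illustrative only.

```python
BITS_PER_LETTER = 2  # assumption: A, C, G, T stored in 2 bits each, uncompressed


def genome_megabytes(letters: int) -> float:
    """Storage needed for `letters` DNA letters, in megabytes (10**6 bytes)."""
    return letters * BITS_PER_LETTER / 8 / 1_000_000


# Round figure of 3 billion letters for the human genome.
print(genome_megabytes(3_000_000_000))  # 750.0

# Figures quoted in the article.
GENOME_MB = 760          # size of the genome used in the text
FUNCTIONAL_SHARE = 0.04  # upper estimate of "functional" DNA
WINDOWS_XP_MB = 1500     # installed size quoted for comparison

print(round(GENOME_MB * FUNCTIONAL_SHARE, 1))  # 30.4 megabytes
print(round(WINDOWS_XP_MB / GENOME_MB, 2))     # 1.97, i.e. almost twice
```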

This surprise at the tiny size of the functional genome was to some extent shared by the biologists involved in the Human Genome Project. Not only was the percentage of DNA considered functional in the human genome found to be very small, but the number of corresponding genes was also surprisingly low compared to initial estimates: about 22,000 genes compared to at least 100,000 expected by specialists.

The functional perspective

The function of a cell can also be considered in a broader sense, beyond the scope of the proteins it produces. We also speak of a cell’s function when we refer to the interactions in which it participates, the intercellular coordination in which it is involved, or the decisions it makes at any given time: when to divide, when and where to migrate, when and what to communicate to other cells, whether and when to change its internal organisation, and so on. All these cellular functions (behaviours, to be precise) are much more difficult to isolate and analyse than protein production per se, and perhaps much more complex than can be explained within the paradigm of DNA being expressed as proteins.

It is an undeniable observation that it is the processes of intercellular communication, coordination, and action that give rise to tissues and organs and allow the organism as a whole to develop and exist. How can these highly complex and precise processes be controlled and directed by genes and cells whose function is to produce proteins? The current paradigm of the functional and non-functional in the genome argues that these intercellular coordination processes are orchestrated by a symphony of proteins and interactions whose complexity is impossible to comprehend, in which cascades of mutual reactions and conditioning between protein concentrations locally produce intercellular organisation and coordination globally. This paradigm assumes from the outset that the mechanisms are so complex that they are impossible to fully understand and can only be traced by experimental and statistical methods.

But whatever mechanisms control these functions, whether based on cascades of protein-protein interactions or other as yet undiscovered principles, they are necessarily encoded as information somewhere in the genome. Where might this information, which by all estimates must occupy a much larger volume than the information stored in genes, be stored? We know that the 4% of functional DNA is already occupied by genes and their regulatory regions. This leaves only the so-called non-functional area of the genome as a potential location for this vast amount of data. But how can we be sure that the DNA considered non-functional actually plays a specific role in determining the cell behaviour mentioned above? The truth is that, at the current state of knowledge, it is difficult to give a categorical answer, but there are some valuable clues.

Firstly, extensive experience of genetic research in model organisms shows that it is not possible to make major changes to non-functional DNA without serious consequences for the organism, ranging from deformed mutants to completely non-viable organisms. So a certain degree of functionality is already known and accepted for non-functional DNA.

At the same time, however, there are studies in mice in which between 0.0001% and 0.1% of non-functional DNA has been deleted, and no effects have been observed in mature animals. How could this be explained if all the non-functional DNA were indeed functional? There are a number of factors that could explain this result, individually or together, but they remain speculative for now. It is possible that non-functional DNA has a degree of redundancy that allows it to function even when small parts are damaged or deleted. It is possible that the regions affected in the experiments were not expressed in the strains of mice used or in the environmental conditions in which they were kept. It may also be that the animals were affected but not in a way that was readily observable, just as many human genetic diseases are not observable by a superficial look at the patient. Finally, the animals may have been affected only potentially, not actually, in the same way that many human genetic conditions cause a predisposition to certain diseases, but do not guarantee their occurrence.

Most importantly, the experiments only deleted a very small amount of information compared to the size of the genome, and above a certain threshold the effects are usually visible and significant. It is also important to note that evolutionary biologists now shy away from suggesting that non-functional DNA is functionless, and instead accept the possibility of some density of functionality in non-functional DNA.

But perhaps the biggest problem that arises against the possibility of non-functional DNA having dense functionality is the so-called “C-value paradox”. In short, there is no correlation between the amount of non-functional DNA in a species and the apparent complexity of that species. For example, the genome of an ordinary onion is almost five times larger than that of a human, and this is mainly due to the amount of non-functional DNA. And there’s nothing special about onions; we could take many other examples of organisms with huge variations in genome size that have no basis in apparent complexity. The issue of the C-value and its implications is a fundamental one, but it will be dealt with in a future article rather than here. For now, let us concentrate on the human genome.

The creationist perspective

We are now in a position to provide some answers to the evolutionary perspective on non-functional DNA through the lens of the existence of an intelligent Creator of life.

First of all, if non-functional DNA is actually functional to a significant degree, then the idea that it is a laboratory of evolution comes to a dead end. Evolutionary theory is then left without the vast non-functional space in which novel functional variants that might improve the organism can be randomly tested without risk.

Secondly, the fact that any two organisms from different species are more similar in the DNA considered to be functional than in the DNA considered to be non-functional does not mean that the non-functional part of the DNA has had more freedom to change over time. This may be a direct consequence of the fact that the proteins produced by the two species are much more similar than their structural and behavioural organisation plans. It is like comparing a motorbike with a car and finding that they are more similar at the level of their chemical building blocks (functional DNA) than at the level of the technical blueprint from which they were built (non-functional DNA).

Thirdly, the fact that non-functional DNA contains transposable elements and consecutive repetitive sequences may simply be due to the structure of the information stored there. Until we know the language in which this information is expressed, this explanation is at least as good as the evolutionary explanation that these structures represent a history of evolutionary change. A perhaps irrelevant but interesting note: the compiled and executable structure of a computer program can exhibit such features without the need for a theory of the evolution of computer programs.

The ENCODE project

In September 2003, the US National Human Genome Research Institute (NHGRI) launched ENCODE, a massive project to investigate the extent to which non-functional human DNA contains as-yet undiscovered functions. Conceived as a continuation of the Human Genome Project, the project announced its findings in September 2012 with a flood of more than 30 papers published simultaneously in the world’s most prestigious scientific journals. These papers detailed the work of hundreds of researchers, probably the vast majority of them evolutionists, who had spent nine years studying, without bias, what was functional and what was non-functional in the human genome. Their definition of “functional” was as broad as, on the arguments above, it should be, and the conclusions they reached sent shockwaves through the scientific community.

If before ENCODE the human genome looked like an aerial photograph of a village in the dark, with only a few glimmers of activity (3%-4%), after ENCODE the picture is almost blindingly bright: at least 80% of the genome, most of it previously thought to be non-functional, has biochemical functions, directly or indirectly linked to cell activity and gene regulation. Although the scientific community has yet to fully grasp the significance of this result, and the scientific weight of the project that made it public, it is likely to mark a turning point in both molecular biology and the tricky question of the origin of life.

Controversy and conclusions

For several months after the publication of the ENCODE results, the reaction of the evolutionary community was muted, probably as it processed the implications of the study. But then the storm began: numerous scientific and popular science articles were published criticising everything that could be criticised about the ENCODE results, from the methodology used and the consistency of its application to the definition of functionality and the logic of the researchers. It was even argued that the human genome could not be 80% functional, because this would contradict several well-known aspects of its evolution. Without going into detail, a careful analysis of the arguments put forward suggests that the main criticism of the project is that it used a very broad definition of what can be considered functional. But this was openly stated by the research team from the outset and, as we have also shown in this article, such an approach is clearly necessary for both informational and functional reasons.

While the debate in scientific articles goes no further than irony and stays within the confines of scientific discourse, in popular science articles and blogs it is more like a boxing match with the gloves off. This says a lot about the heat generated in the evolutionary community by this sensitive issue, which seems to have split it into two camps: those who accept that it is possible for the genome to be largely functional, and those who consider this incompatible with evolutionary theory. But when the dust settles after this general struggle, indisputable results will remain: if one goes beyond the narrow and traditionally accepted definition of functional DNA, the human genome is transformed from an almost inert object into an active and functional system, as one would expect from the hand of an intelligent Creator.

Footnotes
[1] “Transcription” in the usual sense. It does not refer to the process of DNA > RNA synthesis.
[2] The letters are of four types, usually referred to as “A”, “C”, “G” and “T”, after the initial of the molecule they represent in the DNA chain.
[3] Awarded to Robert W. Holley, Har Gobind Khorana, and Marshall W. Nirenberg “for their interpretation of the genetic code and its function in protein synthesis”.
[4] “Small” is relative to the size of the genome. The average length of a human gene is 30,000 letters.
[5] For the purposes of this article, we will disregard the internal structure of a gene, which consists of introns and exons. For simplicity, a gene is considered as a single exon, and therefore we overestimate the functional DNA at 3%-4% of the genome, and we do not refer only to the 1.5% of the genome that is strictly occupied by the exome.
[6] We are only referring to nuclear DNA. We exclude epigenetic information or mitochondrial DNA without losing generality. Although epigenetic information is inherited, it is still unclear whether and to what extent it is involved in heredity.
