DNA Through the Eyes of a Coder


DNA seen through the eyes of a coder (or, If you are a hammer, everything looks like a nail) - Bert Hubert's writings


Updates:

12th of September 2021: I'm writing a book on DNA! If you want to become a beta reader, or have suggestions, I'd love to hear from you!

8th of January 2021: This article has been revised and updated, scientifically and in terms of dead links. Revision made by Tomás Simões (@putadagravidade / tomasprsimoes@gmail.com). Feel free to contact me if I made a mistake.

25th of August 2017: This page has led to a two-hour presentation called "DNA: The code of Life", as presented at SHA 2017. Includes slides and video and a summarizing blogpost. If you like this page, you'll love the presentation.

This is some rambling by a computer programmer about DNA. I'm not a molecular geneticist. (Update: 20 years after starting this post, I can fake it reasonably well. This page was started somewhere in 2001, and it may need some more updating here and there. Since 2001 I've learned a few things and I think I need to revisit some parts of this page.)

If you spot mistakes, please contact me (@bert_hu_bert / bert@hubertnet.nl).

I'm not trying to force my view onto the DNA - each observation here is quite 'uncramped'. To see where I got all this from, head to the Bibliography (end of the page).

The source code

Is here. This is not a joke. We can wonder about the license though. Maybe we should ask the walking product of this source: Craig Venter (update: not quite true, it is mostly someone else). The source can be viewed via a wonderful set of Perl scripts called 'Ensembl'. The human genome is about 3 gigabases long, which at 2 bits per base boils down to 750 megabytes. Depressingly enough, this is only 3.6 (update: used to be 2.8, apparently Firefox decreased in size, huh.) Mozilla browsers.
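The 750 megabyte figure can be checked on the back of an envelope. The sketch below assumes a round 3 billion bases, 2 bits per base and decimal megabytes; the variable names are only for illustration.

```python
# Back-of-the-envelope check of the genome size quoted above.
# Assumptions: a round 3 gigabases, 2 bits per base, decimal megabytes.
import math

BASES = "TCGA"
GENOME_LENGTH = 3_000_000_000  # about 3 gigabases

bits_per_base = int(math.log2(len(BASES)))  # 4 symbols fit in 2 bits
total_bytes = GENOME_LENGTH * bits_per_base // 8
megabytes = total_bytes / 1_000_000

print(f"{bits_per_base} bits per base, {megabytes:.0f} megabytes")
```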

DNA is not like C source but more like byte-compiled code for a virtual machine called 'the nucleus'. It is very doubtful that there is a source to this byte compilation - what you see is all you get.

Illustration of a DNA molecule.

The language of DNA is digital, but not binary. Where binary encoding has 0 and 1 to work with (2 - hence the 'bi'nary), DNA has 4 symbols: T, C, G and A.

Whereas a digital byte is mostly 8 binary digits, a DNA 'byte' (called a 'codon') has three digits. Because each digit can have 4 values instead of 2, a DNA codon has 64 possible values, compared to a binary byte which has 256.
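To make the counting concrete, here is a small sketch that enumerates every codon and reads each one as a three-digit base-4 number. The digit assignment (T=0, C=1, G=2, A=3) is an arbitrary choice for this example, not a biological convention.

```python
from itertools import product

BASES = "TCGA"  # arbitrary digit order: T=0, C=1, G=2, A=3

# Every possible three-letter codon: 4 * 4 * 4 combinations.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

def codon_to_int(codon):
    """Read a codon as a three-digit base-4 number."""
    value = 0
    for base in codon:
        value = value * 4 + BASES.index(base)
    return value

print(len(codons))           # number of distinct codons
print(2 ** 8)                # number of distinct values of an 8-bit byte
print(codon_to_int("GCC"))   # one codon as a 6-bit integer
```

Each codon fits in 6 bits, so a codon carries a quarter of the distinct values an 8-bit byte does.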

A typical example of a DNA codon is 'GCC', which encodes the amino acid Alanine. A chain of many such amino acids is called a 'polypeptide' or 'protein', and these are chemically active in making a living being.

See also https://www.nature.com/scitable/definition/codon-155/
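Decoding codons into amino acids is essentially a table lookup. The sketch below uses a deliberately partial table (a handful of entries from the standard genetic code) and a made-up input string. In the cell, translation works on an RNA copy of the DNA; the DNA letters are used here to stay with the notation above.

```python
# A deliberately partial codon table: a few of the 64 entries of the
# standard genetic code, enough to run the toy example below.
CODON_TABLE = {
    "ATG": "Met",  # also the usual start signal
    "GCC": "Ala", "GCT": "Ala", "GCA": "Ala", "GCG": "Ala",
    "TGG": "Trp",
    "AAA": "Lys",
    "TAA": None, "TAG": None, "TGA": None,  # stop codons
}

def translate(dna):
    """Translate a DNA string codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        # Raises KeyError for codons missing from this partial table.
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein

# Made-up sequence, not taken from any real gene.
print(translate("ATGGCCAAATGGTAA"))
```

Note that four different codons map to Alanine in this table: with 64 codons and far fewer amino acids, the code is redundant.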

Position Independent Code

Code in dynamically linked libraries (.so under Unix, .dll on Windows) cannot use static addresses internally, because the code may appear in different places in memory in different situations. DNA has this too, where it is called 'transposing code':

Nearly half of the human genome is composed of transposable elements or jumping DNA. First recognized in the 1940s by Dr. Barbara McClintock in studies of peculiar inheritance patterns found in the colors of Indian corn, jumping DNA refers to the idea that some stretches of DNA are unstable and "transposable," ie., they can move around – on and between chromosomes.

https://www.nature.com/scitable/topicpage/transposons-the-jumping-genes-518/

Conditional compilation

Illustration of human chromosomes.

Of the 20,000 to 30,000 genes now thought to be in the human genome (update: quite debatable), most cells express only a very small part. This makes sense: a liver cell has little need for the DNA code that makes neurons.

But as almost all cells carry around a full copy ('distribution') of the genome, a system is needed to #ifdef out stuff not needed. And that is just how it works. The genetic code is full of #if/#endif statements.
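As a toy model of the #ifdef analogy: every cell type below ships the same full 'distribution', but only the genes enabled for that type are expressed. The gene names, products and cell types are made up for illustration, and real gene regulation is far more graded than an on/off set.

```python
# Toy model: every cell carries the full genome, but only the genes
# enabled for its type are expressed. All names are hypothetical.
GENOME = {
    "make_albumin": "liver protein",
    "make_neurotransmitter": "neuron protein",
    "make_ribosome": "housekeeping protein",
}

# Which genes are NOT '#ifdeffed out' per cell type.
ENABLED = {
    "liver": {"make_albumin", "make_ribosome"},
    "neuron": {"make_neurotransmitter", "make_ribosome"},
}

def express(cell_type):
    """Return the products of the genes enabled for this cell type."""
    enabled = ENABLED[cell_type]
    return sorted(product for gene, product in GENOME.items() if gene in enabled)

print(express("liver"))
print(express("neuron"))
```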

This is why 'stem cells' are so hot right now - these cells have the ability to differentiate into everything. The code hasn't been #ifdeffed out yet, so to speak.

Stated more exactly, stem cells do not have everything turned on - they are not at once liver cells and neurons. Cells can be likened to state machines, starting out as a stem cell. Over the lifetime of the cell, during which time it may clone ('fork()') many times, it specializes. Each specialization can be regarded as choosing a branch in a tree.

Each cell can make (or be induced to make) decisions about its future, each of which makes it more specialized. These decisions persist across cloning, through transcription factors and through modifications to the way DNA is stored spatially ('steric effects').
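The state machine view can be sketched in a few lines. The lineage tree here is a hypothetical, heavily simplified one, and the `Cell` class and its method names are inventions for this example: each specialization moves one branch down the tree, and a forked copy inherits the decisions made so far.

```python
# Hypothetical, heavily simplified lineage tree; real differentiation
# pathways have many more steps and branches.
LINEAGE = {
    "stem": ["endoderm", "ectoderm"],
    "endoderm": ["liver"],
    "ectoderm": ["neuron"],
    "liver": [],
    "neuron": [],
}

class Cell:
    def __init__(self, state="stem"):
        self.state = state

    def specialize(self, target):
        """Move one branch down the tree; there is no transition back up."""
        if target not in LINEAGE[self.state]:
            raise ValueError(f"{self.state} cannot become {target}")
        self.state = target

    def fork(self):
        """Clone the cell; the copy inherits the decisions made so far."""
        return Cell(self.state)

cell = Cell()
cell.specialize("endoderm")
daughter = cell.fork()
daughter.specialize("liver")
print(cell.state, daughter.state)
```

In this model a liver cell has no path to becoming a neuron: the branch was chosen earlier, and the choice was copied along with every fork.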

A liver cell, although it carries the genes...
