Cells, Chromosomes, and Genomes | Introduction to Genomics for Engineers
Introduction<br>Disclaimer<br>This guide is written specifically by and for computer scientists and<br>engineers. The underlying biology in cancer genomics can be exceedingly complex<br>and requires years of study. Making the content palatable requires drawing<br>abstractions around these concepts. This guide should be treated as an<br>introduction to the domain that teaches our audience the material in a<br>"broad-strokes" fashion. Please be forgiving if you feel we have glossed over<br>your favorite quirk of cancer genomics. Further, everything within the guide is<br>presented within a research context and may not be relied on in making decisions<br>about patients. If you feel anything has been stated incorrectly, you can file<br>an issue on the GitHub repository. Please see the LICENSE.md<br>for more information.
info<br>Everything in this guide refers to eukaryotic molecular biology, that is, the<br>molecular biology of organisms whose DNA is enclosed within a nucleus. Broadly speaking,<br>most familiar species are eukaryotes. The main exception is bacteria, whose DNA is not<br>enclosed in a nucleus. At times, this document will be geared towards the<br>sequencing of human cells specifically.
Cells are the smallest unit of life and are the building blocks of organisms, from a<br>single-celled bacterium to the trillions of cells that make up the human<br>body.<br>Cells are complex, organized structures that take a variety of forms, combining into tissues<br>and organs that carry out the body's functions.<br>Within nearly every cell is a genome. A genome is the complete inherited instruction<br>set for producing, operating and maintaining a living cell or organism. This information<br>is physically encoded in a molecule called deoxyribonucleic acid, or DNA. Among other<br>things, DNA contains instructions for the assembly of tens of thousands of different<br>molecular products. These instructions (or recipes) are called genes, and the<br>physical, molecular products that genes encode are called proteins. Cells are<br>constantly reading and interpreting genes stored within the DNA in order to assemble<br>various proteins. Each cell type in the body produces a complex ecosystem of proteins<br>that keeps the cell alive and carrying out its specific function.<br>Bakery Analogy<br>To illustrate this phenomenon, imagine the cell as a bakery that makes many different<br>types of cakes. In this analogy, the genome stored within the DNA is the master recipe<br>book containing more than 20,000 different cake recipes (genes). The physical cakes that<br>are made from these recipes are the proteins. Notably, there is a limited number of<br>copies of each recipe/gene (two copies in the normal case for humans), but you may make<br>thousands or more physical cakes from those recipes. Depending on the type of cell, the<br>mixture of different cake flavors, their quantities, and how they interact together will<br>be different.<br>tip<br>Keep an eye on this analogy: we will refer back to and build upon it a number of times<br>during the course of this guide.
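For readers who think in data structures, the bakery analogy can be sketched in a few lines of Python. This is only a toy model under loud assumptions: the gene names (`GENE_A`, `GENE_B`), the cell type labels, and every protein count are made up for illustration, and the `Cell` class is hypothetical rather than part of any genomics library. The point it shows is that the recipe book is the same in every cell, while the cakes baked from it differ by cell type.

```python
from dataclasses import dataclass, field

# Hypothetical toy model: all names and numbers below are invented.
GENE_COPIES = 2  # copies of each gene (recipe) in the normal human case

# The "recipe book": the same set of genes is present in every cell.
genome = {"GENE_A": GENE_COPIES, "GENE_B": GENE_COPIES}


@dataclass
class Cell:
    cell_type: str
    # Protein (cake) counts, keyed by the gene (recipe) that encodes them.
    protein_counts: dict = field(default_factory=dict)

    def express(self, gene: str, n_proteins: int) -> None:
        """Bake n_proteins cakes from one recipe; the recipe itself is unchanged."""
        if gene not in genome:
            raise KeyError(f"{gene} is not in the recipe book")
        self.protein_counts[gene] = self.protein_counts.get(gene, 0) + n_proteins


cell_x = Cell("hypothetical_type_x")
cell_y = Cell("hypothetical_type_y")

cell_x.express("GENE_A", 5000)  # this cell type makes a lot of protein A
cell_y.express("GENE_A", 10)    # this one makes very little of it
cell_y.express("GENE_B", 800)

# Same recipes, different mixture of cakes.
print(genome)
print(cell_x.protein_counts)
print(cell_y.protein_counts)
```

Note that `genome` never changes when `express` is called: making more protein does not consume or duplicate the gene.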
A Mental Model for DNA<br>Conceptually, you can think of human DNA laid end-to-end as a ~3 billion character long string<br>consisting only of 'A's, 'C's, 'G's and 'T's. This string and any substring contained<br>within are commonly referred to as genomic sequences. These characters represent the<br>physical adenine, cytosine, guanine, and thymine bases (or nucleotides)<br>respectively.<br>Importantly, though it's easy to conceptualize DNA as a single, very long<br>string, the reality is more complex. DNA is composed of two complementary sequences<br>known as strands. Each base is actually a member of a base pair, whereby<br>nucleotides complement each other uniquely: 'A's pair only with 'T's, and 'G's only with<br>'C's. Up close, this structure resembles a spiral staircase as seen in the figure below.
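The string model and the pairing rule translate directly into code. The following is a minimal sketch, not a production tool: the function names are our own, and the example sequence `GATTACA` is arbitrary. It assumes uppercase input containing only the four bases. One detail goes slightly beyond the text above: the two strands are read in opposite directions, so tools usually report the partner strand reversed, which is why a `reverse_complement` helper is included.

```python
# Base-pairing rule from the text: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_valid_sequence(seq: str) -> bool:
    """Return True if seq contains only the characters A, C, G and T."""
    return set(seq) <= set("ACGT")


def complement(seq: str) -> str:
    """Return the base that pairs with each position of seq, in the same order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the partner strand written in its own reading direction."""
    return complement(seq)[::-1]


strand = "GATTACA"  # arbitrary example sequence
assert is_valid_sequence(strand)

print(complement(strand))          # CTAATGT
print(reverse_complement(strand))  # TGTAATC

# Either strand is enough to rebuild the other.
assert complement(complement(strand)) == strand
```

Because the mapping is one-to-one, applying `complement` twice returns the original string, which is the property that lets a cell rebuild the full molecule from a single strand.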
When cells divide, the spiral unwinds, each base pair separates, and the molecule<br>splits into two strands, each one containing the information needed to rebuild the<br>original DNA structure. Normal healthy cells then copy the genetic code very accurately,<br>rarely introducing variation.<br>Physical Structure<br>In plants and animals, DNA is broken up into a number of large sequences<br>called chromosomes that are tucked into the nucleus. Chromosomes typically<br>come in pairs (one from your father and one from your mother) and are wrapped<br>around proteins called histones. These histones keep the DNA string tightly<br>packaged and help control which gene products are made in a given cell. For humans,<br>there are normally 22 pairs of autosomes (chromosomes shared by both sexes)<br>and a pair of sex chromosomes (XX for females or XY for males), totaling 23 pairs<br>of chromosomes. Autosomes are numbered from 1 to 22, roughly in order of size from<br>largest to smallest. The full set of chromosomes makes up the genome.<br>Conclusion<br>The genome is a vast search space for biological questions. Each genome is a biochemical<br>database that, if properly accessed, can inform how our...