
Sequencing

Sequencing refers to the biotechnology techniques that determine the order of the building blocks of genetic material. The genetic material that acts as the blueprint for most cells and organisms is deoxyribonucleic acid (DNA). DNA provides the information to make ribonucleic acid (RNA), which in turn provides the information to produce protein.

The information for all living things is stored in the genetic material that is part of the organism. An apt analogy is that of a book containing information in the form of letters that make up words. The letters on the book's pages carry meaning only because they appear in a particular order that a reader can interpret. Likewise, an organism's genetic material is a sequence of chemical letters. Without some interpretation, this information is useless. Prokaryotic organisms such as bacteria and more complex, multicellular organisms such as humans have built-in systems that read the information that the genetic material conveys. These systems function to determine the order of the information, or the sequence in which the information is presented.

Humans have also learned to decipher the genetic code by using sequencing techniques. Other sequencing techniques identify the components that make up proteins (amino acids) and determine their arrangement.

Knowing the sequence of the genetic material has allowed scientists to determine what stretches of the material might specify proteins, or to detect alterations in the genetic material that might be important in genetic diseases, such as cystic fibrosis or cancer. Sequence information also allows researchers to deliberately change the arrangement of the genetic material (that is, to introduce a mutation) in order to determine whether the mutation affects the functioning of the cell or organism.

Knowledge of the protein sequence allows researchers to use powerful computers and computer software to study the three-dimensional structure of the protein molecule and to assess how mutations in the protein sequence affect the shape and function of the protein. Also, the shape of a protein is important in designing chemicals like antibiotics that will specifically target the protein and bind to it.

DNA sequencing determines the order of the compounds that make up the DNA. These compounds are called bases. There are four bases: adenine, thymine, guanine, and cytosine.

The best-known example of sequencing is the effort to sequence the human genome, which began in the early 1990s and culminated about a decade later. The human genome is the genetic material that is carried in human cells.

In the laboratory, DNA is sequenced by generating a set of DNA fragments that each end at a known type of base and then sorting the fragments by length. Two classical methods do this. The first is called the Sanger-Coulson procedure, after its two creators. In this procedure, the manufacture of DNA is allowed to begin and is then stopped in a controlled way. A small amount of what is termed a dideoxynucleotide is mixed into the solution that contains the four regular DNA building blocks (nucleotides). A dideoxynucleotide is slightly different in structure from the normal nucleotide: it lacks the chemical group (the 3' hydroxyl) to which the next nucleotide would normally be attached. When a dideoxynucleotide is added on to the growing DNA chain, the next regular nucleotide cannot be attached to it. Thus, lengthening of the DNA stops. The newly made DNA carries a radioactive label so that it can be detected. By running four separate reactions, each containing the dideoxy form of a different base, the chain is interrupted at every position where that base occurs. This produces DNA pieces of many different lengths that have all begun from the same start point. The different pieces can be visualized using the technique of gel electrophoresis, which separates them by length, and the jigsaw-puzzle pattern of different lengths can be sorted out to deduce the base sequence of the original DNA.
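The logic of reading a sequence from chain-terminated fragments can be sketched in a few lines of Python. This is an idealized illustration, not laboratory software: the function names `sanger_fragments` and `read_gel` are hypothetical, and the sketch assumes that termination occurs at every possible position and that the gel resolves every fragment length exactly.

```python
def sanger_fragments(synthesized: str) -> dict[str, list[int]]:
    """Simulate four termination reactions (one per base).

    For each base, return the lengths of the fragments that end at that
    base, i.e. the positions where that base occurs in the new strand.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all fragments from shortest to longest.

    The lane in which a fragment of a given length appears tells us which
    base sits at that position.
    """
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = sanger_fragments("GATTACA")
print(lanes)            # fragment lengths observed in each of the four lanes
print(read_gel(lanes))  # sequence recovered from the fragment lengths
```

Each lane alone reveals only where one base occurs; the full sequence emerges when the four lanes are read together in order of fragment length.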

The second DNA sequencing technique is known as the Maxam-Gilbert technique, once again after the scientists who pioneered it. Here, the DNA is labeled at one end using radioactive phosphorus (phosphorus is an element that makes up part of the backbone of DNA). The DNA is heated, which causes the two strands to separate from one another. The labeled strands are then cut into a number of shorter pieces using chemicals that break the DNA at specific bases. The differently sized fragments can be separated using gel electrophoresis, and the resulting patterns reveal the sequence of the DNA strand.

The Sanger-Coulson method has been modified so as to be done using automated DNA sequencing machines. This enables DNA to be sequenced much faster than is possible manually.

A sequencing method called shotgun sequencing was successfully used as one approach to sequence the human genome. In shotgun sequencing, the DNA is broken into hundreds or thousands of small, overlapping fragments, for example by using a variety of enzymes that cut DNA at different specific sites. Each small stretch of DNA is automatically sequenced, and powerful computers then use the overlaps between fragments to piece the information back together and generate the entire genome sequence.
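The computational step of piecing fragments back together can be illustrated with a toy greedy assembler in Python. This is a minimal sketch under strong assumptions: the reads are error-free, all come from the same strand, and the sequence contains no repeats. Real genome assemblers are far more sophisticated, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up reads covering one short stretch of DNA.
print(greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"]))
```

If some reads share no overlap with the rest, the function returns several separate contigs rather than forcing them into one sequence.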

Protein sequencing determines the arrangement of the amino acids of the protein. This can be done indirectly if the DNA sequence is known. From that sequence, the RNA sequence can be deduced, followed by the sequence of amino acids that the RNA codes for. If the DNA sequence is not known, then the protein sequence can be determined directly, using a chemical approach. The most popular chemical sequencing technique is the Edman degradation procedure. The amino acids are chemically snipped off one at a time from one end of a protein. Each released amino acid can be identified using a technique called reverse phase chromatography. By keeping the identified amino acids in order, the sequence of the protein is determined.
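The indirect route, from DNA to RNA to amino acids, can be sketched in Python using the standard genetic code. The sketch assumes that the input is the coding strand of the DNA, that the reading frame starts at the first base, and that translation stops at the first stop codon; the function names `transcribe` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# "*" marks a stop codon; amino acids use one-letter abbreviations.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Deduce the RNA sequence from the coding strand of the DNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Read the RNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCTGGTAA")
print(rna)             # AUGGCCUGGUAA
print(translate(rna))  # MAW (methionine, alanine, tryptophan)
```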

Another protein sequencing technique is called fast atom bombardment mass spectrometry (FAB-MS). Here, the sample is bombarded with a stream of quickly moving atoms, typically argon atoms. The interaction of the atoms with the protein causes the protein to become charged. When the protein is chemically broken into fragments, the charged regions can be used to identify the amino acids. FAB-MS is a powerful technique, although highly specialized and expensive equipment is required.

Another, more widely used protein sequencing technique employs a variety of protein-degrading enzymes to break up a protein into fragments. The shorter fragments, which are called peptides, can then be sequenced. Because different enzymes cut the protein at different places, the fragments they produce overlap. That is, the end of one fragment will have the same information as the beginning of another fragment. These areas of common information allow researchers to piece the sequence back together to reveal the amino acid arrangement in the intact protein.
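How two different enzymes yield overlapping peptides can be shown with a small Python sketch. Everything specific here is an assumption made for illustration: the text does not name the enzymes, so the sketch uses simplified trypsin-like (cut after K or R) and chymotrypsin-like (cut after F, W, or Y) rules, ignores exceptions that real enzymes have, and uses a made-up protein sequence. The function name `digest` is hypothetical.

```python
def digest(protein: str, cut_after: str) -> list[str]:
    """Cut a protein after every residue listed in `cut_after`."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in cut_after:
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing piece with no cut site
    return peptides


protein = "MAKWGRFLK"  # made-up example sequence

trypsin_like = digest(protein, cut_after="KR")        # ['MAK', 'WGR', 'FLK']
chymotrypsin_like = digest(protein, cut_after="FWY")  # ['MAKW', 'GRF', 'LK']

print(trypsin_like)
print(chymotrypsin_like)
```

The peptide `MAKW` from the second digest spans the junction between `MAK` and `WGR` from the first, and `GRF` spans the junction between `WGR` and `FLK`. Those shared stretches fix the order of the peptides in the intact protein.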

Resources

Books

Alphey, L. DNA Sequencing: From Experimental Methods to Bioinformatics. Berlin: Springer-Verlag, 1997.

Graham, C.A., and A.J.M. Hill. DNA Sequencing Protocols. 2nd ed. Clifton, NJ: Humana Press, 2001.

Kinter, M., and N.E. Sherman. Protein Sequencing and Identification Using Tandem Mass Spectrometry. Hoboken, NJ: Wiley-Interscience, 2000.

Smith, B.J. Protein Sequencing Protocols (Methods in Molecular Biology, V. 211). Clifton, NJ: Humana Press, 2002.

Brian Hoyle
