
Human Genome Project

The timeline

The United States Human Genome Project (HGP) is an initiative formally launched in 1990 by the National Institutes of Health (NIH) and the U.S. Department of Energy (DOE) to better understand all aspects of human genetic material, or deoxyribonucleic acid (DNA). DNA can be thought of as a genetic alphabet. Specific stretches of DNA called genes code for various proteins, and the complete DNA sequence of an organism makes up its genome. The DNA alphabet consists of four letters (A for adenine, T for thymine, C for cytosine, and G for guanine) that stand for the four nucleotides. This DNA sequence is found in the nucleus of almost every cell in the body. The aim of the HGP has been to completely sequence the human genome, create databases to categorize this information, and use it for medical, research, and educational purposes.

Around the time that the HGP was formally introduced, scientists debated whether it was more important to determine the complete sequence of the genome first, or to annotate (functionally characterize) known sequences before determining further ones. The scientific approach to identifying genes, defining their function, and determining how they interact is a field of genetics called functional genomics. Structural genomics is a field of genetics focused on determining the location of a gene by an approach called genetic mapping, or the localization of genes with respect to each other. In the end, functional genomics became secondary to sequencing the human genome, but it is now the focus of what is called the post-genomic era. The issue was resolved in this manner mainly because functional genomic studies are time consuming and require a more challenging experimental design than direct DNA sequencing.

The first decade

As early as 1983, scientists at Los Alamos National Laboratory (LANL), a Department of Energy laboratory, and Lawrence Berkeley National Laboratory (LBNL) were beginning to produce what are called DNA libraries. DNA libraries allow scientists to catalog different DNA sequences so that they can piece together the continuous sequence of each chromosome. Only two years later, the feasibility of a Human Genome Initiative was being carefully considered. In 1986, the Department of Energy's Office of Health and Environmental Research announced a $5.3 million pilot project to begin the Human Genome Initiative and to develop resources and technologies that would advance this effort. In 1987, the Health and Environmental Research Advisory Committee recommended a 15-year goal to map and sequence the entire human genome, the first undertaking of its kind.

In 1988, the Human Genome Organization was founded to provide international collaborative opportunities for scientists. In 1989, the ELSI (Ethical, Legal, and Social Implications) Task Force was created. A formal five-year joint agreement between NIH and DOE was presented to Congress in 1990, along with a 15-year goal to sequence the entire human genome. Artificial chromosomes were already being created, giving scientists constructs into which large DNA sequences could be inserted. In particular, bacterial artificial chromosomes (BACs) were being produced that allowed larger fragments to be inserted, accelerating sequencing efforts. A collection of human DNA inserts in BACs represents one type of DNA library. In 1991, a repository called the Genome Database was created, marking the first major computational effort to begin teasing out the complex genetic material that separates humans from other organisms. In 1992, only two years after the HGP formally began, the first crude map of the human genome was published. It was built by linking various genes together based on known locations (or markers) along a chromosome, and it gave the research community a first glimpse of the human genome map.

The next ten years: public and private contributions

In 1993, an international consortium was established to sort out sequences derived from expressed genes, and efforts to map these sequences ensued. This consortium was called the Integrated Molecular Analysis of Gene Expression (IMAGE) Consortium, and it paved the way for structural and functional genomics. Novel sequencing methodology was being developed almost as rapidly as the DNA sequences were elucidated. A new artificial chromosome vector called YAC (yeast artificial chromosome) was introduced, providing a construct with an even larger DNA insert capacity.

In 1994, the HGP announced that it had completed its five-year goal of producing a genetic map of the human genome, one year ahead of schedule. Each chromosome had an expanding DNA library resource. In the same year, the Genetic Privacy Act, the first legislative product of the U.S. HGP, was proposed to regulate how DNA is collected, analyzed, stored, and used.

The physical maps of chromosomes 16 and 19 were announced in 1995, followed by the publication of moderate-resolution maps of chromosomes 3, 11, 12, and 22. During this time, the human genome was not the only genome being sequenced. The genome of the bacterium Haemophilus influenzae (which, despite its name, does not cause influenza) had already been completely sequenced, followed by the yeast genome (Saccharomyces cerevisiae) a year later. Concerns over discrimination based on genetic information led to a provision in the Health Insurance Portability and Accountability Act (HIPAA) that prohibits health insurance companies from using genetic information in certain cases to determine eligibility. This was an important legislative initiative, helping to mitigate some of the immediate concerns about genetic discrimination in the healthcare industry.

In January 1997, the NIH declared that the National Human Genome Research Institute (NHGRI) would be a recognized collaborative institute. Following this announcement, physical maps of chromosomes X and 7 were released. GeneMap '98 was then released, allowing scientists to use the mapped locations of approximately 30,000 markers for genetic studies. It was also in 1998 that American geneticist Craig Venter formed Celera Genomics, a company that would contribute significantly to the sequencing effort using many resources provided by the HGP. Celera Genomics, equipped with high-speed, state-of-the-art sequencing capabilities, became a leader in the race to sequence the human genome. Only nine years after the HGP was formally initiated, chromosome 22 was considered to be completely sequenced. In practice, this meant that of the 56 million bases estimated to make up the entire chromosome, only 33.5 million were actually sequenced by the HGP. The remaining sequences, roughly 22.5 million bases, lie at the ends of the chromosome (called telomeres) and at its center (the centromere). These regions are composed of repeated sequences that prevent them from being cloned into BACs or any other construct. There are few genes, if any, in most of these sequences. The sequenced portion of chromosome 22 represents 97% of the regions that are rich in genes.
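The chromosome 22 figures can be checked with a little arithmetic. The short Python sketch below uses only the two numbers quoted in the text (56 million estimated bases and 33.5 million sequenced bases); the variable names are illustrative. It suggests that the sequenced portion is roughly 60% of the whole chromosome, which is a different measure from the 97% coverage of gene-rich regions.

```python
# Figures quoted in the text for chromosome 22, in millions of bases.
estimated_total_mb = 56.0
sequenced_mb = 33.5

# Bases left unsequenced (telomeric and centromeric repeats, per the text).
unsequenced_mb = estimated_total_mb - sequenced_mb

# Share of the whole chromosome that was sequenced, as a percentage.
sequenced_percent = 100 * sequenced_mb / estimated_total_mb

print(f"Unsequenced: {unsequenced_mb:.1f} million bases")
print(f"Sequenced share of whole chromosome: {sequenced_percent:.1f}%")
```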

Using the sequencing data, the Single Nucleotide Polymorphism (SNP) Consortium, formed by major pharmaceutical companies, introduced a public database to provide information about inherited variations in the human genome that might offer insight into health and disease. For example, some individuals inherit variations in genes that metabolize carcinogens, and these variations may make them more susceptible to developing cancer when they are exposed to environmental contaminants. People without these variations may be less likely to develop cancer under the same exposure. Identifying susceptible individuals has important implications for reducing cases of cancer.
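As a rough illustration of what a single-nucleotide variation looks like, the Python sketch below compares two short DNA strings letter by letter and reports where they differ. It is only a sketch under stated assumptions: the function name and the example sequences are made up, the two sequences are assumed to be already aligned and of equal length, and a real SNP is a variation shared by part of a population, which takes far more data and analysis to establish than a single comparison.

```python
def find_single_base_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and have equal length.
    Positions are counted from zero.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Made-up example sequences, not real genomic data.
reference = "ATGCCGTA"
sample = "ATGCTGTA"
differences = find_single_base_differences(reference, sample)
print(differences)
```

Here the two strings differ at a single position, where the reference carries a C and the sample carries a T.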

The success of the HGP was celebrated in the year 2000, when the draft of the human genome was announced. An executive order issued by President Bill Clinton prohibited federal agencies from using genetic information in employment decisions or staff promotions. Also in this year, the sequence of chromosome 21, the second chromosome to be sequenced, was announced, and drafts of chromosomes 5, 16, and 19 were finished, followed by chromosome 20 in 2001. The working draft of the human genome was published in February 2001 in the journals Nature and Science. The Science article described work performed by Celera Genomics, and the Nature article presented data derived from the efforts of the public sector. Less than two years later, the Mouse Genome Sequencing Consortium published its own draft sequence of the mouse genome on December 5, 2002, in the journal Nature. In January 2003, chromosome 14 became the fourth chromosome to be entirely sequenced. Having the sequence of both mouse and human helps scientists understand human diseases by developing mouse models and by identifying genes that are homologous (evolutionarily related) and might have similar functions in both organisms.
