
Bioinformatics and Computational Biology



Bioinformatics, or computational biology, refers to the development of database methods for storing genomic information, the development of software and methods for extracting, processing, and evaluating that information, and the refinement of existing techniques for acquiring genomic data. Finding genes and determining their function, predicting the structure of proteins and RNA from the available DNA sequence, and determining the evolutionary relationships among proteins and DNA sequences are also part of bioinformatics.



The genome sequences of some bacteria, yeast, a nematode, the fruit fly Drosophila, and several plants have been obtained during the past decade, with many more sequences nearing completion. A working draft of the human genome sequence was announced in 2000, although work continues to refine the data. In addition to this accumulation of nucleotide sequence data, the elucidation of the three-dimensional structures of the proteins encoded by the genes has been accelerating. The result is a vast and ever-increasing number of databases and volume of genetic information. The efficient and productive use of this information requires specialized computational techniques and software. Bioinformatics has developed and grown from the need to extract and analyze the reams of genomic information, such as nucleotide sequences and protein structures.

Bioinformatics uses statistical analysis, stepwise computational analysis, and database management tools to search databases of DNA or protein sequences, to separate useful data from background, and to enable comparison of data from diverse databases. This sort of analysis is ongoing. The exploding number of databases, and the variety of experimental methods used to acquire the data, can make comparisons tedious. However, the benefits can be enormous. The immense, interconnected network of biological databases provides a resource for answering biological questions about mapping, gene expression patterns, molecular modeling, and molecular evolution, and for assisting in the structure-based design of therapeutic drugs.

Obtaining information is a multi-step process. Databases are examined, or browsed, by posing complex computational questions. Researchers who have derived a DNA or protein sequence can submit it to public repositories of such information to see whether any stored sequence matches or resembles it. If so, further analysis may reveal a putative structure for the protein encoded by the sequence, as well as a putative function for that protein. Primary databases contain one type of information (only DNA sequence data or only protein sequence data). Four primary databases currently available for these purposes are the European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database, GenBank, Swiss-Prot, and the Protein Information Resource (PIR). Secondary databases contain information derived from other databases. Specialist databases, or knowledge databases, are collections of sequence information, expert commentary, and reference literature. Finally, integrated databases are collections (amalgamations) of primary and secondary databases.
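The idea of submitting a sequence and looking for a match or similarity can be illustrated with a minimal Python sketch. It is not how the public repositories actually implement their searches; it simply ranks the entries of a small, made-up in-memory "database" by how many short subsequences (k-mers) they share with the query. The function names, the toy sequences, and the choice of Jaccard similarity are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_matches(query: str, database: Dict[str, str], k: int = 4) -> List[Tuple[str, float]]:
    """Score each entry by Jaccard similarity of k-mer sets, best match first."""
    query_kmers = kmers(query, k)
    scores = []
    for name, seq in database.items():
        entry_kmers = kmers(seq, k)
        union = query_kmers | entry_kmers
        score = len(query_kmers & entry_kmers) / len(union) if union else 0.0
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy database; the sequences are invented for this example.
reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
toy_db = {
    "entry_A": reference,
    "entry_B": "TTTTTTTTTTCCCCCCCCCCGGGGGGGGGG",
    "entry_C": reference[5:30],  # a fragment of entry_A
}

query = "GCCATTGTAATGGGCCGCTG"
ranking = rank_matches(query, toy_db)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Entries that share many k-mers with the query rise to the top of the ranking, while unrelated entries score zero. Real search tools add statistical measures of significance so that chance matches can be separated from meaningful ones.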

Computer monitor of automated DNA sequencer with gel image. Photograph by T. Bannor. Custom Medical Stock Photo.

Once a protein sequence has been derived, bioinformatics makes it possible to predict the three-dimensional structure of the protein molecule by using computer graphics and by comparison with similar proteins whose structures have been determined from crystals. Knowledge of the structure allows the site or sites critical for the function of the protein to be determined. Subsequently, drugs active against the site can be designed, or the protein can be used to enhance commercial production processes, as in pharmaceutical applications of bioinformatics.

Bioinformatics also encompasses the field of comparative genomics, which is the comparison of functionally equivalent genes across species. A yeast protein is likely to have the same function as a worm protein with a similar amino acid sequence. Alternatively, genes with similar sequences may have divergent functions. Such similarities and differences are revealed by the sequence information. Practically, such knowledge aids in the selection and design of genes to instill a specific function in a product and so enhance its commercial appeal.
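A cross-species comparison of this kind ultimately rests on aligning two sequences and measuring how similar they are. The following Python sketch uses the classic Needleman-Wunsch global alignment algorithm with a deliberately simple scoring scheme (match +1, mismatch -1, gap -2) and reports percent identity. The scoring values and the two short peptide sequences are assumptions chosen for illustration; practical tools use amino acid substitution matrices and more refined gap penalties.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment; returns both aligned strings and the score."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        """Substitution score for aligning a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical, non-gap residues."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


# Hypothetical peptide fragments standing in for a yeast and a worm protein.
yeast_like = "MKTAYIAKQRQISFVKSHFSRQ"
worm_like = "MKTAYLAKQRQLSFVKHFSRQ"
aln_a, aln_b, total = global_align(yeast_like, worm_like)
print(aln_a)
print(aln_b)
print(f"score={total}, identity={percent_identity(aln_a, aln_b):.1f}%")
```

High percent identity suggests, but does not prove, a shared function; as the text notes, genes with similar sequences may still have diverged in function.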

The most widely known example of a bioinformatics-driven endeavor is the Human Genome Project. Work related to the Human Genome Project has led to dramatic improvements in molecular biological techniques and to better computational tools for studying genomic function.
