Development is an Art

21 Feb 2009

Bioinformatics

In day-to-day life we often say things like ‘hard work is in my genes’ or ‘technology is in our DNA’. Genes and DNA are familiar words, but do we know what they actually mean? Most of us studied genes, DNA and RNA in school, but few of us remember much of it. As software professionals, no one expects us to know these things. There is, however, one vertical in the software industry where a software professional ought to know them, and it is popularly known as Bioinformatics.

In simple English, Bioinformatics is the branch of life science that applies information technology to the field of molecular biology. Its primary goal is to increase our understanding of biological processes. Bioinformatics entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Common activities in Bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures.
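To make "analyzing DNA sequences" a little more concrete, here is a minimal Python sketch of two very basic analyses: the GC content of a sequence and its reverse complement. The function names and the example sequence are made up for illustration; real projects would normally rely on an established library rather than hand-written helpers like these.

```python
# Minimal sketch: two basic analyses of a DNA sequence.
# The helper names and the example sequence are illustrative assumptions.

# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


example = "ATGCGC"  # made-up sequence, not real data
print(gc_content(example))
print(reverse_complement(example))
```

Both helpers assume the input contains only the letters A, C, G and T; anything else raises a KeyError in reverse_complement.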

Major research efforts in the field of Bioinformatics include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution.

At the beginning of the "genomic revolution", bioinformatics was applied to the creation and maintenance of databases that store biological information such as nucleotide and amino acid sequences. Developing this type of database involved not only design issues but also the development of complex interfaces through which researchers could both access existing data and submit new or revised data.
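As a rough illustration of such a database and its access/submit interface, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table layout, the column names and the accession "DEMO1" are all assumptions invented for this example; real sequence databases are far larger and use much richer schemas.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one (hypothetical) sequences table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        " accession TEXT PRIMARY KEY,"
        " kind TEXT NOT NULL CHECK (kind IN ('nucleotide', 'protein')),"
        " residues TEXT NOT NULL)"
    )
    return conn


def submit(conn: sqlite3.Connection, accession: str, kind: str, residues: str) -> None:
    """Submit a new record, or replace the record already stored under this accession."""
    conn.execute(
        "INSERT OR REPLACE INTO sequences (accession, kind, residues) VALUES (?, ?, ?)",
        (accession, kind, residues),
    )
    conn.commit()


def fetch(conn: sqlite3.Connection, accession: str):
    """Return (kind, residues) for an accession, or None if it is not stored."""
    row = conn.execute(
        "SELECT kind, residues FROM sequences WHERE accession = ?", (accession,)
    ).fetchone()
    return row


conn = open_store()
submit(conn, "DEMO1", "nucleotide", "ATGC")   # new submission
submit(conn, "DEMO1", "nucleotide", "ATGCC")  # revised submission
print(fetch(conn, "DEMO1"))
```

Using the accession as the primary key is what lets the same call cover both "submit new" and "submit revised" data.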

In order to study how normal cellular activities are altered in different disease states, the biological data must be combined to form a comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:
a) the development and implementation of tools that enable efficient access to, and use and management of, various types of information;
b) the development of new algorithms and statistical methods with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.

 

Software in bioinformatics

A lot of research is being done in the field of Bioinformatics, in areas such as sequence analysis, protein modeling, gene prediction, protein-protein docking and drug discovery. Software plays a major role in all of these areas. In sequence analysis, for example, computer programs search the genomes of thousands of organisms, containing billions of nucleotides. These programs compensate for mutations in the DNA sequence in order to identify sequences that are related but not identical. One of the most popular tools for sequence analysis, BLAST (Basic Local Alignment Search Tool), is an algorithm that compares a query sequence against a database of sequences and reports the regions that are most similar.
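To show how a program can score two sequences as related even when they are not identical, here is a minimal sketch of the Smith-Waterman local alignment score in Python. This is not BLAST itself: BLAST uses faster heuristics to approximate this kind of comparison across a whole database. The scoring values (match, mismatch, gap) are arbitrary assumptions chosen for the example.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score between a and b.

    Only the score is computed, not the alignment itself, so two rows of the
    dynamic-programming table are enough.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a substitution.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which lets an alignment start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Made-up sequences: the second differs from the first by one substitution.
print(local_alignment_score("ACGTACGT", "ACGTTCGT"))
print(local_alignment_score("ACGTACGT", "ACGTACGT"))
```

A single substitution lowers the score only slightly compared with a perfect match, which is how such programs can tolerate mutations while still ranking related sequences above unrelated ones.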

Another aspect of bioinformatics in sequence analysis is the automatic search for genes and regulatory sequences within a genome. Not all of the nucleotides within a genome are genes. Within the genome of higher organisms, large parts of the DNA do not serve any obvious purpose. The influence also runs the other way: ideas from biology gave birth to the genetic algorithm in computer science, a search technique modeled on natural selection and used to find exact or approximate solutions to optimization and search problems.
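For readers who have not met a genetic algorithm before, the following is a minimal Python sketch, assuming a toy problem: evolve a list of bits so that it contains as many 1s as possible. The population size, mutation rate and other parameters are arbitrary choices for illustration, not recommended settings.

```python
import random


def evolve(fitness, length=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Evolve a list of bits that scores well under `fitness` (toy sketch)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [population[0][:]]           # elitism: the best survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy fitness function: the number of 1s in the list.
best = evolve(sum)
print(sum(best), "ones out of", len(best))
```

Selection, crossover and mutation are the three ingredients borrowed from biology; the fitness function is the only part that changes from one problem to another.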

SOAP and REST-based interfaces have been developed for a wide variety of bioinformatics applications, allowing an application running on one computer in one part of the world to use algorithms, data and computing resources on servers in other parts of the world. The main advantage lies in the end user not having to deal with software and database maintenance overheads. Basic bioinformatics services are classified by the EBI into three categories: SSS (Sequence Search Services), MSA (Multiple Sequence Alignment) and BSA (Biological Sequence Analysis). The availability of these service-oriented bioinformatics resources demonstrates the applicability of web-based bioinformatics solutions, which range from a collection of standalone tools with a common data format under a single, standalone or web-based interface, to integrative, distributed and extensible bioinformatics workflow management systems.
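As a sketch of what calling such a REST-based service might look like from the client side, the Python below builds (but does not send) the HTTP requests for submitting a sequence search and fetching its result. The base URL, the paths and the parameter names are hypothetical and do not describe any real service; a real client should follow the documentation of the service it targets.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint used only for illustration; not a real service.
BASE_URL = "https://bioinfo.example.org/api/search"


def build_submit_request(sequence: str, database: str) -> Request:
    """Build a POST request that would submit a sequence search job."""
    body = urlencode({"sequence": sequence, "database": database}).encode("ascii")
    return Request(
        f"{BASE_URL}/run",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_result_request(job_id: str) -> Request:
    """Build a GET request that would fetch the result of a finished job."""
    return Request(f"{BASE_URL}/result/{job_id}", method="GET")


req = build_submit_request("ATGCGC", "demo_db")  # made-up sequence and database
print(req.get_method(), req.full_url)
```

Passing a request object to urllib.request.urlopen would actually send it. The search itself runs on the remote server, so the client needs neither the algorithm nor the database installed locally.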

Other software technologies, such as OpenGL and image-processing tools, are used for modeling and for protein/gene image processing.

Considering the new areas of research in bioinformatics and the new technologies being developed in computer science, there will be a great many opportunities for computer science to aid bioinformatics.
