Thursday, 23 September 2010

Aspergillus Genomics and Metabolic Pathways - AsperCyc

Genomics is the study of an organism's entire genome: all of its genes, how they are organised and how they are expressed. DNA sequencing techniques originated in the 1970s, and huge advances since then have led to large numbers of organisms having their entire genomes sequenced, including humans. Much of this progress comes from sequencing carried out by robots at large dedicated centres over the last 10 years or so.
Thousands of bacterial and viral genomes have been sequenced, as they are small and simple in terms of genome size. Sequencing the genomes of fungi, plants and animals, which all belong to the eukaryotes, is a far larger task, so fewer of these have been sequenced, but even there the process is accelerating.

We have sequence information for the whole genomes of at least 9 species of Aspergillus (CADRE and aspGD), with many more planned. This means that we have access to the entire book of life for each species, each book consisting of 7 or 8 volumes (chromosomes) and a total of about 30 million 'letters' (base pairs), or 10 million 'words' (codons, each three letters long). To give you an idea of how big that is, the combined plays of Shakespeare number 39 in all, with a total of less than 1 million words, and the Bible totals less than 600 000 words. Across those 9 species, we are storing enough information to fill 150 Bibles or 100 collections of the plays of Shakespeare!
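For anyone who wants to check the sums, here is a minimal back-of-envelope sketch in Python. It assumes only the approximate figures quoted in this post (30 million base pairs per genome, 9 genomes, and the upper word counts for the Bible and Shakespeare); none of them are precise counts.

```python
# Back-of-envelope arithmetic behind the 'book of life' comparison.
# All figures are the approximate ones quoted in the post, not precise counts.
BASE_PAIRS_PER_GENOME = 30_000_000  # ~30 million 'letters' per Aspergillus genome
BASES_PER_CODON = 3                 # a codon is a triplet of bases
SPECIES_SEQUENCED = 9               # at least 9 Aspergillus genomes
BIBLE_WORDS = 600_000               # 'less than 600 000 words' (upper bound)
SHAKESPEARE_WORDS = 1_000_000       # 'less than 1 million words' (upper bound)

codons_per_genome = BASE_PAIRS_PER_GENOME // BASES_PER_CODON
total_codons = codons_per_genome * SPECIES_SEQUENCED

print(f"Codons per genome: {codons_per_genome:,}")
print(f"Codons across {SPECIES_SEQUENCED} genomes: {total_codons:,}")
print(f"Bible equivalents: {total_codons / BIBLE_WORDS:.0f}")
# Because Shakespeare's total is *under* 1 million words, this is a floor.
print(f"Shakespeare equivalents (at least): {total_codons / SHAKESPEARE_WORDS:.0f}")
```

This gives 10,000,000 codons per genome and 90,000,000 across all 9 species, which works out at 150 Bibles. Using a full million words for Shakespeare gives 90 collections; since the real total is somewhat less than that, roughly 100 collections is a fair estimate.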

Just as books read word by word are less informative than when we read whole pages, codons are not particularly informative unless we can link them together into functional genes. Aspergillus codons form nearly 30 000 genes, of which 10 000 are known to be in use; this alone is a big piece of information that we could only guess at before genome sequencing. Twenty years ago, scientists would have identified an expressed gene, cut it out of the genome, sequenced it and characterised its expression, and all of that could have taken several years to complete. Even then, we would only have gathered information about that particular gene, and often only about the parts of the gene that are actually expressed. Little was known about how neighbouring DNA sequences influence the expression of that gene.

Now we already have the sequence of every gene in an organism, held in the same context as it is found in the living cell, so we can look at all of the sequences on either side of each gene. All of this is at our fingertips in freely available computer databases. We can search either side of a gene for sequences that are known to control the expression of similar genes, we can locate unique sequences and extract whole genes for examination very quickly, and we can even construct genes completely artificially if we need to. This is technology beyond the wildest dreams of scientists 30 years ago, and along with other similarly revolutionary techniques it is massively speeding up the accumulation of knowledge.
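To make the idea of looking 'either side of a gene' a little more concrete, here is a minimal Python sketch. It assumes a genome held as a plain string and a gene given by hypothetical 0-based start and end coordinates on the forward strand; the function names, the toy sequence and the choice of TATAAA (a simple TATA-box-like motif) are all illustrative assumptions, not how any particular database does it. Real promoter analysis also has to handle the reverse strand and usually uses statistical motif models rather than exact matches.

```python
import re

def flanking_regions(genome: str, start: int, end: int, flank: int) -> tuple[str, str]:
    """Return (upstream, downstream) sequence around a gene at genome[start:end].

    Coordinates are 0-based and end-exclusive. The gene is assumed to lie on
    the forward strand; windows are clipped at the ends of the sequence.
    """
    upstream = genome[max(0, start - flank):start]
    downstream = genome[end:end + flank]
    return upstream, downstream

def find_motif(seq: str, motif: str) -> list[int]:
    """Return the 0-based start of every exact match of motif, overlaps included."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    pattern = f"(?={re.escape(motif)})"
    return [m.start() for m in re.finditer(pattern, seq)]

# Hypothetical toy 'genome': 12 bp upstream + a 12 bp 'gene' + 12 bp downstream.
toy_genome = "GGCTATAAAGGC" + "ATGAAACCCTAA" + "GCTTTATAAACC"
up, down = flanking_regions(toy_genome, start=12, end=24, flank=12)
print("upstream TATAAA hits:", find_motif(up, "TATAAA"))
print("downstream TATAAA hits:", find_motif(down, "TATAAA"))
```

On the toy sequence this reports a hit at position 3 of the upstream window and at position 4 of the downstream window. With a real genome, the same two steps (cut out a window, scan it) are what makes checking thousands of genes at once so quick.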

Massively important though all this is, we have still only begun to look at single pages in the genome 'book'; much more information is available. A single gene generally carries out only one function. Most of the substances that our cells make to keep us alive are complex and are built up from other materials. Substrates have to be broken down and then rebuilt into a useful form, and no single gene is capable of doing that on its own. Several, sometimes dozens, of genes must be expressed and their products used in sequence, each one carrying out one step in pathways that can be many steps long. These are known as metabolic pathways.

Genome information and pathway information are currently being put together for each sequenced organism (MetaCyc, BioCyc). This is the equivalent of starting to assemble the pages of our genome 'book' into chapters. We are helping this process for Aspergillus with the introduction of AsperCyc. Because there is such a huge amount of data to process, we are so far largely relying on computers to identify potential pathways. Alongside this, a slower process of manual curation is ongoing, in which a human cross-checks what the computer has decided against what is known from published papers.
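As a rough illustration of what 'computers identifying potential pathways' can mean, here is a deliberately simplified Python sketch. It is a toy coverage score, not the method used by AsperCyc, MetaCyc or BioCyc: each pathway is assumed to be an ordered list of enzyme activities, each gene is assumed to carry one annotated activity, and a pathway is flagged as 'predicted' when a hypothetical threshold fraction of its steps is covered. The gene names, activity labels (E1, E2, ...) and the 0.75 threshold are all made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Pathway:
    name: str
    steps: list[str]  # enzyme activities the pathway needs, in order

def score_pathway(pathway: Pathway, activities: set[str]) -> float:
    """Fraction of the pathway's steps covered by at least one annotated gene."""
    if not pathway.steps:
        return 0.0
    found = sum(1 for step in pathway.steps if step in activities)
    return found / len(pathway.steps)

def predict_pathways(pathways: list[Pathway], gene_annotations: dict[str, str],
                     threshold: float = 0.75) -> list[tuple[str, float, list[str]]]:
    """Return (name, score, missing steps) for each pathway scoring >= threshold.

    Missing steps are gaps a human curator might check against the literature.
    """
    activities = set(gene_annotations.values())
    predicted = []
    for pathway in pathways:
        score = score_pathway(pathway, activities)
        if score >= threshold:
            missing = [s for s in pathway.steps if s not in activities]
            predicted.append((pathway.name, score, missing))
    return predicted

# Hypothetical gene -> enzyme-activity annotations and pathway definitions.
annotations = {"gene001": "E1", "gene002": "E2", "gene003": "E3",
               "gene004": "E5", "gene005": "E6"}
pathways = [
    Pathway("pathway-A", ["E1", "E2", "E3", "E4"]),  # 3 of 4 steps annotated
    Pathway("pathway-B", ["E5", "E7", "E8"]),        # 1 of 3 steps annotated
    Pathway("pathway-C", ["E5", "E6"]),              # 2 of 2 steps annotated
]
for name, score, missing in predict_pathways(pathways, annotations):
    print(f"{name}: {score:.0%} of steps found, missing {missing}")
```

Here pathway-A (75%, missing E4) and pathway-C (100%) are flagged, while pathway-B is not. The list of missing steps is exactly the kind of thing a human curator would then check against published papers: either a gene for that step has been overlooked, or the pathway is not really present.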

This is the next step towards the ultimate dream of being able to give a computer a genome sequence (or even some DNA) and then stand back as it detects the genes, assesses their potential for expression, assigns them to pathways, and then runs a full simulation of how all of the genes interact with each other to form a living cell. We could then introduce changes to gene expression and watch the consequences, discover new drug targets and ultimately test new drugs. To continue the analogy, we would be reading the whole genome 'book' and analysing the full meaning of its contents. This could have innumerable benefits. For example, if someone has a genetic disorder, we would be able to work out how to counter the effects of the mutation, because the computer would present us with a full picture of all of the effects of that disorder.

There is some way to go yet, but progress is being made and computer power is getting cheaper all the time. If we adopt an optimistic attitude, it is not too difficult to see a positive conclusion!
