Sequencing of the human genome has led to an explosion of new data on the relationship between genetic variation and disease. These data cover many areas. The first is rare Mendelian diseases, such as cystic fibrosis, sickle cell anemia and many others, which are usually caused by a single mutation that disrupts the function of a protein molecule. The second is risk of common diseases, such as asthma, Alzheimer’s disease, diabetes and rheumatoid arthritis, which is often influenced by over 100 genetic variants throughout the genome as well as by environmental factors. The third is cancer, where there are predisposing variants (for example, in BRCA1 and BRCA2 for breast cancer) as well as many additional mutations that accumulate in cells. This treasure trove of data holds enormous promise for improved diagnosis, prognosis and treatment, and it opens many new opportunities for drug development. Indeed, in rare disease and cancer, the data are already arriving in the clinic.
Full exploitation of these data requires an understanding of how each genetic variant affects the disease phenotype. For variants that alter the properties of a protein, that understanding is greatly facilitated by knowledge of the protein’s structure.
At IBBR, we use two strategies to investigate the relationship between genomic variation and disease. First, we develop computational methods to identify disease mechanisms at the molecular level, and we use these methods to identify causative variants in a wide range of disease situations. We complement the computational studies with a broad set of experimental techniques, including structural studies. In turn, we use the experimental results to validate and extend the computational methods (see figure).
Second, together with colleagues at UC Berkeley, we engage the international community in assessing the performance of computational techniques used to analyze these Big Data, through a series of community-wide experiments: CAGI (Critical Assessment of Genome Interpretation). These experiments are a new paradigm for conducting scientific research. Made possible by the advent of electronic communications, they allow participants from around the world to work on a common problem and to share the results. The approach was pioneered at CARB, University of Maryland, with experiments on computational methods for modeling protein structure, and it has since been widely adopted throughout computational biology. In CAGI, participants (over 30 groups from around the world in the most recent round) are given information on genetic variation related to human disease and asked to predict the corresponding disease phenotype. Challenges span the full range of genetic disease, including rare disease, cancer, and common complex-trait disease. Results from these experiments provide a rigorous basis for evaluating methods and are spurring the development of new and improved algorithms.