Harvard University, Ph.D. 1981
The activities of my research group and collaborations focus on the development of bioinformatics systems essential for functional genomics, genetics and phenotypic research. The sequencing of mouse, human and other genomes and the rapid accumulation of very large data sets have produced an overwhelming amount of information from multiple sources, in a variety of content types and formats. The challenge is to bring all the data together and make it easily accessible, both directly to researchers and for further computational analysis. Our current research centers on combining bio-ontologies (defined, controlled, structured vocabularies) and database systems to identify molecular elements that contribute to the processes of particular diseases, such as lung cancer. This work is undertaken as part of the Gene Ontology Consortium, a group of 19 model organism databases and genome annotation centers. My group, as part of the Mouse Genome Informatics Consortium at The Jackson Laboratory, is responsible for the functional and comparative annotation of mouse genes.
Functional and Comparative Genome Informatics
My research focuses on functional and comparative genome informatics. I work on the development of systems to integrate and interrogate genetic, genomic and phenotypic information. I am one of the leaders of the Gene Ontology (GO) project and have been deeply involved with the work of the GO Consortium since its inception. The Gene Ontology project is an international effort to provide controlled, structured vocabularies for molecular biology that serve as terminologies, classifications and ontologies to further data integration, analysis and reasoning. My interest in bio-ontologies also stems from my work as a principal investigator with the Mouse Genome Informatics (MGI) project at The Jackson Laboratory. The MGI system is a model organism community database resource that provides integrated information about the genetics, genomics and phenotypes of the laboratory mouse. My current research projects combine bio-ontologies and database knowledge systems to represent disease processes, with the objective of discovering molecular elements that contribute to particular pathologies such as respiratory diseases.
The Gene Ontology Consortium
Widespread use of the GO system for functional annotation of genomes enables comparative analysis of genome-scale data sets. Understanding and supporting the GO annotation process, and bringing new groups into the GO community, are essential to the continued development of a broad, integrated network of biological information that can be transparently shared to enable and advance knowledge discovery. The GO Consortium now consists of 19 model organism databases and genome-annotation groups that work cooperatively to construct the GO bio-ontologies, to provide functional annotations for a wide variety of organisms, and to support a GO information resource. GO participants located at The Jackson Laboratory lead ontology development projects, develop new software applications for the GO project, and provide GO annotations for mouse gene products. Other core groups of the GO project include an ontology development group based at the European Bioinformatics Institute in the United Kingdom, a software and resource development group based at Lawrence Berkeley National Laboratory, and a production database group based at Stanford University.
The Mouse Genome Informatics Project
MGI supports scientific research that uses the laboratory mouse as a model for the study of human biology and disease. MGI data are curated both from the biomedical literature and from co-curated data loads from other major bioinformatics resources. My research group is responsible for the functional and comparative annotation of mouse genes in the MGI resource. This work includes defining the mouse gene set (in co-curation with other informatics resource providers), indexing the biomedical literature for functional annotation, providing official gene nomenclature along with a robust set of synonyms, and extending the representation of relationships between mouse, human and rat genes and genomes. We work closely with the MGI Sequences and Sequence Maps group to resolve sequence-based inconsistencies in the representation of the mouse genome and transcript data integrated in MGI, and between MGI and other informatics resource centers such as NCBI, Ensembl and UniProt. We also work closely with the MGI Phenotypes group to support the development of standards for the representation of phenotype/genotype data in MGI.
MGI-GO Scientific Curators are using a combination of algorithmic and manual approaches to update annotations of mouse gene products to the GO vocabularies. Currently, more than 17,500 mouse genes have at least preliminary GO annotations and over 9,700 have annotations based on experimental assays in mouse. We use data-mining and other strategies to semi-automate gene annotation to the GO. The highest quality annotations, however, depend on skilled scientific curators who review published literature for information that provides experimental verification for the GO attributions.
- Dowell KG, McAndrews-Hill M, Hill DP, Drabkin HJ, Blake JA. 2009. Integrating text mining into the MGI biocuration workflow. Database 2009:bap019. doi:10.1093/database/bap019.
- Arighi CN, Liu H, Natale DA, Barker WC, Drabkin H, Blake JA, Smith B, Wu CH. 2009. TGF-beta signaling proteins and the Protein Ontology. BMC Bioinformatics 10(Suppl 5):S3.
- The Reference Genome Group of the Gene Ontology Consortium. 2009. The Gene Ontology’s Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol 5(7):e1000431.
- Dolan ME, Blake JA. 2009. Using ontology visualization to facilitate access to knowledge about human disease genes. Applied Ontology 4(1):35-49.
- Sam LT, Mendonca EA, Li J, Blake J, Friedman C, Lussier YA. 2009. PhenoGO: an integrated resource for the multiscale mining of clinical and biological data. BMC Bioinformatics 10(Suppl 2):S8.
- Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE; Mouse Genome Database Group. 2009. The Mouse Genome Database genotypes::phenotypes. Nucleic Acids Res. 37(Database):D712-9.
- Pena-Castillo L, Tasan M, Blake JA, [40 authors], Roth FP. 2008. A critical assessment of M. musculus gene function prediction using integrated genomic evidence. Genome Biology 9(Suppl 1):S2. PMC2447536
- Lovering RC, Camon EB, Blake JA, Diehl AD. 2008. Access to immunology through the Gene Ontology. Immunology 125(2):154-60.
- Hill DP, Smith B, McAndrews-Hill MS, Blake JA. 2008. Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics 9(Suppl 5):S2. PMC2367625
- Altman RB, Bergman CM, Blake J, Blaschke C, Cohen A, Gannon F, Grivell L, Hahn U, Hersh W, Hirschman L, Jensen LJ, Krallinger M, Mons B, O’Donoghue SI, Peitsch MC, Rebholz-Schuhmann D, Shatkay H, Valencia A. 2008. Text mining for biology–the way forward: opinions from leading scientists. Genome Biol. 9(suppl 2):S7. PMC2559991
- Tasan M, Tian W, Hill DP, Gibbons FD, Blake JA, Roth FP. 2008. An en masse phenotype and function prediction system for M. musculus. Genome Biology 9:S8. PMC2447542
- The Gene Ontology Consortium*. 2008. The Gene Ontology (GO) Project in 2008. Nucleic Acids Res. 36(Database):D440-4.
- Blake JA and Harris MA. 2008. The Gene Ontology (GO) Project: Structured Vocabularies for Molecular Biology and Their Application to Genome and Expression Analysis. In: Current Protocols in Bioinformatics (23)7.2.1-7.2.9.
- Diehl AD, Lee JA, Scheuermann RH, Blake JA. 2007. Ontology development for biological systems: Immunology. Bioinformatics 31(epub).
- Natale DA, Arighi CN, Barker WC, Blake JA, Chang T-C, Hu Z, Liu H, Smith B, Wu CH. 2007. Framework for a Protein Ontology. BMC Bioinformatics 8(Suppl 9):S1.
- Blake JA, Bult CJ. 2006. Beyond the data deluge: data integration and bio-ontologies. J Biomed Inform 39:314-320.
- Dolan M, Camon E, Blake JA. 2005. A Procedure for Assessing GO Annotation Consistency. Bioinformatics 21(suppl 1):i136-i143.
- Drabkin HJ, Hollenbeck C, Hill DP and Blake JA. 2005. Ontological visualization of protein-protein interactions. BMC Bioinformatics 6:29.
- Bada M, Stevens R, Goble C, Gil Y, Ashburner M, Blake JA, Harris M, and Lewis S. 2004. A Short Study on the Success of the Gene Ontology. Journal of Web Semantics 1(2).
- Blake J. 2004. Bio-ontologies: fast and furious. Nat Biotechnol 22(6):773-4.
- Evsikov A, deVries WN, Peaston AE, Radford EE, Fancher KS, Chen FH, Blake JA, Bult CJ, Latham KE, Solter D, Knowles BB. 2004. Systems biology of the 2-cell mouse embryo. Cytogenet Genome Res 105:240-250.
- Baldarelli RM, Hill DP, Blake JA, Adachi J, Furuno M, Bradt D, Corbani LE, Cousins S, Frazer KS, Qi D, Yang L, Ramachandran S, Reed D, Zhu Y, Kasukawa T, Ringwald M, King BL, Maltais LJ, McKenzie LM, Schriml L, Maglott D, Church D, Pruitt K, Okazaki Y, Hayashizaki Y, Eppig JT, Richardson JE, Kadin JA, Bult CJ. 2003. Connecting Sequence and Biology in the Laboratory Mouse. Genome Res 13(6b):1505-1529.
- Schriml LM, Hill DP, Blake JA, Bono H, Wynshaw-Boris A, Pavan W, Ring BZ, Beisel K, Setou M, Okazaki Y, Hayashizaki Y. 2003. Human Disease Genes and their cloned Mouse Orthologs: Exploration of the FANTOM2 dataset. Genome Res 13(6b):1496-1501.
- Hill DP, Blake JA, Richardson JE and Ringwald M. 2002. Extension and Integration of the Gene Ontology (GO): Combining GO vocabularies with external vocabularies. Genome Res 12:1982-91.
- The FANTOM Consortium and the RIKEN Genome Exploration Research Group Phase I and II team*. 2002. Analysis of the mouse transcriptome based on the functional annotation of 60,770 full-length cDNAs. Nature 420:563-73.
- Blake JA, Eppig JT, Richardson JE, Bult CJ, Kadin JA, and the Mouse Genome Database Group. 2001. The Mouse Genome Database (MGD): Integration Nexus for the laboratory mouse. Nucleic Acids Res 29(1):91-94.
- Hill DP, Davis AP, Richardson JE, Corradi J, Ringwald M, Eppig JT, Blake JA. 2001. Strategies for biological annotation of mammalian systems: Implementing gene ontologies in mouse genome informatics. Genomics 74(1):121-8.
- The Gene Ontology Consortium. 2001. Creating the Gene Ontology Resource: Design and Implementation. Genome Res 11(8):1425-33.
- The Gene Ontology Consortium*. 2000. Gene Ontology: tool for the unification of biology. Nat Genet 25(1):25-29.
- Kawai J, Shinagawa A, Shibata K, Yoshino M, Itoh M, Ishii Y, Arakawa T, Hara A, Fukunishi Y, Konno H, Adachi J, Fukuda S, Aizawa K, Izawa M, Nishi K, Kiyosawa H, Kondo S, Yamanaka I, Saito T, Okazaki Y, Gojobori T, Bono H, Kasukawa T, Saito R, Kadota K, Matsuda H, Ashburner M, Batalov S, Casavant T, Fleischmann W, Gaasterland T, Gissi C, King B, Kochiwa H, Kuehl P, Lewis S, Matsuo Y, Nikaido I, Pesole G, Quackenbush J, Schriml LM, Staubli F, Suzuki R, Tomita M, Wagner L, Washio T, Sakai K, Okido. 2001. Functional annotation of a full-length mouse cDNA collection. Nature 409(6821):685-90.
- 1995 to 1997 — NSF – Research Collections in Systematics and Ecology Advisory Panel
- 2001 to 2003 — NIH – BISTI pre-National Programs of Excellence in Biomedical Computing (NPEBC) Advisory Panel.
- 2003 to 2009 — External Scientific Advisory Board, Saccharomyces Genome Database (SGD) (NHGRI).
- 2003 to 2009 — External Scientific Advisory Panel for the MDI-BL Toxicogenomics
- 2003 to 2005 — External Scientific Panel for the Pharmacogenetics Research Network
- 2003 to 2006 — External Scientific Advisory Board, TIGR Rice Genome Annotation Project (NSF)
- 2004 to 2009 — TREC Genomics Track – Steering Committee (NIST)
- 2004 to 2004 — NIH: INBRE Review Panel
- 2004 to 2005 — NIH Ad Hoc Study Section Member: Genomics, Computational Biology and Technology (GCAT).
- 2005 to 2005 — Special Editor, PLoS Genetics: FANTOM3 issue.
- 2005 to 2008 — Program Committee – Intelligent Systems for Molecular Biology (ISMB).
- 2005 to 2005 — Program Committee – Pacific Symposium on Biocomputing Meeting
- 2005 to 2005 — Program Committee – European Conference on Computational Biology.
- 2005 to 2005 — Program Committee – Data Integration in the Life Sciences Meeting.
- 2006 to 2009 — NIH Study Section: Genomics, Computational Biology, and Technology (GCAT).
- 2006 to 2006 — NHLBI Cardiovascular Strategic Planning Working Group #11
- 2006 to 2009 — External Scientific Advisory Board, UniProt (Universal Protein Resource)
- 2006 to 2009 — External Scientific Advisory Panel, Pharmacogenomics Knowledgebase (PharmGKB)
- 2006 to 2008 — Program Committee – Bio-Ontologies SIG; ISMB meeting.
- 2008 to 2009 — CIHR (Canadian Institutes of Health Research) External Scientific Advisory Board. “Model Organism Interactions and Human Diseases” project.