Biomine: Knowledge discovery in biological databases

Public biological databases contain vast amounts of rich data: annotated sequences, proteins, domains, and orthology groups; genes and gene expression; gene and protein interactions; scientific articles; and ontologies. The Biomine project develops methods for analysing such collections of data.

In the Biomine approach, all information is handled as graphs: nodes correspond to different concepts (such as gene, protein, domain, phenotype, biological process, tissue), and semantically labelled edges connect related concepts (e.g., gene BCHE codes for protein CHLE, which in turn has the molecular function 'beta-amyloid binding'). One central goal is to develop methods for establishing new, previously unknown connections between nodes, that is, for generating biological hypotheses. We develop and use data mining algorithms for this. Predicted connections could be based, for instance, on discovered analogies between two concepts or their contexts, or on finding strong paths between concepts.
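The graph view described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the edge weights, the two article edges, the function names build_graph and strongest_path, and the choice to score a path by the product of its edge weights are all assumptions made for illustration. Only the BCHE, CHLE and 'beta-amyloid binding' example comes from the text.

```python
import heapq
import math
from collections import defaultdict

# Labelled edges: (source, label, target, weight).
# The weights are invented for illustration; the article node is hypothetical.
EDGES = [
    ("gene:BCHE", "codes_for", "protein:CHLE", 0.9),
    ("protein:CHLE", "has_function", "function:beta-amyloid binding", 0.8),
    ("gene:BCHE", "mentioned_in", "article:A1", 0.5),
    ("article:A1", "mentions", "function:beta-amyloid binding", 0.5),
]


def build_graph(edges):
    """Build an adjacency list; edges are treated as undirected (an assumption)."""
    graph = defaultdict(list)
    for src, label, dst, weight in edges:
        graph[src].append((dst, label, weight))
        graph[dst].append((src, label, weight))
    return graph


def strongest_path(graph, start, goal):
    """Return (strength, path) for the path with the largest product of weights.

    Uses Dijkstra's algorithm on costs -log(weight), which is valid because
    all weights lie in (0, 1]. Returns (0.0, []) if no path exists.
    """
    heap = [(0.0, start, [start])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return math.exp(-cost), path
        if node in done:
            continue
        done.add(node)
        for nxt, label, weight in graph[node]:
            if nxt not in done:
                new_cost = cost - math.log(weight)
                heapq.heappush(heap, (new_cost, nxt, path + [label, nxt]))
    return 0.0, []


graph = build_graph(EDGES)
strength, path = strongest_path(
    graph, "gene:BCHE", "function:beta-amyloid binding"
)
print(strength, path)
```

With these made-up weights, the route through protein CHLE (strength roughly 0.72) is preferred over the route through the hypothetical article (0.25). Other measures of connection strength are equally possible; the product of weights is used here only because it keeps the sketch short.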

First results include methods for extracting relevant subgraphs and models for measuring the strength of connections between given concepts. We have also integrated public biological databases into an in-house graph database of over 10 million objects. These tools are currently used in studies of, e.g., dyslexia and Huntington's disease.

Discovery of patterns in graphs has numerous potential applications in biology; obvious candidates include the analysis of metabolic networks, regulatory relationships, protein structures, and chemical compounds. Virtually any data can be described as graphs, so the developed methods can potentially be applied in other areas, too.

People

  • Hannu Toivonen, project leader
  • Lauri Eronen
  • Petteri Hintsanen
  • Kimmo Kulovesi
  • Juho Muhonen
  • Petteri Sevon

Research groups

See www.cs.helsinki.fi/group/biomine for further information and publications.


Last updated on 10 Dec 2007 by Teemu Mäntylä - Page created on 13 Jan 2007 by Webmaster