Algorithmic and Probabilistic Methods in Data Mining

The project develops methods for the exploratory data analysis of large and high-dimensional data sets. One theme has been finding frequent patterns in large collections of data; the pattern classes include both ordered and unordered patterns. Current areas of interest include condensed representations and the combination of combinatorial and probabilistic techniques for approximating distributions. For sequential data, the interests are in algorithms for sequence segmentation under various restrictions and in the discovery of order from unordered data sets. Issues in subspace clustering and spectral methods have also been studied.

In 2005 there were several interesting developments. The methods for seriation problems in paleontological and other applications advanced considerably, and the resulting publications were accepted to important forums. The novel problem setting of mining chains of relations shows great promise, as does the work on condensed representations and on spatial clustering. Special emphasis was given to work on finding partial orders from data.

People

  • Heikki Mannila, project leader
  • Hannu Toivonen
  • Jaakko Hollmén
  • Aristides Gionis
  • Evimaria Terzi
  • Antti Leino
  • Taneli Mielikäinen
  • Jouni Seppänen
  • Nikolaj Tatti
  • Ella Bingham
  • Robert Gwadera
  • Kai Puolamäki
  • Hannes Heikinheimo
  • Antti Ukkonen

Research groups

  • Data Mining, Prof. Heikki Mannila, Prof. Hannu Toivonen

See www.cs.helsinki.fi/research/fdk/datamining for further information and publications.


Last updated on 10 Dec 2007 by Teemu Mäntylä - Page created on 13 Jan 2007 by Webmaster