Sampling from scarcely defined distributions: Methods and applications in data mining

Event type: 
Doctoral dissertation
Respondent: 
Aleksi Kallio
Opponent: 
Dr. Pauli Miettinen, Max-Planck-Institut für Informatik, Germany
Custos: 
Professor Aristides Gionis
Event time: 
2016-02-19 12:00 to 14:00
Place: 
T2 lecture hall, Konemiehentie 2, 02150, Espoo, FI
Description: 

Aleksi Kallio, M.Sc., will defend the dissertation "Sampling from scarcely defined distributions: Methods and applications in data mining" on 19 February 2016 at 12 noon at the Aalto University School of Science, lecture hall T2, Konemiehentie 2, Espoo.

The reliability and reproducibility of discoveries are essential for scientific progress. In his dissertation, Aleksi Kallio, M.Sc., studied difficult cases of scientific data analytics and developed new methods and approaches for assessing the statistical significance of discoveries. Improved methods are needed because data volumes are growing rapidly and the analytical questions faced in modern research are becoming more complex.
 
The dissertation introduces the term "scarcely defined distributions" to describe difficult statistical distributions that are common in modern data analytics. It discusses data mining methods and applications in which scarcely defined distributions emerge, and puts forth several strategies that make it possible to analyze complex datasets. Applications are reviewed from several fields, including bioinformatics, paleontology and ecology. What these application areas have in common is the complexity of the underlying processes and error sources.
 
The work concludes that the development of new and flexible analytical methods is crucial for all fields that seek to use data to support decision making and prediction. If testing for significance and reliability is not on par with the rest of the data processing machinery, the future of data-driven discovery will be plagued by false interpretations. The applicability of the research extends beyond the fields that were discussed. The generic methods and approaches can be adapted to many use cases where complex data sources are relevant, including major societal questions related to medicine, climate and social networks.
 
 
Opponent: Dr. Pauli Miettinen, Max-Planck-Institut für Informatik, Germany 
 
Custos: Professor Aristides Gionis, Aalto University School of Science, Department of Computer Science
 
 
School of Science, electronic dissertations: https://aaltodoc.aalto.fi/handle/123456789/52 
 
 
 

Last updated on 9 Feb 2016 by Maria Lindqvist - Page created on 9 Feb 2016 by Maria Lindqvist