Archetypal Analysis - Mining of the Extreme

Lecturer: Manuel J. A. Eugster
Event type: HIIT seminar
Event time: 2012-05-11 10:15 to 11:15
Place: Kumpula Exactum C222
Description:

In many data analysis problems, extremal points are of fundamental
interest. However, computing extremal points in high-dimensional data
sets is not easy; archetypal analysis provides one way to do so.
 
Archetypal analysis aims to represent the observations in a
multivariate data set as convex combinations of extremal points, the
archetypes. The archetypes themselves are restricted to being convex
combinations of the individuals in the data set and lie on the data
set boundary, i.e., the boundary of its convex hull. This allows for
dimension reduction and clustering; furthermore, data analysts can
often interpret the archetypes easily. The method has found
application in different areas, e.g., in performance analysis
(economics), gene analysis (biology), and collaborative filtering
(pattern recognition).
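
The two convexity constraints described above can be written as a
single optimization problem. The notation here is an assumption made
for illustration and is not taken from the talk: X is the n x m data
matrix, k is the number of archetypes, A (n x k) holds the
coefficients of the observations, and B (k x n) holds the coefficients
that build the archetypes Z = BX from the data.

```latex
\begin{aligned}
\min_{A,\,B}\quad & \mathrm{RSS}(A, B) = \lVert X - A B X \rVert_F^2 \\
\text{subject to}\quad
  & a_{ij} \ge 0, \quad \sum_{j=1}^{k} a_{ij} = 1, \quad i = 1, \dots, n, \\
  & b_{jl} \ge 0, \quad \sum_{l=1}^{n} b_{jl} = 1, \quad j = 1, \dots, k .
\end{aligned}
```

Each row of A expresses one observation as a convex combination of the
k archetypes, and each row of B expresses one archetype as a convex
combination of the n observations.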
 
In this talk, I promote archetypal analysis as a modern data analysis
method that benefits from combining expertise from statistics,
machine learning, and computer science. I introduce its theoretical
foundations and present the original algorithm. Abstracting the
algorithm into individual conceptual components results in a very
general framework and allows, e.g., the easy adaptation towards a
robust M-estimator and the generalization with other prototype methods
like k-means clustering. Its implementation in the 'archetypes' R
package provides flexible model fitting and model diagnostics as well
as model deployment to end users as compiled C code. As an outlook, I
show first details of my current research: the extension to multilevel
and evolutionary archetypal analysis. The talk is accompanied by an
illustrative sports example.
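
To make the idea of an alternating fitting scheme concrete, here is a
minimal pure-Python sketch. It is not the implementation in the
'archetypes' R package, and it is not the original algorithm either:
that algorithm is commonly described as alternating constrained
least-squares solves, whereas this sketch substitutes a single
projected gradient step for each solve, with deliberately conservative
step sizes. The function names (fit_archetypes, project_simplex), the
matrices A, B, Z, and the synthetic data are all assumptions made for
illustration.

```python
import random

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index that passes gives the threshold
    return [max(x - theta, 0.0) for x in v]

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def transpose(M):
    return [list(col) for col in zip(*M)]

def frob2(M):
    """Squared Frobenius norm."""
    return sum(x * x for row in M for x in row)

def residual(X, A, Z):
    """Return A Z - X, the fitted values minus the data."""
    return [[f - x for f, x in zip(fr, xr)] for fr, xr in zip(matmul(A, Z), X)]

def fit_archetypes(X, k, iters=200, seed=0):
    """Hypothetical helper: alternate projected gradient steps on A and B.

    X is a list of n observations (rows) and must not be all zeros.
    Returns coefficients A (n x k), weights B (k x n), archetypes Z = B X,
    and the residual sum of squares recorded before each iteration and at the end.
    """
    rng = random.Random(seed)
    n = len(X)
    Xt = transpose(X)
    # Each archetype starts at one randomly chosen observation (one-hot rows of B).
    start = rng.sample(range(n), k)
    B = [[1.0 if j == s else 0.0 for j in range(n)] for s in start]
    A = [[1.0 / k] * k for _ in range(n)]
    history = []
    for _ in range(iters):
        Z = matmul(B, X)
        history.append(frob2(residual(X, A, Z)))
        # Coefficient step: move A downhill for fixed Z, then project each row.
        G = matmul(residual(X, A, Z), transpose(Z))
        step = 1.0 / frob2(Z)  # conservative bound on the curvature
        A = [project_simplex([a - step * g for a, g in zip(ar, gr)])
             for ar, gr in zip(A, G)]
        # Archetype step: move B downhill for fixed A, then project each row.
        G = matmul(matmul(transpose(A), residual(X, A, Z)), Xt)
        step = 1.0 / (frob2(A) * frob2(X))
        B = [project_simplex([b - step * g for b, g in zip(br, gr)])
             for br, gr in zip(B, G)]
    Z = matmul(B, X)
    history.append(frob2(residual(X, A, Z)))
    return A, B, Z, history

# Made-up 2-D data, used only to show the call.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(30)]
A, B, Z, history = fit_archetypes(X, k=3)
print(history[0], history[-1])
```

Because both steps are projections onto the simplex, every row of A
and B stays a valid set of convex weights throughout, which is the
constraint the abstract describes. The conservative step sizes keep
the residual sum of squares from increasing, at the cost of slow
convergence; a practical implementation would solve each subproblem
more accurately.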

Last updated on 6 May 2012 by Dorota Glowacka - Page created on 6 May 2012 by Dorota Glowacka