Guest lecture on Wednesday 21 Oct, 10:15 a.m., Exactum B222
Dr. Tijl De Bie
University of Bristol
Intelligent Systems Laboratory
Explicit probabilistic models for databases and networks
Abstract:
Recent work in data mining and related areas has highlighted the importance of the statistical assessment of data mining results. Crucial to this endeavour is the choice of a non-trivial null model for the data, against which the patterns found can be contrasted. The most influential null models proposed so far are defined in terms of invariants of the null distribution. Such null models can be used by computation-intensive randomization approaches to estimate the statistical significance of data mining results.
In this talk, I introduce a methodology to construct non-trivial probabilistic models based on the maximum entropy (MaxEnt) principle. I will show how MaxEnt models allow for the natural incorporation of prior information. Furthermore, they satisfy a number of desirable properties of previously introduced randomization approaches. Lastly, they also have the benefit that they can be represented explicitly. I will argue that this approach can be used for a variety of data types. However, for concreteness, I have chosen to demonstrate it in particular for databases and networks.
I will include a general discussion of an important possible usage of explicit models in data mining. In particular, I will demonstrate by means of an example on binary databases how the model can be used for the definition of a new measure of informativeness for itemsets.
Bio & further information available at:
http://www.tijldebie.net/
Last updated on 8 Oct 2009 by Visa Noronen - Page created on 21 Oct 2009 by Visa Noronen