Abstract:
The power of human computation is founded on the capabilities of
humans to process qualitative information in a manner that is hard to
reproduce with a computer. However, all machine learning algorithms
rely on mathematical operations, such as sums, averages, and least
squares, that are less suitable for human computation.
This paper is an effort to combine these two aspects of data
processing. We consider the problem of computing a centroid of a data
set, a key component in many data-analysis applications such as
clustering, using a very simple human intelligence task (HIT). In
this task the workers must choose the outlier from a set of three
items. After presenting a number of such triplets to the workers, the
item chosen the least number of times as the outlier is selected as
the centroid.
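
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the human worker is replaced by a hypothetical function, distance_based_worker, that picks the item farthest from the other two, and every possible triplet is shown exactly once, whereas the abstract only says that "a number of" triplets are presented. All function names are made up for this example.

```python
from itertools import combinations


def distance_based_worker(triplet):
    """Hypothetical stand-in for a human worker (an assumption).

    Returns the position (0, 1, or 2) of the item whose summed distance
    to the other two items in the triplet is largest.
    """
    totals = [sum(abs(x - y) for y in triplet) for x in triplet]
    return totals.index(max(totals))


def count_outlier_votes(items, pick_outlier):
    """Show every triplet once and count how often each item is the outlier."""
    votes = [0] * len(items)
    for idx in combinations(range(len(items)), 3):
        triplet = tuple(items[i] for i in idx)
        votes[idx[pick_outlier(triplet)]] += 1
    return votes


def triplet_centroid(items, pick_outlier):
    """Index of the item chosen least often as the outlier.

    Ties are broken in favor of the earliest item.
    """
    votes = count_outlier_votes(items, pick_outlier)
    return min(range(len(items)), key=votes.__getitem__)


data = [0.0, 3.0, 5.0, 6.5, 11.0]
votes = count_outlier_votes(data, distance_based_worker)
centroid = data[triplet_centroid(data, distance_based_worker)]
print(votes, centroid)  # prints: [3, 1, 0, 1, 5] 5.0
```

For this toy data set the selected item is 5.0, close to the arithmetic mean of 5.1, while the extreme value 11.0 collects the most outlier votes. With real workers, pick_outlier would be replaced by the answers collected from the HIT.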
We provide a proof that the centroid determined by this procedure is
equal to the mean of a univariate normal distribution. Furthermore, as a
demonstration of the viability of our method, we implement a human
computation-based variant of the k-means clustering algorithm. We
present experiments where the proposed method is used to find an
"average" image in a collection, and to cluster images into semantic
categories.
About the speaker:
Antti Ukkonen obtained his doctoral degree from Aalto University in 2008. From 2009 until 2012 he was a postdoctoral researcher at Yahoo! Research Barcelona. He is currently a postdoctoral researcher at the Helsinki Institute for Information Technology HIIT (Aalto University). His research interests include algorithmic and computational aspects of data analysis and machine learning.
Last updated on 11 Mar 2014 by Antti Ukkonen - Page created on 11 Mar 2014 by Antti Ukkonen