Abstract (EN):
Despite its importance, hierarchical clustering has been little explored in the semi-supervised setting. In this paper, we address the problem of semi-supervised hierarchical clustering by using an active learning solution with cluster-level constraints. This active learning approach is based on a new concept of merge confidence in agglomerative clustering. When confidence in a cluster merge is low, the user is queried and provides a cluster-level constraint. The proposed method is compared with an unsupervised algorithm (average-link) and two state-of-the-art semi-supervised algorithms (pairwise constraints and Constrained Complete-Link). Results show that our algorithm tends to outperform the two semi-supervised algorithms and can achieve a significant improvement over the unsupervised algorithm. Our approach is particularly useful when the number of clusters is high, which is the case in many real problems. © 2012 Springer-Verlag Berlin Heidelberg.
Language:
English
Type (Professor's evaluation):
Scientific