Abstract (EN):
In [14] we proposed a method to detect outliers in multivariate data based
on clustering and robust estimators. To implement this method in practice it is necessary
to choose a clustering method, a pair of location and scatter estimators, and
the number of clusters, k. After several simulation experiments it was possible to
give a number of guidelines regarding the first two choices. However, the choice of
the number of clusters depends entirely on the structure of the particular data set
under study. Our suggestion is to try several values of k (e.g. from 1 to a maximum
reasonable k which depends on the number of observations and on the number of
variables) and select the value of k that minimizes an adapted AIC. In this paper we analyze this
AIC-based criterion for choosing the number of clusters k (and also the clustering
method and the location and scatter estimators) by applying it to several simulated
data sets with and without outliers.
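
The selection loop described in the abstract (try k = 1, ..., k_max and keep the k with the smallest criterion value) can be sketched as follows. This is only an illustration under loud assumptions: the abstract does not define the adapted AIC, so the code substitutes a generic classification-likelihood AIC; it works on one-dimensional data, uses plain k-means, and uses the classical mean and variance in place of the robust location and scatter estimators. All function names (kmeans_1d, generic_aic, select_k) are hypothetical.

```python
import math

def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's algorithm on 1-D data with deterministic quantile starts."""
    s = sorted(xs)
    n = len(s)
    centers = [s[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def generic_aic(xs, labels, k, var_floor=1e-6):
    """Generic AIC (NOT the paper's adapted AIC): -2 * loglik + 2 * n_params.

    Each non-empty cluster gets a Gaussian with its own mean and variance,
    weighted by the cluster's share of the observations.
    """
    n = len(xs)
    loglik = 0.0
    used = 0
    for j in range(k):
        members = [x for x, lab in zip(xs, labels) if lab == j]
        if not members:
            continue
        used += 1
        m = len(members)
        mu = sum(members) / m
        ss = sum((x - mu) ** 2 for x in members)
        var = max(ss / m, var_floor)  # floor avoids log(0) for degenerate clusters
        loglik += m * math.log(m / n)
        loglik += -0.5 * m * math.log(2 * math.pi * var) - ss / (2 * var)
    n_params = 3 * used - 1  # means, variances, and free mixing proportions
    return -2 * loglik + 2 * n_params

def select_k(xs, k_max):
    """Evaluate the criterion for k = 1..k_max and return the minimizing k."""
    scores = {}
    for k in range(1, k_max + 1):
        scores[k] = generic_aic(xs, kmeans_1d(xs, k), k)
    return min(scores, key=scores.get), scores

# Two well-separated groups of five points each.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 11.5, 12.0]
best_k, scores = select_k(xs, 3)
print(best_k, scores)
```

With very small clusters the variance floor can make the likelihood artificially large, which is one reason a maximum reasonable k tied to the number of observations and variables matters in practice.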
Language:
English
Type (Professor's evaluation):
Scientific
No. of pages:
8
License type: