Abstract (EN):
Discretization of continuous attributes is an important task for certain types of machine learning algorithms. Bayesian approaches, for instance, require assumptions about data distributions. Decision Trees on the other hand, require sorting operations to deal with continuous attributes, which largely increase learning times. This paper presents a new method of discretization, whose main characteristic is that it takes into account interdependencies between attributes. Detecting interdependencies can be seen as discovering redundant attributes. This means that our method performs attribute selection as a side effect of the discretization. Empirical evaluation on five benchmark datasets from UCI repository, using C4.5 and a naive Bayes, shows a consistent reduction of the features without loss of generalization accuracy.
Language:
English
Type (Professor's evaluation):
Scientific
No. of pages:
10