Go to:
Logótipo
Comuta visibilidade da coluna esquerda
Você está em: Start > Publications > View > Influence of data distribution in missing data imputation
Publication

Publications

Influence of data distribution in missing data imputation

Title
Influence of data distribution in missing data imputation
Type
Article in International Conference Proceedings Book
Year
2017
Authors
Santos M.S.
(Author)
Other
View Personal Page You do not have permissions to view the institutional email. Search for Participant Publications View Authenticus page Without ORCID
Soares J.P.
(Author)
Other
The person does not belong to the institution. The person does not belong to the institution. The person does not belong to the institution. Without AUTHENTICUS Without ORCID
Araújo H.
(Author)
Other
The person does not belong to the institution. The person does not belong to the institution. The person does not belong to the institution. Without AUTHENTICUS Without ORCID
Santos J.
(Author)
Other
The person does not belong to the institution. The person does not belong to the institution. The person does not belong to the institution. View Authenticus page Without ORCID
Conference proceedings International
Indexing
Other information
Authenticus ID: P-00Z-SMQ
Abstract (EN): Dealing with missing data is a crucial step in the preprocessing stage of most data mining projects. Especially in healthcare contexts, addressing this issue is fundamental, since it may result in keeping or loosing critical patient information that can help physicians in their daily clinical practice. Over the years, many researchers have addressed this problem, basing their approach on the implementation of a set of imputation techniques and evaluating their performance in classification tasks. These classic approaches, however, do not consider some intrinsic data information that could be related to the performance of those algorithms, such as features¿ distribution. Establishing a correspondence between data distribution and the most proper imputation method avoids the need of repeatedly testing a large set of methods, since it provides a heuristic on the best choice for each feature in the study. The goal of this work is to understand the relationship between data distribution and the performance of well-known imputation techniques, such as Mean, Decision Trees, k-Nearest Neighbours, Self-Organizing Maps and Support Vector Machines imputation. Several publicly available datasets, all complete, were selected attending to several characteristics such as number of distributions, features and instances. Missing values were artificially generated at different percentages and the imputation methods were evaluated in terms of Predictive and Distributional Accuracy. Our findings show that there is a relationship between features¿ distribution and algorithms¿ performance, although some factors must be taken into account, such as the number of features per distribution and the missing rate at state.
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 9
Documents
We could not find any documents associated to the publication.
Recommend this page Top
Copyright 1996-2025 © Faculdade de Direito da Universidade do Porto  I Terms and Conditions  I Acessibility  I Index A-Z
Page created on: 2025-08-10 at 03:34:43 | Privacy Policy | Personal Data Protection Policy | Whistleblowing