Go to:
Logótipo
Comuta visibilidade da coluna esquerda
Você está em: Start > Publications > View > Exploring the effects of data distribution in missing data imputation
Publication

Publications

Exploring the effects of data distribution in missing data imputation

Title
Exploring the effects of data distribution in missing data imputation
Type
Article in International Conference Proceedings Book
Year
2018
Authors
Pompeu Soares, J
(Author)
Other
The person does not belong to the institution. The person does not belong to the institution. The person does not belong to the institution. Without AUTHENTICUS Without ORCID
Seoane Santos, M
(Author)
Other
View Personal Page You do not have permissions to view the institutional email. Search for Participant Publications View Authenticus page Without ORCID
Araújo, H
(Author)
Other
The person does not belong to the institution. The person does not belong to the institution. The person does not belong to the institution. Without AUTHENTICUS Without ORCID
Santos, J
(Author)
Other
The person does not belong to the institution. The person does not belong to the institution. The person does not belong to the institution. View Authenticus page Without ORCID
Conference proceedings International
Pages: 251-263
17th International Symposium on Intelligent Data Analysis, IDA 2018
24 October 2018 through 26 October 2018
Indexing
Other information
Authenticus ID: P-00P-TN0
Abstract (EN): In data imputation problems, researchers typically use several techniques, individually or in combination, in order to find the one that presents the best performance over all the features comprised in the dataset. This strategy, however, neglects the nature of data (data distribution) and makes impractical the generalisation of the findings, since for new datasets, a huge number of new, time consuming experiments need to be performed. To overcome this issue, this work aims to understand the relationship between data distribution and the performance of standard imputation techniques, providing a heuristic on the choice of proper imputation methods and avoiding the needs to test a large set of methods. To this end, several datasets were selected considering different sample sizes, number of features, distributions and contexts and missing values were inserted at different percentages and scenarios. Then, different imputation methods were evaluated in terms of predictive and distributional accuracy. Our findings show that there is a relationship between features¿ distribution and algorithms¿ performance, and that their performance seems to be affected by the combination of missing rate and scenario at state and also other less obvious factors such as sample size, goodness-of-fit of features and the ratio between the number of features and the different distributions comprised in the dataset. © Springer Nature Switzerland AG 2018.
Language: English
Type (Professor's evaluation): Scientific
Documents
We could not find any documents associated to the publication.
Recommend this page Top
Copyright 1996-2025 © Faculdade de Direito da Universidade do Porto  I Terms and Conditions  I Acessibility  I Index A-Z
Page created on: 2025-08-12 at 23:08:03 | Privacy Policy | Personal Data Protection Policy | Whistleblowing