On the Quality of Synthetic Generated Tabular Data

Type
Article in International Scientific Journal
Year
2023
Authors
Espinosa, E (Author), Other (not affiliated with the institution)
Figueira, A (Author), FCUP
Journal
Title: Mathematics
Vol. 11
Final page: 3278
Publisher: MDPI
Other information
Authenticus ID: P-00Y-S1F
Abstract (EN): Class imbalance is a common issue when developing classification models. To tackle this problem, synthetic data generation has recently been used to augment the minority class, with the artificially generated samples aiming to bolster its representation. However, the suitability of such generated data must be evaluated to ensure that they align with the original data distribution. Utility measures serve this purpose by quantifying how similar the distribution of the generated data is to the original one. For tabular data, various evaluation methods exist, each assessing different characteristics of the generated data. In this study, we collected utility measures and categorized them according to the type of analysis they perform. We then applied these measures to synthetic data generated from two well-known datasets, Adult Income and Liar+. To generate the synthetic data, we used five well-known generative models (Borderline SMOTE, DataSynthesizer, CTGAN, CopulaGAN, and REaLTabFormer) and evaluated their quality using the utility measures. The measurements proved informative, indicating that if one synthetic dataset is superior to another in terms of utility measures, it will also be more effective as an augmentation of the minority class in classification tasks.
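To make the idea of a utility measure concrete, here is a minimal sketch of one simple column-level check: the two-sample Kolmogorov-Smirnov (KS) statistic between a real column and its synthetic counterpart, turned into a score where 1.0 means the two empirical distributions match. This is an illustrative assumption, not necessarily one of the measures collected in the study; the function names `ks_statistic` and `column_utility` and the sample ages are hypothetical, and the sketch only covers the marginal distribution of a single numeric column.

```python
from bisect import bisect_right


def ks_statistic(real: list[float], synthetic: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    # Empirical CDFs are step functions that only change at observed values,
    # so evaluating them at every observed value finds the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def column_utility(real: list[float], synthetic: list[float]) -> float:
    """Hypothetical per-column utility score: 1 - KS, so higher means more similar."""
    return 1.0 - ks_statistic(real, synthetic)


if __name__ == "__main__":
    # Hypothetical "age" column from a real table and from a synthetic one.
    real_ages = [25, 32, 47, 51, 38, 29]
    synthetic_ages = [26, 33, 45, 50, 40, 30]
    print(round(column_utility(real_ages, synthetic_ages), 3))
```

For these two hypothetical columns the largest CDF gap is 1/6, so the score is about 0.833. A full evaluation would repeat such checks per column and add multivariate or downstream-classification measures, which this sketch does not attempt.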
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 18
Documents
No documents are associated with this publication.
Related Publications

Of the same journal

Survey on Synthetic Data Generation, Evaluation Methods and GANs (2022)
Another Publication in an International Scientific Journal
Figueira, A; Vaz, B
Nonlinear Dynamics (2022)
Another Publication in an International Scientific Journal
António Mendes Lopes; Machado, JAT
Data Science in Economics: Comprehensive Review of Advanced Machine Learning and Deep Learning Methods (2020)
Another Publication in an International Scientific Journal
Nosratabadi, S; Mosavi, A; Duan, P; Ghamisi, P; Filip, F; Band, SS; Reuter, U; João Gama; Gandomi, AH
Welfare-Balanced International Trade Agreements (2023)
Article in International Scientific Journal
Martins, F; Alberto A. Pinto; Zubelli, JP
Validation of HiG-Flow Software for Simulating Two-Phase Flows with a 3D Geometric Volume of Fluid Algorithm (2023)
Article in International Scientific Journal
Silva, ATGD; Fernandes, C; Organista, J; Souza, L; Castelo, A

Copyright 1996-2025 © Faculdade de Direito da Universidade do Porto
Page created on: 2025-07-19 at 06:41:12