Abstract (EN):
Data is a major asset in today's healthcare landscape. Hospitals are among the primary producers of healthcare-related data, and the value this data can provide is enormous. However, to use it to improve healthcare practice and advance science, it is necessary to safeguard patients' privacy and ensure the ethical use of the data. The ethical and legal requirements are vast and complex. Synthetic data has emerged as a tool to overcome these hurdles and provide fast and reliable access to data without compromising utility or privacy. Although Generative Adversarial Networks (GANs) have received a lot of attention lately, the most common models and architectures are not suited to tabular data, the most prevalent type of healthcare-related data. This study surveys the current GAN implementations tailored to this scenario. The analysis focused mainly on the models employed, the datasets used, and the metrics reported on the quality of the generated data in terms of utility and privacy, as well as on how the models compare with one another. We aim to help institutions and investigators get a grasp of the tools available to facilitate access to healthcare data, and to provide recommendations for testing data synthesizers when privacy is a concern.
Language:
English
Type (Professor's evaluation):
Scientific
No. of pages:
10