Abstract (EN):
Medical image captioning is an essential tool for producing descriptive text reports of medical images. A central problem in medical image captioning is poor in-domain description generation, because large pre-trained language models are trained primarily on non-medical text whose semantics differ from those of medical text. To overcome this limitation, we explore improvements in contrastive learning for X-ray images, complemented with soft-prompt engineering for medical image captioning and conditional text decoding for caption generation. The main objective is to develop a soft-prompt model that improves the accuracy and clinical relevance of automatically generated captions while guaranteeing their linguistic correctness and without degrading model performance. Experiments on the MIMIC-CXR and ROCO datasets showed that the inclusion of tailored soft prompts improved accuracy and efficiency while ensuring a more cohesive medical context for the captions, aiding medical diagnosis and encouraging more accurate reporting.
Language:
English
Type (Faculty Evaluation):
Scientific
Number of pages:
12