Abstract (PT):
Abstract (EN):
Electronic Health Records (EHRs) contain vast amounts of unstructured narrative text, posing challenges for
organization, curation, and automated information extraction in clinical and research settings. Developing
effective annotation schemes is crucial for training extraction models, yet it remains complex for both human
experts and Large Language Models (LLMs). This study compares human- and LLM-generated annotation
schemes and guidelines through an experimental framework. In the first phase, both a human expert and an
LLM created annotation schemes based on predefined criteria. In the second phase, experienced annotators
applied these schemes following the guidelines. In both cases, the results were qualitatively evaluated using
Likert scales. The findings indicate that the human-generated scheme is more comprehensive, coherent, and clear
than those produced by the LLM. These results align with previous research suggesting that while LLMs
show promising performance in text annotation, the same does not apply to the development of
annotation schemes, and human validation remains essential to ensure accuracy and reliability.
Language:
English
Type (Professor's evaluation):
Scientific
Contact:
Available at: https://ceur-ws.org/Vol-3964/paper13.pdf
Notes:
The following authors also contributed to this article: Tahsir Ahmed Munna, Filipe Cunha, António Leal, Ricardo Campos, and Alípio Jorge.
No. of pages:
11