Publication

Evaluation of Lyrics Extraction from Folk Music Sheets Using Vision Language Models (VLMs)

Title
Evaluation of Lyrics Extraction from Folk Music Sheets Using Vision Language Models (VLMs)
Type
Article in International Conference Proceedings Book
Year
2025
Authors
Mendes, AS
(Author)
Other
The person does not belong to the institution. Without AUTHENTICUS. Without ORCID.
Murciego, AL
(Author)
Other
The person does not belong to the institution. Without AUTHENTICUS. Without ORCID.
Silva, LA
(Author)
Other
The person does not belong to the institution. Without AUTHENTICUS. Without ORCID.
Jiménez-Bravo, DM
(Author)
Other
The person does not belong to the institution. Without AUTHENTICUS. Without ORCID.
Navarro-Cáceres, M
(Author)
Other
The person does not belong to the institution. Without AUTHENTICUS. Without ORCID.
Conference proceedings International
Pages: 91-102
23rd EPIA Conference on Artificial Intelligence-EPIA
Viana do Castelo, PORTUGAL, SEP 03-06, 2024
Indexing
Publication in ISI Web of Knowledge - 0 Citations
Publication in Scopus - 0 Citations
Other information
Authenticus ID: P-017-C9T
Abstract (EN): Monodic folk music has traditionally been preserved in physical documents. It constitutes a vast archive that needs to be digitized to facilitate comprehensive analysis using AI techniques. A critical component of music score digitization is the transcription of lyrics, an extensively researched process in Optical Character Recognition (OCR) and document layout analysis. These fields typically require the development of specific models that operate in several stages: first, to detect the bounding boxes of specific texts, then to identify the language, and finally, to recognize the characters. Recent advances in vision language models (VLMs) have introduced multimodal capabilities, such as processing images and text, which are competitive with traditional OCR methods. This paper proposes an end-to-end system for extracting lyrics from images of handwritten musical scores. We aim to evaluate the performance of two state-of-the-art VLMs to determine whether they can eliminate the need to develop specialized text recognition and OCR models for this task. The results of the study, obtained from a dataset in a real-world application environment, are presented along with promising new research directions in the field. This progress contributes to preserving cultural heritage and opens up new possibilities for global analysis and research in folk music.
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 12
Documents
We could not find any documents associated with the publication.