Publication

Detecting loanwords in Emakhuwa: an extremely low-resource Bantu language exhibiting significant borrowing from Portuguese

Title
Detecting loanwords in Emakhuwa: an extremely low-resource Bantu language exhibiting significant borrowing from Portuguese
Type
Article in International Conference Proceedings Book
Year
2024
Authors
Ali, Felermino
(Author)
International conference proceedings
Pages: 4750-4759
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Torino, 2024
Indexing
Publication in Scopus: 0 citations
Other information
Authenticus ID: P-010-KZT
Abstract (EN): The accurate identification of loanwords within a given text holds significant potential as a valuable tool for addressing data augmentation and mitigating data sparsity issues. Such identification can improve the performance of various natural language processing tasks, particularly in the context of low-resource languages that lack standardized spelling conventions. This research proposes a supervised method to identify loanwords in Emakhuwa, borrowed from Portuguese. Our methodology encompasses a two-fold approach. Firstly, we employ traditional machine learning algorithms incorporating handcrafted features, including language-specific and similarity-based features. We build upon prior studies to extract similarity features and propose utilizing two external resources: a Sequence-to-Sequence model and a dictionary. This innovative approach allows us to identify loanwords solely by analyzing the target word without prior knowledge about its donor counterpart. Furthermore, we fine-tune the pre-trained CANINE model for the downstream task of loanword detection, which culminates in the impressive achievement of the F1-score of 93%. To the best of our knowledge, this study is the first of its kind focusing on Emakhuwa, and the preliminary results are promising as they pave the way to further advancements.
Language: English
Type (Professor's evaluation): Scientific
Contact: Available at: https://aclanthology.org/2024.lrec-main.425/
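
Illustrative note: the abstract mentions similarity-based features computed between a target word and a donor-language candidate. The record does not say which similarity measures the authors use, so the following is only a minimal sketch that assumes a normalized Levenshtein similarity. The function names and the candidate list (standing in for a dictionary lookup or Sequence-to-Sequence output) are hypothetical, and the target string is a made-up spelling, not an attested Emakhuwa form.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_donor_similarity(target: str, candidates: Iterable[str]) -> float:
    """Highest similarity between the target and any donor candidate.

    `candidates` is a hypothetical stand-in for words returned by a
    dictionary or generated by a sequence-to-sequence model.
    """
    return max((similarity(target, c) for c in candidates), default=0.0)


if __name__ == "__main__":
    # "meza" is an invented target spelling; "mesa" and "missa" are Portuguese words.
    print(best_donor_similarity("meza", ["mesa", "missa"]))
```

A score like this could serve as one numeric feature among several in a conventional classifier; the threshold and the choice of measure would need to be tuned on labelled data.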
Documents
File name Description Size
2024.lrec-main.425 396.21 KB
Related Publications

Of the same authors

Expanding FLORES+ benchmark for more low-resource settings: Portuguese-Emakhuwa machine translation evaluation (2024)
Article in International Conference Proceedings Book
Ali, Felermino; Cardoso, Henrique Lopes; Sousa-Silva, Rui
Building resources for Emakhuwa: machine translation and news classification benchmarks (2024)
Article in International Conference Proceedings Book
Ali, Felermino; Cardoso, Henrique Lopes; Sousa-Silva, Rui