
Detecting loanwords in Emakhuwa: an extremely low-resource Bantu language exhibiting significant borrowing from Portuguese

Title
Detecting loanwords in Emakhuwa: an extremely low-resource Bantu language exhibiting significant borrowing from Portuguese
Type
Article in International Conference Proceedings
Year
2024
Authors
International Conference Proceedings
Pages: 4750-4759
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Torino, 2024
Indexing
Publication in Scopus - 0 citations
Other Information
ID Authenticus: P-010-KZT
Abstract (PT):
Abstract (EN): The accurate identification of loanwords within a given text holds significant potential as a valuable tool for addressing data augmentation and mitigating data sparsity issues. Such identification can improve the performance of various natural language processing tasks, particularly in the context of low-resource languages that lack standardized spelling conventions. This research proposes a supervised method to identify loanwords in Emakhuwa, borrowed from Portuguese. Our methodology encompasses a two-fold approach. Firstly, we employ traditional machine learning algorithms incorporating handcrafted features, including language-specific and similarity-based features. We build upon prior studies to extract similarity features and propose utilizing two external resources: a Sequence-to-Sequence model and a dictionary. This innovative approach allows us to identify loanwords solely by analyzing the target word without prior knowledge about its donor counterpart. Furthermore, we fine-tune the pre-trained CANINE model for the downstream task of loanword detection, which culminates in the impressive achievement of the F1-score of 93%. To the best of our knowledge, this study is the first of its kind focusing on Emakhuwa, and the preliminary results are promising as they pave the way to further advancements.
Language: English
Type (Faculty Evaluation): Scientific
Contact: Available at: https://aclanthology.org/2024.lrec-main.425/
Documents
File Name Description Size
2024.lrec-main.425 396.21 KB
Related Publications

By the same authors

Expanding FLORES+ benchmark for more low-resource settings: Portuguese-Emakhuwa machine translation evaluation (2024)
Article in International Conference Proceedings
Ali, Felermino; Cardoso, Henrique Lopes; Sousa-Silva, Rui
Building resources for Emakhuwa: machine translation and news classification benchmarks (2024)
Article in International Conference Proceedings
Ali, Felermino; Cardoso, Henrique Lopes; Sousa-Silva, Rui