Publication

Expanding FLORES+ Benchmark for More Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation

Type
Article in International Conference Proceedings Book
Year
2024
Authors
António Ali, FDM
(Author)
Silva, RS
(Author)
Conference proceedings International
Pages: 579-592
9th Conference on Machine Translation
Miami, 2024
Other information
Authenticus ID: P-018-0DY
Abstract (EN): As part of the Open Language Data Initiative shared tasks, we have expanded the FLORES+ evaluation set to include Emakhuwa, a low-resource language widely spoken in Mozambique. We translated the dev and devtest sets from Portuguese into Emakhuwa, and we detail the translation process and quality assurance measures used. Our methodology involved various quality checks, including post-editing and adequacy assessments. The resulting datasets consist of multiple reference sentences for each source. We present baseline results from training a Neural Machine Translation system and fine-tuning existing multilingual translation models. Our findings suggest that spelling inconsistencies remain a challenge in Emakhuwa. Additionally, the baseline models underperformed on this evaluation set, underscoring the necessity for further research to enhance machine translation quality for Emakhuwa. The data is publicly available at https://huggingface.co/datasets/LIACC/Emakhuwa-FLORES. © 2024 Association for Computational Linguistics.
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 13
Documents
File name Description Size
2024.wmt-1.45[1] 1981.27 KB
Related Publications

Of the same authors

Expanding FLORES+ Benchmark for more Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation (2024)
Article in International Scientific Journal
António Ali, FDM; Henrique Lopes Cardoso; Silva, RS
Network-based Approach for Stopwords Detection (2024)
Article in International Conference Proceedings Book
António Ali, FDM; Jesus, Gd; Henrique Lopes Cardoso; Nunes, SS; Silva, RS