Publication

Evaluating WMT 2025 Metrics shared task submissions on the SSA-MTE African challenge set

Title
Evaluating WMT 2025 Metrics shared task submissions on the SSA-MTE African challenge set
Type
Article in International Conference Proceedings Book
Year
2025
Authors
Li, Senyu (Author)
Ali, Felermino D. M. A. (Author)
Wang, Jiayi (Author)
Stenetorp, Pontus (Author)
Cherry, Colin (Author)
Adelani, David Ifeoluwa (Author)
Conference proceedings: International
10th Conference on Machine Translation (WMT 2025), Suzhou, China, 2025
Pages: 913-919
Indexing
Crossref
Other information
Abstract (EN): This paper presents the evaluation of submissions to the WMT 2025 Metrics Shared Task on the SSA-MTE challenge set, a large-scale benchmark for machine translation evaluation (MTE) in Sub-Saharan African languages. The SSA-MTE test sets contain over 12,768 human-annotated adequacy scores across 11 language pairs sourced from English, French, and Portuguese, spanning 6 commercial and open-source MT systems. Results show that correlations with human judgments remain generally low, with most systems falling below the 0.4 Spearman threshold for medium-level agreement, and performance varies widely across language pairs. In some extremely low-resource cases, such as Portuguese–Emakhuwa, correlations drop to around 0.1, underscoring the difficulty of evaluating MT for very low-resource African languages. These findings highlight the urgent need for more research on robust, generalizable MT evaluation methods tailored to African languages.
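The 0.4 Spearman threshold cited in the abstract refers to segment-level rank correlation between automatic metric scores and human adequacy judgments. The minimal Python sketch below shows how such a check could be computed; the score lists are hypothetical placeholders, not SSA-MTE data.

    from scipy.stats import spearmanr

    # Hypothetical per-segment scores; not drawn from the SSA-MTE data.
    human_adequacy = [0.90, 0.40, 0.70, 0.20, 0.80, 0.50]  # human annotations
    metric_scores = [0.85, 0.50, 0.60, 0.30, 0.70, 0.45]   # automatic metric output

    # Spearman's rho measures monotonic agreement between the two rankings.
    rho, p_value = spearmanr(human_adequacy, metric_scores)
    print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")

    # The abstract's cut-off: rho >= 0.4 is read as medium-level agreement.
    if rho >= 0.4:
        print("medium-level agreement or better")
    else:
        print("below the 0.4 threshold")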
Language: English
Type (Professor's evaluation): Scientific
Documents
File name: 2025.wmt-1.65
Size: 227.12 KB