Publication

OSPT: European Portuguese Paraphrastic Dataset with Machine Translation

Type
Article in International Conference Proceedings Book
Year
2023
Authors
Sousa, A
(Author)
Conference proceedings International
Pages: 454-466
22nd EPIA Conference on Artificial Intelligence (EPIA)
Azores, Portugal, September 5-8, 2023
Indexing
Publication in ISI Web of Knowledge - 0 Citations
Publication in Scopus - 0 Citations
Other information
Authenticus ID: P-00Z-KWY
Abstract (EN): We describe OSPT, a new linguistic resource for European Portuguese that comprises more than 1.5 million Portuguese-Portuguese sentential paraphrase pairs. We generated the pairs automatically by using neural machine translation to translate the non-Portuguese side of a large parallel corpus. We hope this new corpus can be a valuable resource for paraphrase generation and provide a rich source of semantic knowledge for improving downstream natural language understanding tasks. To show the quality and utility of the dataset, we use it to train paraphrastic sentence embeddings and evaluate them on the ASSIN2 semantic textual similarity (STS) task. We found that training on a small subset of OSPT can produce better semantic embeddings than training on the finely curated ASSIN2 training data. Additionally, we show that OSPT can be used for paraphrase generation, with the potential to produce good data augmentation systems that pseudo-translate from Brazilian Portuguese to European Portuguese.
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 13
Documents
File name: 978-3-031-49008-8_36
Size: 239.43 KB
Related Publications

Of the same authors

PTPARL-V: Portuguese Parliamentary Debates for Voting Behaviour Study (2024)
Article in International Conference Proceedings Book
Sousa, A; Henrique Lopes Cardoso
Pseudo-Semantic Graphs for Generating Paraphrases (2024)
Article in International Conference Proceedings Book
Sousa, A; Henrique Lopes Cardoso