OSPT: European Portuguese Paraphrastic Dataset with Machine Translation

Title

OSPT: European Portuguese Paraphrastic Dataset with Machine Translation

Type

Article in International Conference Proceedings Book

Date

2023

Authors

Sousa, A

(Author)

Other

Henrique Lopes Cardoso

(Author)

FEUP

Conference proceedings International

Title: PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2023, PT I

Pages: 454-466

22nd EPIA Conference on Artificial Intelligence (EPIA)

Azores, PORTUGAL, SEP 05-08, 2023

Indexing

ISI Web of Knowledge - 0 Citations

Scopus - 0 Citations

Other information

Authenticus ID: P-00Z-KWY

DOI: 10.1007/978-3-031-49008-8_36

Abstract (EN): We describe OSPT, a new linguistic resource for European Portuguese that comprises more than 1.5 million Portuguese-Portuguese sentential paraphrase pairs. We generated the pairs automatically by using neural machine translation to translate the non-Portuguese side of a large parallel corpus. We hope this new corpus can be a valuable resource for paraphrase generation and provide a rich semantic knowledge source to improve downstream natural language understanding tasks. To show the quality and utility of such a dataset, we use it to train paraphrastic sentence embeddings and evaluate them in the ASSIN2 semantic textual similarity (STS) competition. We found that training on a small subset of OSPT yields better semantic embeddings than training on the finely curated ASSIN2 training data. Additionally, we show that OSPT can be used for paraphrase generation, with the potential to produce good data augmentation systems that pseudo-translate from Brazilian Portuguese to European Portuguese.
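
The pair-generation idea in the abstract (translate the non-Portuguese side of a parallel corpus and pair the result with the original Portuguese side) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' pipeline: `build_paraphrase_pairs`, `toy_translate`, the lookup table standing in for a neural translation system, and the rule that drops empty or identical outputs are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def build_paraphrase_pairs(
    parallel_corpus: Iterable[Tuple[str, str]],
    translate_to_pt: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each Portuguese sentence with a translation of its foreign counterpart."""
    pairs = []
    for pt_sentence, foreign_sentence in parallel_corpus:
        reference = pt_sentence.strip()
        candidate = translate_to_pt(foreign_sentence).strip()
        # Assumption (not from the abstract): skip empty outputs and exact
        # copies, since they add no paraphrastic variation.
        if candidate and candidate.casefold() != reference.casefold():
            pairs.append((reference, candidate))
    return pairs


# Hypothetical stand-in for a neural MT system: a fixed lookup table.
_TOY_MT = {
    "The meeting starts at nine.": "A reunião começa às nove.",
    "I did not see the film.": "Eu não assisti ao filme.",
}


def toy_translate(sentence: str) -> str:
    # Returns an empty string for sentences the toy table does not cover.
    return _TOY_MT.get(sentence, "")


corpus = [
    ("A reunião tem início às nove horas.", "The meeting starts at nine."),
    ("Não vi o filme.", "I did not see the film."),
    ("Obrigado.", "Thanks."),  # not covered by the toy table, so it is dropped
]
pairs = build_paraphrase_pairs(corpus, toy_translate)
for original, paraphrase in pairs:
    print(f"{original} || {paraphrase}")
```

In a real setting the `translate_to_pt` callable would wrap an actual translation model; passing it in as a parameter keeps the pairing logic independent of that choice.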

Language: English

Type (Professor's evaluation): Scientific

No. of pages: 13

Documents

File name	Description	Size
978-3-031-49008-8_36		239.43 KB

Related Publications

Of the same authors

SAPG: Semantically-Aware Paraphrase Generation with AMR Graphs (2025)
Article in International Conference Proceedings Book
Sousa, A; Henrique Lopes Cardoso

PTPARL-V: Portuguese Parliamentary Debates for Voting Behaviour Study (2024)
Article in International Conference Proceedings Book
Sousa, A; Henrique Lopes Cardoso

Pseudo-Semantic Graphs for Generating Paraphrases (2024)
Article in International Conference Proceedings Book
Sousa, A; Henrique Lopes Cardoso
