Publication

ACE-2005-PT: corpus for event extraction in Portuguese

Title

ACE-2005-PT: corpus for event extraction in Portuguese

Type

Article in International Conference Proceedings

Date

2024

Authors

Cunha, Luís Filipe

(Author)

Other

Silvano, Maria da Purificação

(Author)

FLUP

Campos, Ricardo

(Author)

Other

Jorge, Alípio

(Author)

FCUP

International Conference Proceedings

Title: SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages: 661-666

SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Washington, 2024

Indexing

Crossref

Other Information

DOI: 10.1145/3626772.3657872

Abstract (EN): Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive information about the overall event structure and semantics. However, its limited language coverage restricts its usability. This paper introduces ACE-2005-PT, a corpus created by translating ACE-2005 into Portuguese, with European and Brazilian variants. To speed up the process of obtaining ACE-2005-PT, we rely on automatic translators. This, however, poses some challenges related to automatically identifying the correct alignments between multi-word annotations in the original text and in the corresponding translated sentence. To address this, we developed an alignment pipeline that incorporates several alignment techniques: lemmatization, fuzzy matching, synonym matching, multiple translations and a BERT-based word aligner. To measure the alignment effectiveness, a subset of annotations from the ACE-2005-PT corpus was manually aligned by an expert linguist. The pipeline's output was then compared against this subset, achieving exact and relaxed match scores of 70.55% and 87.55%, respectively. As a result, we successfully generated a Portuguese version of the ACE-2005 corpus, which has been accepted for publication by LDC.
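
Fuzzy matching is one of the alignment techniques the abstract names. The sketch below is only an illustration of the general idea, not the authors' pipeline: it slides a token window over a translated sentence and keeps the window most similar to a separately translated annotation, using Python's standard difflib. The function name fuzzy_align, the 0.8 threshold, and the example sentence are all hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_align(annotation, sentence, threshold=0.8):
    """Find the token window in `sentence` most similar to `annotation`.

    Returns (start_token, end_token, score), with end_token exclusive,
    or None when no window reaches `threshold` (an assumed cut-off).
    """
    tokens = sentence.split()
    n = len(annotation.split())
    target = annotation.lower()
    best = None
    # Try windows one token shorter or longer than the annotation,
    # since a translation may add or drop a word.
    for size in range(max(1, n - 1), n + 2):
        for start in range(len(tokens) - size + 1):
            window = " ".join(tokens[start:start + size]).lower()
            score = SequenceMatcher(None, target, window).ratio()
            if best is None or score > best[2]:
                best = (start, start + size, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Hypothetical case: the annotation was translated in isolation as
# "foi eleita", while the sentence translation uses "foi eleito".
result = fuzzy_align("foi eleita", "O presidente foi eleito ontem em Lisboa")
print(result)
```

A character-level similarity tolerates small inflectional differences such as gender agreement, which is why an exact string search alone would miss this span. In a fuller pipeline a step like this would presumably sit alongside the other techniques listed in the abstract.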

Language: English

Type (Faculty Evaluation): Scientific

Documents

File Name	Description	Size
3626772.3657872		1266.77 KB
