Publication

Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model

Title

Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model

Type

Article in an International Scientific Journal

Date

2015

Authors

Manika Kar

(Author)

FEUP

Sérgio Nunes

(Author)

FEUP

Cristina Ribeiro

(Author)

FEUP

Journal

Title: Information Processing and Management

Vol. 51, No. 6

Pages: 809-833

ISSN: 0306-4573

Publisher: Elsevier

Indexing

ISI Web of Knowledge - 16 citations

ISI Web of Science

Scopus - 21 citations

INSPEC

Scientific Classification

FOS: Natural sciences > Computer and information sciences

CORDIS: Physical Sciences > Computer Science > Informatics

Other Information

Authenticus ID: P-00G-P2E

DOI: 10.1016/j.ipm.2015.06.002

Abstract (EN): In the area of Information Retrieval, the task of automatic text summarization usually assumes a static underlying collection of documents, disregarding the temporal dimension of each document. However, in real-world settings, collections and individual documents rarely stay unchanged over time. The World Wide Web is a prime example of a collection where information changes both frequently and significantly over time, with documents being added, modified or deleted at different times. In this context, previous work addressing the summarization of web documents has simply discarded the dynamic nature of the web, considering only the latest published version of each individual document. This paper proposes and addresses a new challenge: the automatic summarization of changes in dynamic text collections. In standard text summarization, retrieval techniques present a summary to the user by capturing the major points expressed in the most recent version of an entire document in a condensed form. In this new task, the goal is to obtain a summary that describes the most significant changes made to a document during a given period. In other words, the idea is to have a summary of the revisions made to a document over a specific period of time. This paper proposes different approaches to generate summaries using extractive summarization techniques. First, individual terms are scored and then this information is used to rank and select sentences to produce the final summary. A system based on the Latent Dirichlet Allocation (LDA) model is used to find the hidden topic structures of changes. The purpose of using the LDA model is to identify separate topics where the changed terms from each topic are likely to carry at least one significant change. The different approaches are then compared with the previous work in this area. A collection of articles from Wikipedia, including their revision history, is used to evaluate the proposed system. For each article, a temporal interval and a reference summary from the article's content are selected manually. The articles and intervals in which a significant event occurred are carefully selected. The summaries produced by each of the approaches are evaluated against the manual summaries using ROUGE metrics. It is observed that the approach using the LDA model outperforms all the other approaches. Statistical tests reveal that the differences in ROUGE scores for the LDA-based approach are statistically significant at the 99% confidence level over the baseline.
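
The two-step extractive idea in the abstract (score individual terms, then rank and select sentences) and the ROUGE-style comparison against a reference summary can be illustrated with a minimal sketch. This is not the authors' system and it does not use LDA: it swaps in a plain frequency-difference term score between two revisions, and a simplified unigram recall in place of the full ROUGE toolkit. All function names, the stopword list, and the sample texts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption of this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to",
             "is", "was", "were", "by", "for"}

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]

def summarize_changes(old_text, new_text, max_sentences=1):
    """Return the top-ranked sentences that appear only in the new revision."""
    old_sentences = set(split_sentences(old_text))
    added = [s for s in split_sentences(new_text) if s not in old_sentences]
    # Step 1: score a term by how much more often it occurs in the new revision.
    old_counts = Counter(tokenize(old_text))
    new_counts = Counter(tokenize(new_text))
    term_scores = {t: c - old_counts[t]
                   for t, c in new_counts.items() if c > old_counts[t]}

    # Step 2: score a sentence by the summed scores of its distinct terms.
    def sentence_score(sentence):
        return sum(term_scores.get(t, 0) for t in set(tokenize(sentence)))

    ranked = sorted(added, key=sentence_score, reverse=True)
    return ranked[:max_sentences]

def rouge1_recall(candidate, reference):
    """Simplified unigram recall of a candidate against a reference summary."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    if not ref:
        return 0.0
    overlap = sum(min(n, cand[t]) for t, n in ref.items())
    return overlap / sum(ref.values())

# Hypothetical revisions of a short article.
old_revision = "The bridge opened in 1990. It carries road traffic."
new_revision = ("The bridge opened in 1990. It carries road traffic. "
                "A storm damaged the bridge deck in 2014. "
                "Repairs to the deck were completed quickly.")

summary = summarize_changes(old_revision, new_revision)
print(summary)
print(rouge1_recall(summary[0], "Storm damaged the deck in 2014"))
```

An LDA-based variant, as described in the abstract, would replace the frequency-difference score in step 1 with scores derived from topics fitted over the changed text; the sentence ranking and evaluation steps would keep the same shape.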

Language: English

Type (Faculty Evaluation): Scientific

No. of pages: 25

Related Publications

From the same scientific areas

SIGA-Sistema Integrado de Gestão Autárquica (1987)
Technical Report
Gabriel David; Vladimiro Miranda; Maria Cristina Ribeiro

Moodle at FEUP (2005)
Technical Report
Jaime Enrique Villate Matiz

Multimedia - A Multidisciplinary Approach to Complex Issues (2012)
Book
Ioannis Karydis

Studying the Impact of the Organizational Structure on Airline Operations Control (2015)
Book Chapter or Part
Nuno Machado; António Castro; Eugénio Oliveira

Normative and trust-based systems as enabler technologies for automated negotiation (2014)
Book Chapter or Part
Maria Joana Urbano; Henrique Lopes Cardoso; Eugénio Oliveira; Ana Paula Rocha

From the same journal

Information Processing & Management Journal Special Issue on Narrative Extraction from Texts (Text2Story) Preface (2019)
Other Publication in an International Scientific Journal
Jorge, AM; Campos, R; Jatowt, A; Sérgio Nunes

On the negative impact of social influence in recommender systems: A study of bribery in collaborative hybrid algorithms (2020)
Article in an International Scientific Journal
Ramos, G; Boratto, L; Caleiro, C

GTE-Rank: A time-aware search engine to answer time-sensitive queries (2016)
Article in an International Scientific Journal
Campos, R; Dias, G; Jorge, AM; Nunes, C

Dimensions as Virtual Items: Improving the predictive ability of top-N recommender systems (2013)
Article in an International Scientific Journal
Marcos Aurelio Domingues; Alipio Mario Jorge; Carlos Soares
