Topic Model with Contextual Outlier Handling: a Study on Electronic Invoice Product Descriptions

Type

Article in International Conference Proceedings Book

Date

2023

Authors

Andrade, C

(Author)

Other

Not affiliated with the institution. No Authenticus or ORCID profile linked.

Rita Ribeiro

(Author)

FCUP

João Gama

(Author)

FEP

International conference proceedings

Title: Progress in Artificial Intelligence, EPIA 2023, Part I

Pages: 365-377

22nd EPIA Conference on Artificial Intelligence (EPIA)

Azores, Portugal, September 5-8, 2023

Indexing

ISI Web of Knowledge - 2 Citations

Scopus - 2 Citations

Other information

Authenticus ID: P-00Z-KYD

DOI: 10.1007/978-3-031-49008-8_29

Abstract (EN): E-commerce has become an essential aspect of modern life, offering consumers worldwide convenience and accessibility. However, the high volume of short, noisy product descriptions in the text streams of massive e-commerce platforms translates into a larger number of clusters, which challenges standard model-based stream clustering algorithms. This is the case for a dataset extracted from the Brazilian NF-e Project containing electronic invoice product descriptions, which span many product clusters. Although LDA-based clustering methods have proven crucial, they have mainly been evaluated on datasets with few clusters. To overcome this limitation, we propose the Topic Model with Contextual Outlier Handling (TMCOH) method. TMCOH combines the Dirichlet Process, a specific word representation, and contextual outlier detection techniques to recycle identified outliers, with the aim of integrating them into appropriate clusters later on. The experimental results for our case study demonstrate the effectiveness of TMCOH compared with state-of-the-art methods, as well as its potential for text clustering in large datasets.
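The abstract names a Dirichlet Process model plus a step that sets outliers aside and later folds them back into clusters. The sketch below is a hypothetical toy to make that idea concrete, not the TMCOH method: it scores each tokenised description with a Chinese-restaurant-process prior and a Dirichlet-multinomial word likelihood over plain bag-of-words counts (not the paper's word representation), and, as a deliberately simple stand-in for contextual outlier detection, defers any description whose best option is a brand-new cluster and re-scores it at the end of the batch. The class `DPStreamClusterer`, its parameters `alpha`, `beta` and `vocab_size`, and the defer-and-re-score rule are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


class DPStreamClusterer:
    """Toy Dirichlet-process clusterer for short product descriptions.

    Hypothetical illustration only: not the TMCOH implementation.
    """

    def __init__(self, vocab_size: int, alpha: float = 1.0, beta: float = 0.05):
        self.V = vocab_size   # assumed size of the word vocabulary
        self.alpha = alpha    # assumed DP concentration parameter
        self.beta = beta      # assumed symmetric Dirichlet prior on cluster word distributions
        self.m: List[int] = []          # number of documents per cluster
        self.n: List[int] = []          # number of tokens per cluster
        self.nw: List[Counter] = []     # per-cluster word counts
        self.buffer: List[Tuple[str, Counter]] = []  # deferred candidate outliers
        self.labels: Dict[str, int] = {}             # document id -> cluster index

    def _log_score(self, words: Counter, k: Optional[int]) -> float:
        """Unnormalised log p(cluster k | doc): CRP prior x Dirichlet-multinomial likelihood.

        k=None scores opening a new cluster.
        """
        if k is None:
            prior, n_k, counts = self.alpha, 0, Counter()
        else:
            prior, n_k, counts = self.m[k], self.n[k], self.nw[k]
        score = math.log(prior)
        for w, c in words.items():
            for j in range(c):  # handles repeated words in the same description
                score += math.log(counts[w] + self.beta + j)
        for i in range(sum(words.values())):
            score -= math.log(n_k + self.V * self.beta + i)
        return score

    def _best(self, words: Counter) -> Optional[int]:
        # Candidate clusters: every existing one, plus None for "open a new cluster".
        options: List[Optional[int]] = list(range(len(self.m))) + [None]
        return max(options, key=lambda k: self._log_score(words, k))

    def _add(self, doc_id: str, words: Counter, k: Optional[int]) -> None:
        if k is None:  # open a new cluster
            self.m.append(0)
            self.n.append(0)
            self.nw.append(Counter())
            k = len(self.m) - 1
        self.m[k] += 1
        self.n[k] += sum(words.values())
        self.nw[k].update(words)
        self.labels[doc_id] = k

    def process_batch(self, docs: Iterable[Tuple[str, List[str]]]) -> Dict[str, int]:
        """Cluster one batch of (doc_id, tokens) pairs from the stream."""
        for doc_id, tokens in docs:
            words = Counter(tokens)
            k = self._best(words)
            if k is None:
                # Stand-in outlier rule (assumption): fits no existing cluster, so defer it.
                self.buffer.append((doc_id, words))
            else:
                self._add(doc_id, words, k)
        # Recycle deferred documents against the clusters updated by this batch.
        pending, self.buffer = self.buffer, []
        for doc_id, words in pending:
            self._add(doc_id, words, self._best(words))
        return self.labels
```

With `DPStreamClusterer(vocab_size=20)`, a first batch of four descriptions ("milk whole 1l", "milk skim 1l", "rice white 5kg", "rice brown 5kg") ends with the two milk items in cluster 0 and the two rice items in cluster 1. In a second batch, "milk whole 2l" joins cluster 0 directly, while "soap bar", which shares no words with either cluster, is first deferred and then opens cluster 2 when the buffer is recycled. Deferring new-cluster candidates until the rest of the batch has been absorbed is the only "recycling" here; the paper's contextual criterion is not reproduced.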

Language: English

Type (Professor's evaluation): Scientific

No. of pages: 13

Related Publications

Of the same authors

Community-Based Topic Modeling with Contextual Outlier Handling (2024)
Article in International Conference Proceedings Book
Andrade, C; Rita Ribeiro; João Gama
