Topic Model with Contextual Outlier Handling: a Study on Electronic Invoice Product Descriptions

Type
Article in International Conference Proceedings Book
Year
2023
Authors
Andrade, C
(Author)
Other
Rita Ribeiro
(Author)
FCUP
João Gama
(Author)
FEP
International conference proceedings
Pages: 365-377
22nd EPIA Conference on Artificial Intelligence (EPIA)
Azores, Portugal, September 5-8, 2023
Other information
Authenticus ID: P-00Z-KYD
Abstract (EN): E-commerce has become an essential aspect of modern life, providing consumers worldwide with convenience and accessibility. However, the high volume of short and noisy product descriptions in the text streams of massive e-commerce platforms translates into a large number of clusters, which challenges standard model-based stream clustering algorithms. This is the case for a dataset extracted from the Brazilian NF-e Project containing electronic invoice product descriptions, which span many product clusters. Although LDA-based clustering methods have proven crucial, they have mainly been evaluated on datasets with few clusters. To overcome this limitation, we propose the Topic Model with Contextual Outlier Handling (TMCOH) method. It combines the Dirichlet Process, a specific word representation, and contextual outlier detection techniques to recycle identified outliers, with the aim of later integrating them into appropriate clusters. The experimental results for our case study demonstrate the effectiveness of TMCOH compared with state-of-the-art methods and its potential for text clustering in large datasets.
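The abstract describes TMCOH only at a high level: a Dirichlet Process, a word representation, and contextual outlier detection whose outliers are recycled into clusters later. As a hedged illustration of that general idea, and not the authors' algorithm, the Python sketch below assigns each description with Chinese-restaurant-process-style weights (an existing cluster weighted by its size, a new cluster by a concentration parameter `alpha`). These weights are combined with a simple bag-of-words cosine similarity in place of a proper likelihood. Descriptions whose best match is weak are deferred to a buffer and re-assigned after more of the stream has been seen. The class `ToyOutlierAwareClusterer`, the thresholds `new_cluster_sim` and `outlier_sim`, and the sample invoice descriptions are all hypothetical assumptions.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyOutlierAwareClusterer:
    """Hypothetical sketch: CRP-style assignment plus an outlier buffer.

    This is NOT TMCOH; it only illustrates deferring weakly matched
    documents and recycling them once more clusters exist.
    """

    def __init__(self, alpha=1.0, new_cluster_sim=0.1, outlier_sim=0.5):
        self.alpha = alpha                      # DP concentration: weight of a new cluster
        self.new_cluster_sim = new_cluster_sim  # assumed similarity credited to a new cluster
        self.outlier_sim = outlier_sim          # best match below this counts as weak
        self.word_counts = []                   # per-cluster Counter of word counts
        self.sizes = []                         # per-cluster number of documents
        self.outliers = []                      # deferred (buffered) raw descriptions

    def _open(self, tokens):
        self.word_counts.append(Counter(tokens))
        self.sizes.append(1)
        return len(self.sizes) - 1

    def _join(self, k, tokens):
        self.word_counts[k].update(tokens)
        self.sizes[k] += 1
        return k

    def process(self, doc, recycling=False):
        """Return a cluster index, or None if the document was buffered."""
        tokens = Counter(doc.lower().split())
        if not self.sizes:
            return self._open(tokens)
        sims = [cosine(tokens, wc) for wc in self.word_counts]
        # CRP-style weights: size * similarity for existing clusters and
        # alpha * new_cluster_sim for a new one (the shared 1 / (N + alpha)
        # normaliser cancels in the comparison and is omitted).
        scores = [n * s for n, s in zip(self.sizes, sims)]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < self.alpha * self.new_cluster_sim:
            return self._open(tokens)
        if sims[best] < self.outlier_sim:
            if recycling:
                return self._open(tokens)  # still no good home: start a cluster
            self.outliers.append(doc)      # defer instead of forcing a bad fit
            return None
        return self._join(best, tokens)

    def recycle(self):
        """Re-assign buffered outliers now that more clusters may exist."""
        pending, self.outliers = self.outliers, []
        return [self.process(doc, recycling=True) for doc in pending]


if __name__ == "__main__":
    clusterer = ToyOutlierAwareClusterer()
    stream = ["arroz tipo 1 5kg", "arroz tipo 1 1kg",
              "feijao carioca 1kg", "feijao preto 1kg"]
    print([clusterer.process(d) for d in stream])  # [0, 0, None, None]
    print(clusterer.recycle())                     # [1, 1]
```

In this toy stream, the two bean descriptions share only the token `1kg` with the rice cluster, so they are buffered rather than absorbed into it; when recycled, the first opens a new cluster and the second joins it.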
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 13
Related Publications

By the same authors

Community-Based Topic Modeling with Contextual Outlier Handling (2024)
Article in International Conference Proceedings Book
Andrade, C; Rita Ribeiro; João Gama