Determining language variant in microblog messages

Title
Determining language variant in microblog messages
Type
Article in International Conference Proceedings Book
Year
2013
Authors
Laboreiro, G (Author), Other
Bosnjak, M (Author), Other
Sarmento, L (Author), Other
Rodrigues, EM (Author), Other
Oliveira, E (Author), FEUP
Conference proceedings International
Pages: 902-907
28th Annual ACM Symposium on Applied Computing, SAC 2013
Coimbra, 18 March 2013 through 22 March 2013
Indexing
Published in ISI Web of Knowledge
Scientific classification
CORDIS: Physical sciences > Computer science > Informatics
Other information
Authenticus ID: P-008-B09
Abstract (EN): It is difficult to determine the country of origin of the author of a short message based only on the text. This is an even more complex problem when more than one country uses the same native language. In this paper, we address the specific problem of detecting the two main variants of the Portuguese language - European and Brazilian - in Twitter micro-blogging data, by proposing and evaluating a set of high-precision features. We follow an automatic classification approach using a Naïve Bayes classifier, achieving 95% accuracy. We find that our system is adequate for real-time tweet classification. Copyright 2013 ACM.
Language: English
Type (Professor's evaluation): Scientific
No. of pages: 6