Language Processing and Information Extraction
Keywords
Classification | Keyword
Official | Intelligent Systems
Instance: 2024/2025 - 1S 
Cycles of Study/Courses
Teaching Staff - Responsibilities
Teaching language
Suitable for English-speaking students
Objectives
The main objective of this course is to equip students with knowledge about techniques for natural language processing and information extraction from text, combining the presentation of theoretical foundations with practical applications.
Learning outcomes and competences
Upon completing the course, students should be able to:
- Explain the fundamental concepts and techniques for natural language processing and information extraction from text;
- Demonstrate knowledge of the relevant scientific literature and the ability to interpret and present research work in this domain;
- Design and implement natural language processing systems that analyze and automatically extract information expressed in natural language.
Working method
In person
Program
- Introduction to natural language processing: definitions, tasks, and applications.
- Basic text processing: regular expressions, tokenization, normalization, lemmatization, stemming, segmentation.
- Language models: n-grams.
- Text classification: bag-of-words, TF-IDF, n-grams, Naive Bayes, feature engineering; generative and discriminative classifiers.
- Sequence models: hidden Markov models, conditional random fields; POS-tagging and named entity recognition.
- Vectorized representations of words: lexical semantics, word embeddings.
- Neural networks in natural language processing: neural language models, recurrent neural networks, encoder-decoder networks, attention, transformer networks.
- Large language models.
- Contemporary research in natural language processing and information extraction.
Mandatory literature
Daniel Jurafsky, James H. Martin;
Speech and language processing. ISBN: 0-13-095069-6 (https://web.stanford.edu/~jurafsky/slp3/)
Complementary Bibliography
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze;
Introduction to information retrieval. ISBN: 978-0-521-86571-5 (Full content available at http://nlp.stanford.edu/IR-book/)
Steven Bird, Ewan Klein, Edward Loper;
Natural Language Processing with Python, O'Reilly Media, 2009. ISBN: 978-0-596-51649-9 (Full content available at http://www.nltk.org/book/)
Yoav Goldberg;
Neural network methods for natural language processing. ISBN: 978-1-62705-298-6
Jacob Eisenstein;
Introduction to natural language processing. ISBN: 978-0-262-04284-0
Teaching methods and learning activities
The curricular unit follows a hybrid format, with a theoretical component and a project component. The theoretical component introduces language processing and information extraction concepts and discusses recent literature on the subject.
The project component allows students to apply these concepts in practical case studies. Students will research, develop, and evaluate a solution for language processing and information extraction. During the research and development stages, students will be supported through tutoring sessions.
The lectures cover the topics of the course unit and are accompanied by exercises provided as Jupyter notebooks (Python). The goal is to introduce the tools used in the practical work as early as possible. Suggestions for related literature will also be provided as opportunities for further reading. Students will be invited to give brief presentations on recent research trends in NLP. Part of the class time will be used for individual tutoring and monitoring of the practical work.
Each student will define and carry out a practical project throughout the semester, which will result in writing a scientific paper. The topics of the projects are proposed by the students and validated by the lecturer.
Students will also give presentations on recent topics in natural language processing.
Keywords
Technological sciences > Engineering > Computer engineering
Evaluation Type
Distributed evaluation without final exam
Assessment Components
Designation | Weight (%)
Written work | 25.00
Presentation/discussion of a scientific work | 50.00
Practical or project work | 25.00
Total: | 100.00
Amount of time allocated to each course unit
Designation | Time (hours)
Independent study | 40.00
Class attendance | 26.00
Presentation/discussion of a scientific work | 1.00
Research work | 20.00
Written work | 25.00
Laboratory work | 50.00
Total: | 162.00
Eligibility for exams
Assessment includes four components:
1) Oral presentation on a recent research direction: 30%
2) Practical assignment: 25%
3) Scientific article documenting the practical assignment: 25%
4) Final presentation of the developed assignment: 20%
Every component is subject to a minimum grade of 7 out of 20.
Calculation formula of final grade
The final grade (CF) is calculated as follows:
CF = 30% * AO + 25% * TP + 25% * AC + 20% * AF
Evaluation components:
- AO: Oral presentation on a recent research direction
- TP: Practical assignment
- AC: Scientific article documenting the practical assignment
- AF: Final presentation of the developed assignment
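As an illustration only, the formula above can be sketched in Python. The weights and the minimum of 7 out of 20 per component come from this page; the function name `final_grade`, the dictionary layout, and the choice to report the minimum-grade check as a separate flag are assumptions made for this example, since the page does not state what happens when a component falls below the minimum, nor how the result is rounded.

```python
# Weights taken from the formula CF = 30% * AO + 25% * TP + 25% * AC + 20% * AF.
WEIGHTS = {"AO": 0.30, "TP": 0.25, "AC": 0.25, "AF": 0.20}
MIN_COMPONENT_GRADE = 7.0  # minimum per component, on the 0-20 scale


def final_grade(grades):
    """Return (CF, meets_minimums) for a dict of component grades on a 0-20 scale.

    Hypothetical helper: the consequence of missing a minimum is not
    specified on this page, so it is only reported as a boolean flag.
    """
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum of the four components (no rounding applied).
    cf = sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)
    # True only if every component reaches the minimum grade.
    meets_minimums = all(grades[name] >= MIN_COMPONENT_GRADE for name in WEIGHTS)
    return cf, meets_minimums


# Example with made-up grades, for illustration only.
cf, ok = final_grade({"AO": 16, "TP": 14, "AC": 15, "AF": 12})
print(f"CF = {cf:.2f}, minimums met: {ok}")
```

The two presentation components (AO and AF) together account for 50% of the grade, which matches the 50.00 weight listed for presentation/discussion in the assessment components table.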
Special assessment (TE, DA, ...)
Students under special evaluation constraints are allowed to skip lectures. However, they still have to make the public presentations described in the previous section, and the final grades will be given according to the evaluation criteria already described. In these cases, the students must schedule regular meetings with the teacher to discuss the ongoing work.
Classification improvement
Grades can be improved in the following edition of the curricular unit.