Language Processing and Information Extraction

Code: PRODEI034     Acronym: PLEI

Keywords
Classification | Keyword
Official | Intelligent Systems

Instance: 2024/2025 - 1S

Active? Yes
Responsible unit: Department of Informatics Engineering
Course/CS Responsible: Doctoral Program in Informatics Engineering

Cycles of Study/Courses

Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time
PDMAPI | 1 | Official Study Plan since 2020/2021 | 1 | - | 6 | 28 | 162
PRODEI | 7 | Syllabus | 1 | - | 6 | 28 | 162

Teaching Staff - Responsibilities

Teacher Responsibility
Henrique Daniel de Avelar Lopes Cardoso

Teaching - Hours

Recitations: 2,00
Type | Teacher | Classes | Hour
Recitations | Totals | 1 | 2,00
 | Henrique Daniel de Avelar Lopes Cardoso | | 2,00

Teaching language

Suitable for English-speaking students

Objectives

The main objective of this course is to equip students with knowledge about techniques for natural language processing and information extraction from text, combining the presentation of theoretical foundations with practical applications.

Learning outcomes and competences

Upon completing the course, students should be able to:

- Explain the fundamental concepts and techniques for natural language processing and information extraction from text;
- Demonstrate knowledge of the relevant scientific literature and the ability to interpret and present research work in this domain;
- Design and implement natural language processing systems that analyze and automatically extract information expressed in natural language.

Working method

In person

Program

- Introduction to natural language processing: definitions, tasks, and applications.
- Basic text processing: regular expressions, tokenization, normalization, lemmatization, stemming, segmentation.
- Language models: n-grams.
- Text classification: bag-of-words, TF-IDF, n-grams, Naive Bayes, feature engineering; generative and discriminative classifiers.
- Sequence models: hidden Markov models, conditional random fields; POS-tagging and named entity recognition.
- Vectorized representations of words: lexical semantics, word embeddings.
- Neural networks in natural language processing: neural language models, recurrent neural networks, encoder-decoder networks, attention, transformer networks.
- Large language models.
- Contemporary research in natural language processing and information extraction.
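
As an illustration of two early topics in the program above (regular-expression tokenization and n-gram language models), the following is a minimal sketch and not course material. It assumes a toy corpus, a simple lowercase tokenizer, and unsmoothed maximum-likelihood estimates; the names `tokenize` and `bigram_model` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens with a regular expression."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def bigram_model(sentences):
    """Return a function prob(prev, word) giving the unsmoothed
    maximum-likelihood estimate of P(word | prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Pad each sentence with start and end markers.
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        # Every token except the last one serves as a context.
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


# Toy corpus, assumed for illustration only.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
prob = bigram_model(corpus)
print(prob("<s>", "i"), prob("i", "am"), prob("am", "sam"))
```

On this corpus the estimates are 2/3 for "i" at sentence start, 2/3 for "am" after "i", and 1/2 for "sam" after "am". A practical model would add smoothing so that unseen bigrams do not receive zero probability.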

Mandatory literature

Daniel Jurafsky, James H. Martin; Speech and language processing. ISBN: 0-13-095069-6 (https://web.stanford.edu/~jurafsky/slp3/)

Complementary Bibliography

Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze; Introduction to information retrieval. ISBN: 978-0-521-86571-5 (Full content available at http://nlp.stanford.edu/IR-book/)
Steven Bird, Ewan Klein, Edward Loper; Natural Language Processing with Python, O'Reilly Media, 2009. ISBN: 978-0-596-51649-9 (Full content available at http://www.nltk.org/book/)
Yoav Goldberg; Neural network methods for natural language processing. ISBN: 978-1-62705-298-6
Jacob Eisenstein; Introduction to natural language processing. ISBN: 978-0-262-04284-0

Teaching methods and learning activities

The curricular unit follows a hybrid format that combines a theoretical component with a project component. The theoretical component introduces language processing and information extraction concepts and discusses recent literature on the subject.

The project component allows students to apply these concepts in practical case studies. Students will research, develop, and evaluate a solution for language processing and information extraction. During the research and development stages, students will be supported through tutoring.

The lectures cover the topics of the course unit and are accompanied by exercises provided as Jupyter notebooks (Python). The goal is to introduce the tools that will be used in the practical work as early as possible. Suggestions for related literature will also be provided as opportunities for further reading. Students will be invited to give brief presentations on recent research trends in NLP. Part of the class time will also be used for individual tutoring on the practical work.

Each student will define and carry out a practical project throughout the semester, culminating in a scientific paper. Project topics are proposed by the students and validated by the lecturer.

Students will also give presentations on recent topics in natural language processing.

Keywords

Technological sciences > Engineering > Computer engineering

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation Weight (%)
Written work 25,00
Presentation/discussion of a scientific work 50,00
Practical or project work 25,00
Total: 100,00

Amount of time allocated to each course unit

Designation Time (hours)
Autonomous study 40,00
Class attendance 26,00
Presentation/discussion of a scientific work 1,00
Research work 20,00
Written work 25,00
Laboratory work 50,00
Total: 162,00

Eligibility for exams

Assessment includes four components:
1) Oral presentation on a recent research direction: 30%
2) Practical assignment: 25%
3) Scientific article documenting the practical assignment: 25%
4) Final presentation of the developed assignment: 20%

Every component is subject to a minimum grade of 7 out of 20.

Calculation formula of final grade

The final grade (CF) is calculated as follows:

CF = 30% * AO + 25% * TP + 25% * AC + 20% * AF

Evaluation components:
- AO: Oral presentation on a recent research direction
- TP: Practical assignment
- AC: Scientific article documenting the practical assignment
- AF: Final presentation of the developed assignment
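
The formula above can be checked with a short calculation. This is a minimal sketch, assuming all grades are on the 0-20 scale; the function name `final_grade` and the example grades are hypothetical. The page does not say what happens when a component falls below the minimum of 7, so the sketch only reports whether every minimum is met.

```python
# Weights taken from the formula CF = 30% * AO + 25% * TP + 25% * AC + 20% * AF.
WEIGHTS = {"AO": 0.30, "TP": 0.25, "AC": 0.25, "AF": 0.20}
MIN_COMPONENT_GRADE = 7.0  # minimum grade per component, out of 20


def final_grade(grades):
    """Return (CF, meets_minimums) for a dict of component grades on a 0-20 scale."""
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    cf = sum(weight * grades[name] for name, weight in WEIGHTS.items())
    meets_minimums = all(grades[name] >= MIN_COMPONENT_GRADE for name in WEIGHTS)
    return round(cf, 2), meets_minimums


# Hypothetical grades, for illustration only.
cf, ok = final_grade({"AO": 16, "TP": 14, "AC": 15, "AF": 18})
print(cf, ok)
```

With these example grades, CF = 4.8 + 3.5 + 3.75 + 3.6 = 15.65, and every component is above the minimum.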

Special assessment (TE, DA, ...)

Students under special evaluation constraints are allowed to skip lectures. However, they must still give the public presentations described in the previous section, and their final grades are determined by the evaluation criteria already described. These students must schedule regular meetings with the teacher to discuss the ongoing work.

Classification improvement

Grade improvement is possible in the following edition of the curricular unit.

Copyright 1996-2025 © Faculdade de Engenharia da Universidade do Porto
Page generated on: 2025-06-16 at 02:35:04