Code: | PRODEI034 | Acronym: | PLEI |
Keywords:
Classification | Keyword |
---|---|
Official | Intelligent Systems |
Active? | Yes |
Web Page: | https://moodle.up.pt/enrol/index.php?id=1578 |
Responsible unit: | Department of Informatics Engineering |
Course/CS Responsible: | Doctoral Program in Informatics Engineering |
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time |
---|---|---|---|---|---|---|---|
PRODEI | 2 | Syllabus | 1 | - | 6 | 28 | 162 |
The main objective of this course is to equip students with knowledge of natural language processing and information extraction techniques, combining theoretical foundations with practical applications.
Upon completing the course, students should be able to:
- Explain the fundamental concepts and techniques in natural language processing and information extraction;
- Demonstrate knowledge of relevant literature and be able to synthesize and present research work;
- Design and implement systems that perform analysis and automatic extraction of information expressed in natural language.
The curricular unit is organized in two parts: a theoretical component and a project component. The theoretical component will introduce concepts in language processing and information extraction and discuss recent literature on the subject.
The project component will allow students to apply these concepts in practical case studies. Students will research, develop, and evaluate a solution for a language processing and information extraction problem. During the research and development stages, students will be supervised through tutoring sessions.
The course will address the following topics:
- Introduction to natural language processing: definitions, tasks, and applications.
- Basic text processing: regular expressions, tokenization, normalization, lemmatization, stemming, segmentation.
- Language models: n-grams.
- Text classification: bag-of-words, TF-IDF, n-grams, Naive Bayes, feature engineering; generative and discriminative classifiers.
- Sequence models: hidden Markov models, conditional random fields; POS-tagging and named entity recognition.
- Vectorized representations of words: lexical semantics, word embeddings.
- Neural networks in natural language processing: neural language models, recurrent neural networks, encoder-decoder networks, attention, transformer networks.
- Information extraction: named entity recognition and relation extraction, event and time extraction, template filling.
- Contemporary research in natural language processing and information extraction.
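As a rough illustration of the first topics in the list (regular-expression tokenization, normalization, and n-gram language models), a minimal Python sketch might look like the following. The function names and the sample sentence are hypothetical and are not taken from the course materials. Sentence segmentation is deliberately ignored, so bigrams may span sentence boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Normalize to lowercase and extract word tokens with a regular expression."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def bigram_counts(tokens):
    """Count adjacent token pairs (bigrams)."""
    return Counter(zip(tokens, tokens[1:]))


def bigram_probability(tokens, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) from raw counts.

    The denominator counts how often w1 appears as the first word of a
    bigram, so the last token of the text is excluded from the contexts.
    """
    pairs = bigram_counts(tokens)
    contexts = Counter(tokens[:-1])
    if contexts[w1] == 0:
        return 0.0  # unseen context: no smoothing in this sketch
    return pairs[(w1, w2)] / contexts[w1]


# Hypothetical sample text, used only for illustration.
sample_tokens = tokenize("The cat sat on the mat. The cat slept.")
p_cat_given_the = bigram_probability(sample_tokens, "the", "cat")
```

Here "the" occurs three times as a context and is followed by "cat" twice, so the unsmoothed estimate of P(cat | the) is 2/3. A practical model would add smoothing for unseen bigrams.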
Lectures will cover the topics of the course unit and will be accompanied by exercises provided as Jupyter notebooks (Python), so that the tools used in the practical work are introduced as early as possible. Suggestions for related literature will be provided as opportunities for further reading. Students will be invited to give brief presentations on recent research trends in NLP. Part of the class time will also be used for individual tutoring on the practical work.
Each student will define and carry out a practical project throughout the semester, which will result in the writing of a scientific paper. The topics of the projects are proposed by the students and validated by the lecturer.
Students will also give presentations on recent topics in natural language processing.
Designation | Weight (%) |
---|---|
Written assignment | 25.00 |
Presentation/discussion of a scientific work | 50.00 |
Practical or project work | 25.00 |
Total: | 100.00 |
Designation | Time (hours) |
---|---|
Independent study | 40.00 |
Class attendance | 26.00 |
Presentation/discussion of a scientific work | 1.00 |
Research work | 20.00 |
Written assignment | 25.00 |
Laboratory work | 50.00 |
Total: | 162.00 |
Assessment includes four components:
1) Oral presentation on a recent research direction: 30%
2) Practical assignment: 25%
3) Scientific article documenting the practical assignment: 25%
4) Final presentation of the developed assignment: 20%
Every component is subject to a minimum grade of 7 out of 20.
The final grade (CF) is calculated as follows:
CF = 30% * AO + 25% * TP + 25% * AC + 20% * AF
Evaluation components:
- AO: Oral presentation on a recent research direction
- TP: Practical assignment
- AC: Scientific article documenting the practical assignment
- AF: Final presentation of the developed assignment
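The weighted formula and the per-component minimum can be checked with a short Python sketch. The grades below are hypothetical, and the function name is an assumption made for illustration. The source does not say what happens when a component falls below the minimum, so the sketch only reports whether the minimum is met.

```python
# Weights from the formula CF = 30% * AO + 25% * TP + 25% * AC + 20% * AF.
WEIGHTS = {"AO": 0.30, "TP": 0.25, "AC": 0.25, "AF": 0.20}
MIN_COMPONENT_GRADE = 7.0  # every component is graded on a 0-20 scale


def final_grade(grades):
    """Return (CF, meets_minimums) for a dict of component grades."""
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    cf = sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)
    meets_minimums = all(grades[name] >= MIN_COMPONENT_GRADE for name in WEIGHTS)
    return cf, meets_minimums


# Hypothetical grades, used only for illustration.
cf, ok = final_grade({"AO": 16, "TP": 14, "AC": 15, "AF": 17})
```

With these grades, CF = 4.8 + 3.5 + 3.75 + 3.4 = 15.45, and every component is above the minimum of 7.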
Students under special evaluation constraints are exempt from attending lectures. However, they must still give the public presentations described in the previous section, and their final grades will be determined by the evaluation criteria already described. These students must schedule regular meetings with the lecturer to discuss the ongoing work.
Classification improvement is possible in the next edition of the curricular unit.