Code: | PRODEI034 | Acronym: | PLEI |
Keywords
Classification | Keyword |
---|---|
Official | Intelligent Systems |
Active? | Yes |
Web Page: | https://moodle.up.pt/enrol/index.php?id=1578 |
Responsible unit: | Department of Informatics Engineering |
Course/CS Responsible: | Doctoral Program in Informatics Engineering |
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time |
---|---|---|---|---|---|---|---|
PRODEI | 13 | Syllabus | 1 | - | 6 | 28 | 162 |
The main objective of this course is to equip students with knowledge of natural language processing and information extraction techniques, combining the presentation of theoretical foundations with practical applications.
Upon completing the course students should be able to:
- Explain the fundamental concepts and techniques in natural language processing and information extraction;
- Demonstrate knowledge of relevant literature and be able to synthesize and present research work;
- Design and implement systems that perform analysis and automatic extraction of information expressed in natural language.
The curricular unit is organized in two parts: a theoretical component and a project component. The theoretical component introduces concepts of natural language processing and information extraction and discusses recent literature on the subject.
The project component allows students to apply these concepts to practical case studies. Students research, develop, and evaluate a natural language processing and information extraction solution. During the research and development stages, students are supervised through tutorials with the teacher.
The course will address the following topics:
- Introduction to natural language processing: definitions, tasks, and applications.
- Basic text processing: regular expressions, tokenization, normalization, lemmatization, stemming, segmentation.
- Language models: n-grams.
- Text classification: bag-of-words, Naive Bayes, feature engineering; generative and discriminative classifiers.
- Vectorized representations of words: lexical semantics, word embeddings.
- Sequence models: hidden Markov models, conditional random fields; POS-tagging and named entity recognition.
- Neural networks in natural language processing: neural language models, recurrent neural networks, encoder-decoder networks, attention, transformer networks.
- Information extraction: named entity recognition and relation extraction, event and time extraction, template filling.
- Contemporary research in natural language processing and information extraction.
Students are required to attend lectures. Individual research work is supported by the teacher on a one-to-one basis.
Students define and develop a semester-long project. Project themes are proposed by the students and validated with the teacher.
The evaluation of the project is based on two components:
1) SP: short-paper - 30% of the final grade
2) FP: full-paper - 70% of the final grade.
The SP component will be evaluated halfway through the semester and will consist of:
- SP1: the student must prepare a short paper describing the initial investigation of the selected problem.
- SP2: short presentation (10 minutes) on the work done so far.
The FP component will be evaluated at the end of the semester and consists of:
- FP1: a full paper (written in English) describing the final solution to the problem and the results of the experiments evaluating the proposed solution.
- FP2: public presentation (25 minutes) and demonstration of the project.
Designation | Weight (%) |
---|---|
Oral exam | 35.00 |
Written work | 65.00 |
Total: | 100.00 |
Designation | Time (hours) |
---|---|
Project development | 56.00 |
Independent study | 42.00 |
Class attendance | 42.00 |
Total: | 140.00 |
A minimum score of 7 out of 20 is required in each of the four evaluation components (SP1, SP2, FP1 and FP2). Students must reach this minimum in all four components to obtain a final grade.
The final grade (CF) is calculated as follows:
CF = (20% * SP1 + 10% * SP2) + (45% * FP1 + 25% * FP2).
Evaluation components:
- SP1: short-paper
- SP2: short presentation (10 minutes)
- FP1: full-paper
- FP2: public presentation (25 minutes) and demonstration of the project.
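The grading rule combines a weighted sum with a per-component minimum. A minimal sketch of that rule, assuming every component is scored on a 0-20 scale, follows. The names `WEIGHTS`, `MINIMUM` and `final_grade` are hypothetical, and since the course description does not specify a rounding policy, the function returns the unrounded value.

```python
from __future__ import annotations

# Weights taken from the CF formula; each component is scored out of 20.
WEIGHTS = {"SP1": 0.20, "SP2": 0.10, "FP1": 0.45, "FP2": 0.25}
MINIMUM = 7.0  # minimum score required in every component


def final_grade(scores: dict[str, float]) -> float | None:
    """Return CF on the 0-20 scale, or None if any component is below the minimum.

    Hypothetical helper: rounding of the final grade is not specified by the
    course, so the raw weighted sum is returned.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # A score below 7/20 in any component prevents obtaining a final grade.
    if any(scores[c] < MINIMUM for c in WEIGHTS):
        return None
    # CF = 0.20*SP1 + 0.10*SP2 + 0.45*FP1 + 0.25*FP2
    return sum(w * scores[c] for c, w in WEIGHTS.items())


if __name__ == "__main__":
    cf = final_grade({"SP1": 14, "SP2": 16, "FP1": 15, "FP2": 17})
    print(round(cf, 2))  # 15.4 (2.8 + 1.6 + 6.75 + 4.25)
    # SP1 below the minimum, so no final grade is obtained:
    print(final_grade({"SP1": 6, "SP2": 20, "FP1": 20, "FP2": 20}))  # None
```

The written components (SP1 + FP1) add up to 65% and the presentations (SP2 + FP2) to 35%, which matches the weights in the assessment table.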
Students with a special evaluation status are exempt from attending lectures. However, they must still give the public presentations described above, and their final grades are determined by the same evaluation criteria. These students must schedule regular meetings with the teacher to discuss the ongoing work.
Only the end-of-semester evaluation (FP, 70%) is eligible for grade improvement. To improve it, the student must submit a new research work (i.e., a full paper) and give the corresponding public presentation.