
Language Processing and Information Extraction

Code: PRODEI034     Acronym: PLEI

Keywords
Classification: Official     Keyword: Intelligent Systems

Instance: 2022/2023 - 1S

Active? Yes
Web Page: https://moodle.up.pt/enrol/index.php?id=1578
Responsible unit: Department of Informatics Engineering
Course/CS Responsible: Doctoral Program in Informatics Engineering

Cycles of Study/Courses

Acronym: PRODEI
No. of Students: 2
Study Plan: Syllabus
Curricular Years: 1
Credits UCN: -
Credits ECTS: 6
Contact hours: 28
Total Time: 162
Last updated on 2022-08-04.

Fields changed: Learning outcomes and competences, Teaching methods and learning activities, Calculation formula of final grade, Assessment components and time allocation, Eligibility for exams, Program, Classification improvement

Teaching language

Suitable for English-speaking students

Objectives

The main objective of this course is to equip students with knowledge of natural language processing and information extraction techniques, combining the presentation of theoretical foundations with practical applications.

Learning outcomes and competences

Upon completing the course students should be able to:

- Explain the fundamental concepts and techniques in natural language processing and information extraction;
- Demonstrate knowledge of relevant literature and be able to synthesize and present research work;
- Design and implement systems that perform analysis and automatic extraction of information expressed in natural language.

Working method

In person

Program

The curricular unit is organized in two parts: a theoretical component and a project component. The theoretical component introduces concepts in language processing and information extraction and discusses recent literature on the subject.

The project component allows students to apply these concepts in practical case studies. Students will research, develop, and evaluate a solution for language processing and information extraction. During the research and development stages, students will be supported through tutoring sessions.

The course will address the following topics:
- Introduction to natural language processing: definitions, tasks, and applications.
- Basic text processing: regular expressions, tokenization, normalization, lemmatization, stemming, segmentation.
- Language models: n-grams.
- Text classification: bag-of-words, TF-IDF, n-grams, Naive Bayes, feature engineering; generative and discriminative classifiers.
- Sequence models: hidden Markov models, conditional random fields; POS-tagging and named entity recognition.
- Vectorized representations of words: lexical semantics, word embeddings.
- Neural networks in natural language processing: neural language models, recurrent neural networks, encoder-decoder networks, attention, transformer networks.
- Information extraction: named entity recognition and relation extraction, event and time extraction, template filling.
- Contemporary research in natural language processing and information extraction.
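
As a rough illustration of the first topics in the list above (regular-expression tokenization, normalization, and n-gram language models), the following is a minimal sketch rather than course material. The function names and the toy text are hypothetical, the tokenizer is deliberately naive, and sentence boundaries are ignored for brevity.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters or digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigram_counts(tokens):
    """Count adjacent token pairs (bigrams)."""
    return Counter(zip(tokens, tokens[1:]))


def bigram_probability(w1, w2, tokens):
    """Maximum-likelihood estimate of P(w2 | w1) from bigram counts."""
    bigrams = bigram_counts(tokens)
    # Number of bigrams whose first word is w1.
    context = sum(count for (first, _), count in bigrams.items() if first == w1)
    return bigrams[(w1, w2)] / context if context else 0.0


# Toy corpus, assumed purely for illustration.
tokens = tokenize("The cat sat. The cat ran. The dog sat.")
print(tokens)
print(bigram_probability("the", "cat", tokens))
```

In this toy text, "the" is followed by "cat" in two of its three occurrences, so the estimate is 2/3. A realistic model would add sentence-boundary markers and smoothing for unseen bigrams.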

Mandatory literature

Daniel Jurafsky; Speech and language processing. ISBN: 0-13-095069-6 (https://web.stanford.edu/~jurafsky/slp3/)

Complementary Bibliography

Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze; Introduction to information retrieval. ISBN: 978-0-521-86571-5 (Full content available at http://nlp.stanford.edu/IR-book/)
Steven Bird, Ewan Klein, Edward Loper; Natural Language Processing with Python, O'Reilly Media, 2009. ISBN: 978-0-596-51649-9 (Full content available at http://www.nltk.org/book/)
Yoav Goldberg; Neural network methods for natural language processing. ISBN: 978-1-62705-298-6
Jacob Eisenstein; Introduction to natural language processing. ISBN: 978-0-262-04284-0

Teaching methods and learning activities

Lectures cover the topics of the course unit and are accompanied by exercises provided as Jupyter notebooks (Python). The goal is to introduce the tools used in the practical work as early as possible. Suggestions for related literature will also be provided as opportunities for further reading. Students will be invited to give brief presentations on recent research trends in NLP. Part of the class time will be used for individual tutoring and monitoring of the practical work.

Each student will define and carry out a practical project throughout the semester, documented in a scientific paper. Project topics are proposed by the students and validated by the lecturer.

Students will also give presentations on recent topics in natural language processing.

Keywords

Technological sciences > Engineering > Computer engineering

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation Weight (%)
Written assignment 25.00
Presentation/discussion of a scientific paper 50.00
Practical or project work 25.00
Total: 100.00

Amount of time allocated to each course unit

Designation Time (hours)
Independent study 40.00
Class attendance 26.00
Presentation/discussion of a scientific paper 1.00
Research work 20.00
Written assignment 25.00
Laboratory work 50.00
Total: 162.00

Eligibility for exams

Assessment includes four components:
1) Oral presentation on a recent research direction: 30%
2) Practical assignment: 25%
3) Scientific article documenting the practical assignment: 25%
4) Final presentation of the developed assignment: 20%

Every component is subject to a minimum grade of 7 out of 20.

Calculation formula of final grade

The final grade (CF) is calculated as follows:

CF = 30% * AO + 25% * TP + 25% * AC + 20% * AF

Evaluation components:
- AO: Oral presentation on a recent research direction
- TP: Practical assignment
- AC: Scientific article documenting the practical assignment
- AF: Final presentation of the developed assignment
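
The formula above can be checked with a short sketch. It assumes grades on the 0 to 20 scale and the minimum of 7 per component stated earlier; the function name and the example grades are hypothetical, and because the page does not say what happens when a minimum is missed, the code only reports whether the minimums are met.

```python
# Weights taken from the formula CF = 30% * AO + 25% * TP + 25% * AC + 20% * AF.
WEIGHTS = {"AO": 0.30, "TP": 0.25, "AC": 0.25, "AF": 0.20}
MIN_COMPONENT_GRADE = 7.0  # minimum grade per component, out of 20


def final_grade(grades):
    """Return (CF, meets_minimums) for a dict of component grades."""
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    cf = sum(weight * grades[name] for name, weight in WEIGHTS.items())
    meets_minimums = all(grades[name] >= MIN_COMPONENT_GRADE for name in WEIGHTS)
    return cf, meets_minimums


# Hypothetical grades, for illustration only.
cf, ok = final_grade({"AO": 16, "TP": 14, "AC": 15, "AF": 18})
print(f"CF = {cf:.2f}, minimums met: {ok}")
```

With these example grades, CF = 4.80 + 3.50 + 3.75 + 3.60 = 15.65.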

Special assessment (TE, DA, ...)

Students under special evaluation constraints are exempt from attending lectures. However, they must still give the public presentations described in the previous section, and their final grades are calculated according to the evaluation criteria already described. In these cases, students must schedule regular meetings with the teacher to discuss the ongoing work.

Classification improvement

Classification can be improved in the next edition of the curricular unit.

Copyright 1996-2024 © Faculdade de Engenharia da Universidade do Porto
Page generated on: 2024-11-09 at 08:53:25