
Language Processing and Information Extraction

Code: PRODEI034     Acronym: PLEI

Keywords
Classification: Official
Keyword: Intelligent Systems

Instance: 2019/2020 - 1S

Active? Yes
Web Page: http://www.fe.up.pt/~ssn/2015/plei/
Responsible unit: Department of Informatics Engineering
Course/CS Responsible: Doctoral Program in Informatics Engineering

Cycles of Study/Courses

Acronym: PRODEI
No. of Students: 8
Study Plan: Syllabus
Curricular Years: 1
Credits UCN: -
Credits ECTS: 6
Contact hours: 28
Total Time: 162

Teaching language

Suitable for English-speaking students

Objectives

The main objective of this course is to equip students with knowledge of natural language processing and information extraction techniques, combining the presentation of theoretical foundations with practical applications.

Learning outcomes and competences

Upon completing the course students should be able to:

- Explain the fundamental concepts and techniques in natural language processing and information extraction;
- Demonstrate knowledge of relevant literature and be able to synthesize and present research work;
- Design and implement systems that perform analysis and automatic extraction of information expressed in natural language.

Working method

In person

Program

The curricular unit is organized in two parts: a theoretical component and a project component. The theoretical component introduces concepts in language processing and information extraction and discusses recent literature on the subject.

The project component allows students to apply these concepts in practical case studies. Students will research, develop and evaluate a language processing and information extraction solution. During the research and development stages, students will work under tutorial supervision.

The course will address the following topics:
- Introduction to the concepts associated with natural language processing.
- Presentation of techniques and typical applications of natural language processing and information extraction: named entity recognition, collocations, POS tagging, automatic summarization, sentiment analysis, word-sense disambiguation, etc.
- Introduction to machine learning techniques for text classification and topic extraction (e.g., SVM, Latent Dirichlet Allocation). Representation of documents: bag-of-words, n-grams.
- Processing of user-generated content and extraction of information in social networks (e.g., blogs, micro-blogs). Folksonomies, identification of topics, summarization, content recommendation.
- Extraction of semantic relations and named entity disambiguation using external resources (e.g., Wikipedia, WordNet).
- Log analysis, pattern and trend detection; recommendations.
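
As an illustration of the document representations named in the topics above (bag-of-words and n-grams), here is a minimal sketch in Python. It is not course material: the function names `tokenize`, `ngrams` and `bag_of_words` and the sample sentence are hypothetical, and the whitespace tokenizer is a simplifying assumption.

```python
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption for brevity): lowercase, split on whitespace.
    return text.lower().split()


def ngrams(tokens, n):
    # All contiguous windows of n tokens, each returned as a tuple.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text):
    # Map each token to its frequency, discarding word order.
    return Counter(tokenize(text))


doc = "the cat sat on the mat"
bow = bag_of_words(doc)                      # unigram counts
bigrams = Counter(ngrams(tokenize(doc), 2))  # counts of adjacent word pairs
```

A bag-of-words keeps only token frequencies, while n-grams retain some local word order; counts like these can then serve as features for a text classifier.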

Mandatory literature

Christopher D. Manning, Hinrich Schütze; Foundations of statistical natural language processing. ISBN: 0-262-13360-1

Complementary Bibliography

Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze; Introduction to information retrieval. ISBN: 978-0-521-86571-5 (Full content available at http://nlp.stanford.edu/IR-book/)
Steven Bird, Ewan Klein, Edward Loper; Natural Language Processing with Python, O'Reilly Media, 2009. ISBN: 978-0-596-51649-9 (Full content available at http://www.nltk.org/book/)

Teaching methods and learning activities

Students will have to attend lectures. Individual research work will be supported by the teacher on a one-to-one basis.

Students define and develop a semester-long project. Project themes are proposed by the students and validated with the teacher.  

The evaluation of the project is based on two components:
1) SP: short-paper - 30% of the final grade
2) FP: full-paper - 70% of the final grade.

The SP component will be evaluated halfway through the semester and consists of: SP1: a short paper describing the student's initial investigation of the selected problem. SP2: a short presentation (10 minutes) on the work done so far.

The FP component will be evaluated at the end of the semester and consists of: FP1: a full paper (written in English) describing the final solution to the problem and the results of the evaluation experiments on the proposed solution. FP2: a public presentation (25 minutes) and demonstration of the project.

Keywords

Technological sciences > Engineering > Computer engineering

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation Weight (%)
Oral exam 35.00
Written assignment 65.00
Total: 100.00

Amount of time allocated to each course unit

Designation Time (hours)
Project development 56.00
Independent study 42.00
Class attendance 42.00
Total: 140.00

Eligibility for exams

A minimum score of 7 out of 20 is required in every evaluation component (SP1, SP2, FP1 and FP2). To obtain a final grade, students must reach this minimum in all four components.

Calculation formula of final grade

The final grade (CF) is calculated as follows:

CF = (20% * SP1 + 10% * SP2) + (45% * FP1 + 25% * FP2).

Evaluation components:
- SP1: short-paper
- SP2: short presentation (10 minutes)
- FP1: full-paper
- FP2: public presentation (25 minutes) and demonstration of the project.
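
The formula above, together with the minimum score of 7 out of 20 required in each component, can be sketched as a small Python function. This is an illustrative sketch only: the names `WEIGHTS`, `MIN_SCORE` and `final_grade` are hypothetical, and returning `None` when a component falls below the minimum is an assumption, since the page does not say how such a case is recorded. No rounding is applied, because no rounding rule is stated.

```python
# Component weights taken from the CF formula (they sum to 100%).
WEIGHTS = {"SP1": 0.20, "SP2": 0.10, "FP1": 0.45, "FP2": 0.25}

# Minimum score required in every component, on a 0-20 scale.
MIN_SCORE = 7.0


def final_grade(scores):
    """Return the weighted final grade CF on a 0-20 scale.

    Returns None (an assumption of this sketch) when any component
    is below the required minimum, i.e. no final grade is obtained.
    """
    if any(scores[name] < MIN_SCORE for name in WEIGHTS):
        return None
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = final_grade({"SP1": 14, "SP2": 16, "FP1": 15, "FP2": 17})
```

With the example scores, the weighted sum is 2.8 + 1.6 + 6.75 + 4.25, or about 15.4. Lowering any single score below 7 makes the function return `None`.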

Special assessment (TE, DA, ...)

Students under special evaluation constraints are allowed to skip lectures. However, they must still give the public presentations described in the previous section, and their final grades will be determined by the evaluation criteria already described. In these cases, students must schedule regular meetings with the teacher to discuss the ongoing work.

Classification improvement

Only the end-of-semester evaluation (70%) can be subject to grade improvement. The student will have to submit a new research work (i.e., a full paper) and give the corresponding public presentation.

Copyright 1996-2025 © Faculdade de Engenharia da Universidade do Porto
Page generated on: 2025-06-16 at 06:12:10