Language Processing and Information Extraction
Keywords
Classification | Keyword
OFICIAL | Intelligent Systems
Instance: 2012/2013 - 1S
Cycles of Study/Courses
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time
PRODEI | 2 | Syllabus | 1 | - | 6 | 54 | 162
Teaching language
Suitable for English-speaking students
Objectives
The main objective of this course is to equip students with knowledge of natural language processing and information extraction techniques, and to present a set of real application scenarios for such techniques, demonstrating how to process large amounts of information available in different types of repositories, such as news, scientific articles, blogs and social networks. We will also present machine learning techniques for supervised and unsupervised classification, which are fundamental in the development of language processing and information extraction systems.
Upon completing the course students should be able to:
- Explain the fundamental concepts and techniques in natural language processing and information extraction
- Demonstrate knowledge of relevant literature and be able to synthesize and present research work
- Design and implement systems that perform analysis and automatic extraction of information expressed in natural language or in a semi-structured format.
Program
The curricular unit is organized in two parts: a theoretical component and a project component. The theoretical component will introduce concepts in language processing and information extraction and discuss recent literature on the subject. The project component will allow students to apply these concepts in practical case studies. Students will research, develop and evaluate a language processing and information extraction solution. During the research and development stages, students will work under tutorial supervision.
The course will address the following topics:
- Introduction to basic problems in language processing and related support resources.
- Presentation of techniques and typical applications of natural language processing and information extraction: named entity recognition, collocations, POS tagging, automatic summarization, sentiment analysis, word-sense disambiguation, etc.
- Introduction to machine learning techniques for text classification and topic extraction (e.g., SVMs, Latent Dirichlet Allocation). Representation of documents: bag-of-words, n-grams.
- Processing of user-generated content and extraction of information in social networks (e.g., blogs, micro-blogs). Folksonomies, identification of topics, summarization, content recommendation.
- Extraction of semantic relations and named entity disambiguation using external resources (e.g., Wikipedia, WordNet).
- Log analysis, pattern and trend detection; recommendations.
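As a small illustration of the document representations listed above (bag-of-words and n-grams), a minimal sketch in Python follows. The function names and the simple regular-expression tokenizer are illustrative assumptions and are not taken from the course materials.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    # All contiguous sequences of n tokens, in order of appearance.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n=1):
    # Count each n-gram, ignoring its position; n=1 gives a plain bag-of-words.
    return Counter(ngrams(tokenize(text), n))


if __name__ == "__main__":
    doc = "The cat sat on the mat"
    print(bag_of_ngrams(doc, 1))  # unigram counts
    print(bag_of_ngrams(doc, 2))  # bigram counts
```

Counts of this kind are a typical input for the text classifiers mentioned in the topic list.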
Mandatory literature
Christopher D. Manning and Hinrich Schütze; Foundations of Statistical Natural Language Processing, MIT Press, 1999. ISBN: 0-262-13360-1
Teaching methods and learning activities
Students will have to attend lectures. Individual research work will be supported by the teacher on a one-to-one basis.
Keywords
Technological sciences > Engineering > Computer engineering
Evaluation Type
Distributed evaluation without final exam
Eligibility for exams
A minimum score of 7 out of 20 is required in each evaluation component (SP1, SP2, FP1 and FP2). To obtain a final grade, students must reach this minimum in all four components.
Calculation formula of final grade
The final grade (CF) is calculated as follows:
CF = (20% * SP1 + 10% * SP2) + (45% * FP1 + 25% * FP2)
Where (see below for a more detailed description):
- SP1: short-paper
- SP2: short presentation (10 minutes)
- FP1: full-paper
- FP2: public presentation (25 minutes) and demonstration of the project.
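The formula above, combined with the minimum score of 7 out of 20 per component, can be sketched in Python as follows. The function name and the choice to return None when a component is below the minimum are assumptions made for illustration; the syllabus does not specify how the grade is rounded, so no rounding is applied.

```python
# Component weights from the final grade formula.
WEIGHTS = {"SP1": 0.20, "SP2": 0.10, "FP1": 0.45, "FP2": 0.25}
MIN_SCORE = 7.0  # minimum required in each component, on a 0-20 scale


def final_grade(scores):
    """Return the weighted final grade CF, or None if any component
    is below the minimum (assumed representation of 'no final grade')."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    if any(scores[name] < MIN_SCORE for name in WEIGHTS):
        return None
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"SP1": 14, "SP2": 16, "FP1": 15, "FP2": 17}
    print(final_grade(example))
```

For the example scores the weighted sum is 2.8 + 1.6 + 6.75 + 4.25 = 15.4, up to floating-point rounding.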
Examinations or Special Assignments
Each student will be given a set of problems, from which he/she will select one. The student will be graded according to how he/she achieves the corresponding solution. More specifically, in this course we will be evaluating:
1) how the student researches and compares solutions already proposed for the problem at hand;
2) how the student proposes and implements a (possibly original) solution to the problem;
3) how the student evaluates the solution he/she proposes;
4) how the student proposes improvements to the initial solution, and also how he/she implements and evaluates such improvements;
5) how the student communicates to others the solution he/she developed.
The evaluation will include two components:
1) SP: "short-paper" - 30% of the final grade
2) FP: "full-paper" - 70% of the final grade
The SP component will be evaluated halfway through the semester and will consist of:
SP1: the student will have to prepare a “short-paper” describing the first experiments in tackling the selected problem.
SP2: short presentation (10 minutes) on the work done so far.
The FP component will be evaluated at the end of the semester and consists of:
FP1: a "full-paper" (written in English) containing a description of the final solution to the problem and the results of the evaluation experiments on the proposed solution.
FP2: public presentation (25 minutes) and demonstration of the project.
Special assessment (TE, DA, ...)
Students under special evaluation constraints are allowed to skip lectures. However, they still have to give the public presentations described in the previous section, and their final grades will be assigned according to the evaluation criteria already described.
Classification improvement
Only the end-of-semester evaluation (70%) can be subject to grade improvement. The student will have to submit a new research work (i.e., a full-paper) and give the corresponding public presentation.