FCUP

Advanced Information Extraction

Code: CC581     Acronym: AIE

Scientific Areas
Classification | Scientific Area
OFFICIAL | Informatics

Occurrence: 2016/2017 - 1st semester

Active? Yes
Responsible Unit: Department of Computer Science
Responsible Course/Study Cycle: Doctoral Program in Informatics

Study Cycles/Courses

Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact Hours | Total Hours
PDMAPI | 10 | Official Study Plan of the Doctoral Program in Informatics | 1 | - | 5 | 49 | 135
 | | Official Study Plan of the Doctoral Program in Informatics from 2016/17 | 1 | - | 5 | 49 | 135

Teaching - Responsibilities

Teacher | Responsibility
Mário Jorge Ferreira Rodrigues | Course coordinator

Teaching - Hours

Theoretical: 3.00
Tutorial Supervision: 0.50

Type | Teacher | Classes | Hours
Theoretical | Totals | 1 | 3.00
 | Mário Jorge Ferreira Rodrigues | | 1.05
Tutorial Supervision | Totals | 1 | 0.50
 | Mário Jorge Ferreira Rodrigues | | 0.17

Working language

English

Objectives


  • Explain the main concepts and technologies of information extraction at large;

  • Experiment with and analyze existing information extraction applications;

  • Demonstrate how several modules combine into a working system, and encourage students to build such combinations;

  • Understand the current challenges of concrete designs and solutions;

  • Know the state of the art well enough to judge critically whether information extraction technologies suit a concrete task;

  • Design and implement information extraction applications.

Learning outcomes and competences

This course is intended to provide practical, hands-on experience in information extraction, with emphasis on current state-of-the-art tools and technologies.

Working method

Face-to-face

Program

1. Introduction
a. Motivation
b. Examples of state-of-the-art systems and tools

c. Overview of Information Extraction

d. Demos

e. Tutorial example (using, for example, the Stanford NLP suite) [Sec. 5.1 of Rodrigues and Teixeira, 2015]

f. Presentation of course modules and teachers


2. Background information
a. Text and Document Processing

i. Regular Expressions

ii. Markup languages and tools (XML, XSLT, schemas, DTDs etc)

b. Semantics and Knowledge Representation Basics

i. Semantics

ii. Taxonomies, Thesauri and Ontologies

iii. WordNet

iv. Reasoning basics

v. OWL, Triple Stores, SPARQL

c. Natural Language Processing

i. Processing levels

ii. Typical NLP pipeline

iii. NLP common tasks

1. Segmentation and tokenization

2. Morphological Analysis and tagging

3. (Syntactic) Parsing

4. Named Entity Recognition


3. Information Extraction (IE) - An Overview
a. Main approaches

b. Performance metrics

c. Challenges

d. General architecture and pipeline

i. Process overview


4. Data Gathering, Preparation and Enrichment
a. Objectives

b. Process overview

c. Tools

i. Tokenizers

ii. Sentence boundary detectors

iii. Morphological analysers and POS taggers

iv. Syntactic parsers

d. Representative software suites


5. Identifying Things and Relations
a. Identifying Things/Entities (Who, Where and When)

b. Identifying relations

c. Information fusion


6. Ontology-based Information Extraction (OBIE)
a. Basic idea

b. Types

c. Architecture

d. Case study

i. OBIE system for eGov, developed by Mário Rodrigues

e. OBIE to the limit: Open Information Extraction


7. Systems and Applications
a. Case studies

i. Systems developed at DETI/IEETA, Univ. Aveiro

1. IE applied to Health (MedInx and HealthInX)

2. IE applied to eGov

ii. Other systems (selected each year from state-of-the-art)

Mandatory Bibliography

Rodrigues, M., Teixeira, A.; Advanced Applications of Natural Language Processing for Performing Information Extraction, Springer, 2015
Ingersoll, G. S., Morton, T. S., Farris, A. L.; Taming Text: How to Find, Organize, and Manipulate It, Manning Publications, 2013
Mitkov, R.; The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005
Rodrigues, M.; Model of Access to Natural Language Sources in Electronic Government, 2013

Teaching methods and learning activities

The course is taught in theoretical-practical classes. A total of 20 hours is planned for presenting the parts of the program. Whenever possible, examples and demonstrations will complement the classical exposition, which is supported to varying degrees by slides. At least 8 hours will be allocated to students' presentations of their work and to discussion with teachers and colleagues.

Type of assessment

Distributed evaluation without final exam

Assessment Components

Designation | Weight (%)
Presentation/discussion of a scientific work | 25.00
Written work | 25.00
Practical or project work | 50.00
Total: | 100.00

Formula for calculating the final grade

Assessment is project-based. Students build a concrete Information Extraction application based on rich NLP processing, in two stages. In the first stage, they define, develop, and evaluate a complete extraction pipeline; the pipeline and its evaluation results are presented in class for assessment, supported by slides. In the second stage, they build the complete application, including user input and visualization or transmission of the results to the end user; a second public presentation, including a demo, is given for assessment. The third assessed element is a short written report covering both the application and the pipeline.
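
The page does not state the formula explicitly. A minimal sketch follows, assuming the final grade is the weighted sum of the three assessment components with the weights listed above (25%, 25%, 50%). The component names, the function name, and the 0-20 scale in the usage line are illustrative assumptions, not taken from the course page.

```python
# Weights from the assessment components table, as fractions of the final grade.
# The dictionary keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "presentation": 0.25,    # presentation/discussion of a scientific work
    "written_report": 0.25,  # written work
    "project": 0.50,         # practical or project work
}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted sum of the component scores.

    Assumes every component is graded on the same scale and that the
    final grade is a plain weighted sum (not confirmed by the source).
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example with made-up scores on an assumed 0-20 scale.
example = final_grade({"presentation": 16, "written_report": 14, "project": 18})
```

With these made-up scores the result is 0.25 × 16 + 0.25 × 14 + 0.50 × 18 = 16.5.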