# Data Stream Mining

 Code: CC4073 Acronym: CC4073 Level: 400

Keywords
Classification Keyword
OFFICIAL Computer Science

## Instance: 2020/2021 - 2S

 Active? Yes Responsible unit: Department of Computer Science Course/CS Responsible: Master's degree in Data Science

### Cycles of Study/Courses

| Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M:DS | 17 | Official Study Plan since 2018_M:DS | 1, 2 | - | 6 | 42 | 162 |

### Teaching Staff - Responsibilities

Teacher Responsibility
João Manuel Portela da Gama

### Teaching - Hours

Theoretical and practical: 3.00

| Type | Teacher | Classes | Hours |
| --- | --- | --- | --- |
| Theoretical and practical | Totals | 1 | 3.00 |
| | João Manuel Portela da Gama | | 3.00 |
Last updated on 2021-02-12.

Fields changed: Working language

English

### Objectives

By the end of the semester, students should be able to formulate decision problems from data streams.
They should be able to apply methods and algorithms to new data stream analysis problems.
They should also be able to evaluate the results and understand how the methods studied work.

### Learning outcomes and competences

Knowledge of how to formulate a knowledge extraction problem from data streams.
Ability to apply methods and algorithms to new data stream analysis problems.
Ability to evaluate the results and understand how the methods studied work.

Face-to-face

### Pre-requirements (prior knowledge) and co-requirements (common knowledge)

Basic knowledge of Data Mining I

### Program

S1- Data streams: definitions and methods
Problem formulation: basic methods and techniques.
Approximation and randomisation.
Illustrative problems and algorithms

S2- Tools for Data Stream Processing
MOA, SAMOA

S3- Clustering from data streams
Basic streaming methods for clustering
State-of-the-art clustering algorithms
Clustering time-series

S4- Change detection
Problem definition. Basic methods for dealing with evolving data
Detection methods: CUSUM algorithms, SPC, ADWIN
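
As an illustration of the detection methods listed for S4, here is a minimal sketch of a one-sided CUSUM test for an upward shift in the mean. The function name `cusum_detect` and the parameters `target_mean`, `slack` and `threshold` are assumptions made for this example; they are not taken from the course material or from any library.

```python
def cusum_detect(stream, target_mean, slack=0.5, threshold=5.0):
    """One-sided CUSUM: return the index of the first alarm, or None.

    All parameter names and default values are illustrative assumptions.
    """
    g = 0.0  # cumulative sum of deviations above target_mean + slack
    for t, x in enumerate(stream):
        # Accumulate positive evidence of an upward shift; never go below 0.
        g = max(0.0, g + (x - target_mean - slack))
        if g > threshold:
            return t
    return None


# Synthetic stream: the mean jumps from 0 to 2 at index 50.
stream = [0.0] * 50 + [2.0] * 50
alarm = cusum_detect(stream, target_mean=0.0)
print(alarm)  # 53
```

The slack term trades detection delay for robustness: a larger value ignores small fluctuations but reacts more slowly to a real change.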

S5- Learning decision trees from data streams
Incremental decision trees. Decision trees and change detection

S6- Ensemble models
Online Bagging and Boosting. Dynamic weighted majority algorithms

S7- Evaluation of stream learning algorithms
Evaluation metrics. Predictive sequential approaches.
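
The predictive sequential (prequential) approach named in S7 can be sketched as a test-then-train loop: each example is first used to test the model and only then to update it. The `MajorityClass` learner and the function `prequential_accuracy` below are hypothetical names used for illustration, not part of MOA or of the course material.

```python
from collections import Counter


class MajorityClass:
    """Toy incremental learner (illustrative): predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # no prediction before the first example
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model):
    """Test on each example first, then train on it."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0


# Features are unused by this toy learner, so they are set to None.
stream = [(None, "a"), (None, "a"), (None, "b"), (None, "a")]
acc = prequential_accuracy(stream, MajorityClass())
print(acc)  # 0.5
```

Because every example is seen as test data before it is seen as training data, no separate hold-out set is needed. In practice a sliding window or fading factor is often applied to the error estimate, which this sketch omits.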

S8- Applications
Recommender Systems, Click streams, Social Media

S9- Novelty detection.
One-class classification, Novelty detection and open-set recognition
Cluster based methods for novelty detection.

S10- Ubiquitous data mining
Distributed clustering: two views.
Distributed clustering data
Distributed clustering data sources

S11- Evolving Networks.
Tracking evolving communities in large scale social networks

S12- Pattern mining
Problem definition.
Approximate algorithms for counting the frequency of items.
Approximate algorithms for counting the frequency of itemsets.
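
One well-known example of an approximate frequency-counting algorithm is the Misra-Gries summary, sketched below. It is chosen here only as an illustration; the syllabus does not say which algorithms are covered. It keeps at most `k - 1` counters, and any item occurring more than `n / k` times in a stream of length `n` is guaranteed to remain in the summary, with its count underestimated by at most `n / k`.

```python
def misra_gries(stream, k):
    """Return a summary with at most k - 1 counters (illustrative sketch)."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["a", "b", "a", "c", "a", "d", "a"]
summary = misra_gries(stream, k=3)
print(summary)  # {'a': 3, 'd': 1}
```

Here `"a"` truly occurs 4 times; the summary reports 3, within the `n / k = 7 / 3` error bound. Memory use depends only on `k`, not on the stream length.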

### Mandatory literature

João Gama; Knowledge Discovery from Data Streams. ISBN: 978-1-4398-2611-9
Albert Bifet, Ricard Gavaldà; Machine Learning for Data Streams, MIT Press, 2017

### Teaching methods and learning activities

Theoretical-practical classes

### Software

Massive Online Analysis (MOA)

### Keywords

Physical sciences > Computer science > Cybernetics > Artificial intelligence

### Evaluation Type

Distributed evaluation without final exam

### Assessment Components

| Designation | Weight (%) |
| --- | --- |
| Presentation/discussion of a scientific work | 40.00 |
| In-class participation | 20.00 |
| Written work | 40.00 |
| Total: | 100.00 |

### Amount of time allocated to each course unit

| Designation | Time (hours) |
| --- | --- |
| Presentation/discussion of a scientific work | 0.00 |
| Class attendance | 0.00 |
| Research work | 0.00 |
| Total: | 0.00 |

### Eligibility for exams

A passing grade in both homework assignments.

### Calculation formula of final grade

HW1 > 9.5 and HW2 > 9.5