Data Stream Mining

Code: CC4073     Acronym: CC4073     Level: 400

Keywords
Classification: Official
Keyword: Computer Science

Instance: 2024/2025 - 2S

Active? No
Responsible unit: Department of Computer Science
Course/CS Responsible: Master in Data Science

Cycles of Study/Courses

Acronym: M:DS
No. of Students: 0
Study Plan: Official Study Plan since 2018_M:DS
Curricular Years: 1
Credits UCN: -
Credits ECTS: 6
Contact hours: 42
Total Time: 162

Teaching language

English

Objectives

At the end of the semester, students should be able to formulate decision problems from data streams.
They should be able to apply methods and algorithms to new data stream analysis problems.
They should be able to evaluate the results and understand how the methods studied work.

Learning outcomes and competences

Knowledge of how to formulate a knowledge extraction problem from data streams.
Ability to apply methods and algorithms to new data stream analysis problems.
Ability to evaluate the results and understand how the methods studied work.

Working method

In person

Prerequisites (prior knowledge) and co-requisites (common knowledge)

Basic knowledge of data mining.

Program

S1 - Data streams: definitions and methods.
Problem formulation: basic methods and techniques.
Approximation and randomisation.
Illustrative problems and algorithms.
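
As one illustration of the randomisation idea in S1, the following is a minimal sketch of reservoir sampling, which keeps a uniform random sample of fixed size from a stream whose length is not known in advance. The function name and its parameters are illustrative choices and are not taken from the course materials.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items drawn uniformly at random from a stream, in one pass.

    Hypothetical helper for illustration; uses O(k) memory.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k / n,
            # overwriting a uniformly chosen slot.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir
```

If the stream holds fewer than k items, the sketch simply returns all of them.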

S2 - Tools for Data Stream Processing:
MOA, SAMOA, River, CapyMOA.

S3 - Clustering from data streams: basic streaming methods for clustering.
State-of-the-art clustering algorithms.
Clustering time-series.
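
A basic streaming clustering method can be sketched as a sequential (online) k-means: each arriving point moves its nearest centroid a little towards itself. This is a simplified sketch under stated assumptions (the first k points seed the centroids, squared Euclidean distance), not any specific algorithm from the course; the class and method names are illustrative.

```python
from typing import List, Sequence


class OnlineKMeans:
    """Sequential k-means sketch: one pass, O(k * d) memory."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def predict_one(self, x: Sequence[float]) -> int:
        """Index of the nearest centroid (squared Euclidean distance)."""
        return min(
            range(len(self.centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, self.centroids[i])),
        )

    def learn_one(self, x: Sequence[float]) -> int:
        """Update the model with one point and return its cluster index."""
        if len(self.centroids) < self.k:
            # Assumption: the first k points are used as initial centroids.
            self.centroids.append([float(v) for v in x])
            self.counts.append(1)
            return len(self.centroids) - 1
        idx = self.predict_one(x)
        self.counts[idx] += 1
        rate = 1.0 / self.counts[idx]  # running-mean step size
        centroid = self.centroids[idx]
        for d, value in enumerate(x):
            centroid[d] += rate * (value - centroid[d])
        return idx
```

With the 1/count step size each centroid is the running mean of the points assigned to it; a constant step size would instead let centroids follow a drifting distribution.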


S4 - Change detection: problem definition. Basic methods for dealing with evolving data.
Detection methods: CUSUM algorithms, SPC, ADWIN.
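
To make the CUSUM idea concrete, here is a minimal sketch of a one-sided CUSUM detector for an upward shift in the mean. It assumes the reference mean is known in advance; the class name and the `slack` and `threshold` parameters are illustrative and would need tuning for real data.

```python
class Cusum:
    """One-sided CUSUM sketch for detecting an increase in the mean."""

    def __init__(self, target_mean: float, slack: float = 0.5, threshold: float = 5.0) -> None:
        self.target_mean = target_mean  # assumed known reference mean
        self.slack = slack              # tolerated deviation per observation
        self.threshold = threshold      # alarm level for the cumulative sum
        self.g = 0.0

    def update(self, x: float) -> bool:
        """Process one observation; return True if a change is signalled."""
        # Accumulate deviations above the target, never dropping below zero.
        self.g = max(0.0, self.g + x - self.target_mean - self.slack)
        if self.g > self.threshold:
            self.g = 0.0  # restart monitoring after an alarm
            return True
        return False
```

A usage example: feeding twenty observations at 0.0 followed by observations at 2.0 raises the first alarm on the fourth shifted observation, since each one adds 1.5 to the cumulative sum.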

S5 - Learning decision trees from data streams.
Incremental decision trees. Decision trees and change detection.
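
Hoeffding trees are one well-known family of incremental decision trees; they decide when a leaf has seen enough examples to split by using the Hoeffding bound. The sketch below shows only that test, not a full tree learner, and the function names are illustrative.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

The bound shrinks as n grows, so a leaf that cannot yet separate two close candidates simply waits for more examples.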


S6 - Ensemble models: online Bagging and Boosting. Dynamic weighted majority algorithms.
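
Online bagging, in the Oza and Russell formulation, replaces bootstrap resampling by showing each incoming example to every base model k times, with k drawn from a Poisson(1) distribution. The following is a minimal sketch of that idea; `MajorityClass` is a hypothetical placeholder base learner used only to keep the example self-contained, and all names are illustrative.

```python
import math
import random
from collections import Counter
from typing import Any, Optional


def poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class MajorityClass:
    """Hypothetical base learner: predicts the most frequent label seen."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def learn_one(self, x: Any, y: Any) -> None:
        self.counts[y] += 1

    def predict_one(self, x: Any) -> Any:
        return self.counts.most_common(1)[0][0] if self.counts else None


class OnlineBagging:
    """Online bagging sketch with Poisson(1) example weights."""

    def __init__(self, n_models: int = 10, seed: Optional[int] = None) -> None:
        self.rng = random.Random(seed)
        self.models = [MajorityClass() for _ in range(n_models)]

    def learn_one(self, x: Any, y: Any) -> None:
        for model in self.models:
            # Each model sees the example k ~ Poisson(1) times (possibly zero).
            for _ in range(poisson(1.0, self.rng)):
                model.learn_one(x, y)

    def predict_one(self, x: Any) -> Any:
        # Majority vote, ignoring models that have not been trained yet.
        votes = Counter(m.predict_one(x) for m in self.models)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None
```

In practice the base learner would be an incremental classifier such as a Hoeffding tree rather than a majority-class counter.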

S7 - Evaluation of stream learning algorithms.
Evaluation metrics. Predictive sequential approaches.
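
The predictive sequential (prequential) approach tests the model on each example before training on it, so every example serves as unseen test data exactly once. Below is a minimal sketch with an optional fading factor that down-weights old predictions; `LastLabel` is a hypothetical baseline model, and the function and parameter names are illustrative.

```python
from typing import Any, Iterable, Tuple


class LastLabel:
    """Hypothetical baseline: predicts the most recently seen label."""

    def __init__(self) -> None:
        self.last: Any = None

    def predict_one(self, x: Any) -> Any:
        return self.last

    def learn_one(self, x: Any, y: Any) -> None:
        self.last = y


def prequential_accuracy(stream: Iterable[Tuple[Any, Any]], model: Any,
                         fading: float = 1.0) -> float:
    """Test-then-train accuracy; fading < 1.0 gives more weight to recent examples."""
    hits = 0.0
    seen = 0.0
    for x, y in stream:
        # 1) Test: predict before the true label is revealed to the model.
        correct = 1.0 if model.predict_one(x) == y else 0.0
        hits = correct + fading * hits
        seen = 1.0 + fading * seen
        # 2) Train: update the model with the labelled example.
        model.learn_one(x, y)
    return hits / seen if seen else 0.0
```

With `fading=1.0` the result is the plain fraction of correct predictions over the whole stream.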


S8 - Applications:
Recommender Systems, Click Streams, and Social Media.


S9 - Novelty detection.
One-class classification, novelty detection and open-set recognition.
Cluster-based methods for novelty detection.


S10 - Ubiquitous data mining.
Distributed clustering: two views.
Distributed clustering of data.
Distributed clustering of data sources.


S11 - Evolving Networks.
Tracking evolving communities in large-scale social networks.

S12 - Pattern mining: problem definition.
Approximate algorithms for counting the frequency of items.
Approximate algorithms for counting the frequency of item sets.
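
One classic approximate algorithm for counting item frequencies is the Misra-Gries summary, sketched below as an example of the kind of method this topic covers; it is not necessarily the algorithm used in the course. It keeps at most k - 1 counters, and each returned count underestimates the true frequency by at most n / k for a stream of n items. The function name is illustrative.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Approximate counts using at most k - 1 counters.

    Every item occurring more than n / k times in a stream of length n
    is guaranteed to appear in the returned dictionary.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, on the stream `aaaaabbbcd` with k = 3 the summary keeps only `a` and `b`, with counts 3 and 1 against true counts of 5 and 3.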

Mandatory literature

João Gama; Knowledge Discovery from Data Streams, Chapman and Hall/CRC, 2010. ISBN: 978-1-4398-2611-9
Albert Bifet, Ricard Gavaldà, Geoff Holmes, Bernhard Pfahringer; Machine Learning for Data Streams, MIT Press, 2017

Teaching methods and learning activities


Theoretical-practical classes

Software

https://riverml.xyz/latest/
https://capymoa.org/

Keywords

Physical sciences > Computer science > Cybernetics > Artificial intelligence

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation: Weight (%)
Presentation/discussion of a scientific paper: 40.00
In-person participation: 20.00
Written assignment: 40.00
Total: 100.00

Amount of time allocated to each course unit

Designation: Time (hours)
Presentation/discussion of a scientific paper: 2.00
Class attendance: 42.00
Research work: 78.00
Project development: 40.00
Total: 162.00

Eligibility for exams

A passing grade in both assignments.

Calculation formula of final grade


The assignments are to be completed in groups of two students.

Assign1 - grade of assignment 1.
Assign2 - grade of assignment 2.
Final - final grade.

If Assign1 > 9.5 and Assign2 > 9.5 Then
   Final = 0.5 * Assign1 + 0.5 * Assign2
Copyright 1996-2025 © Faculdade de Ciências da Universidade do Porto
Page created on: 2025-06-16 at 20:24:29