Data Stream Mining
Keywords
Classification | Keyword
Official | Computer Science
Instance: 2024/2025 - 2S
Cycles of Study/Courses
Teaching language
English
Objectives
At the end of the semester, students should be able to formulate decision problems from data streams.
They should be able to apply methods and algorithms to new data stream analysis problems.
They should be able to evaluate the results and understand how the methods studied work.
Learning outcomes and competences
Know how to formulate a knowledge extraction problem from data streams.
Apply methods and algorithms to new data stream analysis problems.
Evaluate the results and understand how the methods studied work.
Working method
In person
Prerequisites (prior knowledge) and co-requisites (common knowledge)
Basic knowledge of data mining.
Program
S1 - Data streams: definitions and methods.
Problem formulation: basic methods and techniques.
Approximation and randomisation.
Illustrative problems and algorithms.
S2 - Tools for Data Stream Processing:
MOA, SAMOA, River, CapyMOA.
S3 - Clustering from data streams: basic streaming methods for clustering
State-of-the-art clustering algorithms.
Clustering time-series.
S4 - Change detection: problem definition. Basic methods for dealing with evolving data
Detection methods: CUSUM algorithms, SPC, ADWIN
S5 - Learning decision trees from data streams.
Incremental decision trees. Decision trees and change detection
S6 - Ensemble models: online Bagging and Boosting. Dynamic weighted majority algorithms
S7 - Evaluation of stream learning algorithms.
Evaluation metrics. Predictive sequential approaches.
S8 - Applications:
Recommender Systems, Click Streams, and Social Media.
S9 - Novelty detection.
One-class classification, Novelty detection and open-set recognition
Cluster-based methods for novelty detection.
S10 - Ubiquitous data mining.
Distributed clustering: two views.
Distributed clustering data
Distributed clustering data sources
S11 - Evolving Networks.
Tracking evolving communities in large-scale social networks
S12 - Pattern mining: problem definition.
Approximate algorithms for counting the frequency of items.
Approximate algorithms for counting the frequency of item sets.
Mandatory literature
João Gama; Knowledge Discovery from Data Streams. ISBN: 978-1-4398-2611-9
Albert Bifet, Ricard Gavaldà; Machine Learning for Data Streams, MIT Press, 2017
Teaching methods and learning activities
Theoretical-practical classes
Software
https://riverml.xyz/latest/
https://capymoa.org/
Keywords
Physical sciences > Computer science > Cybernetics > Artificial intelligence
Evaluation Type
Distributed evaluation without final exam
Assessment Components
Designation | Weight (%)
Presentation/discussion of a scientific work | 40.00
In-person participation | 20.00
Written assignment | 40.00
Total: | 100.00
Amount of time allocated to each course unit
Designation | Time (hours)
Presentation/discussion of a scientific work | 2.00
Class attendance | 42.00
Research work | 78.00
Project development | 40.00
Total: | 162.00
Eligibility for exams
A passing grade in both assignments is required.
Calculation formula of final grade
The assignments should be performed in groups of 2 students.
Assign1 - grade of assignment 1.
Assign2 - grade of assignment 2.
Final - final grade.
If Assign1 > 9.5 and Assign2 > 9.5 Then
Final = 0.5 * Assign1 + 0.5 * Assign2
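The rule above can be sketched as a small Python function. This is only an illustration: the function name `final_grade` and the constant `PASS_THRESHOLD` are hypothetical, and returning `None` when either assignment is not above 9.5 is an assumption, since the formula does not state what happens in that case.

```python
from typing import Optional

# Threshold taken from the formula above; the constant name is illustrative.
PASS_THRESHOLD = 9.5


def final_grade(assign1: float, assign2: float) -> Optional[float]:
    """Average the two assignment grades when both are above the threshold.

    Returns None otherwise (an assumption: the source gives no rule
    for the case where the condition is not met).
    """
    if assign1 > PASS_THRESHOLD and assign2 > PASS_THRESHOLD:
        return 0.5 * assign1 + 0.5 * assign2
    return None


if __name__ == "__main__":
    print(final_grade(14.0, 16.0))  # both above 9.5, so the average is computed
    print(final_grade(9.5, 18.0))   # 9.5 is not strictly above the threshold
```

Note that the comparison is strict, as written in the formula, so a grade of exactly 9.5 does not satisfy the condition.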