Statistical Methods in Data Mining

Code: M4063     Acronym: M4063

Keywords
Classification   Keyword
Official         Mathematics

Instance: 2023/2024 - 1S

Active? Yes
Web Page: http://moodle.up.pt/course/view.php?id=150
Responsible unit: Department of Mathematics
Course/CS Responsible: Master in Mathematical Engineering

Cycles of Study/Courses

Acronym    No. of Students   Study Plan                                  Curricular Years   Credits UCN   Credits ECTS   Contact hours   Total Time
M:A_ASTR   5                 Study plan since academic year 2023/2024    1, 2               -             6              48              162
M:CC       2                 Study plan since 2014/2015                  1                  -             6              48              162
M:EGEO     0                 Official Study Plan                         1                  -             6              48              162
M:ENM      5                 Official Study Plan since 2023/2024         1, 2               -             6              48              162
M:M        0                 Official Study Plan of academic year 2021   2                  -             6              48              162

Teaching language

Portuguese

Objectives

Introduce the main concepts and methods of supervised and unsupervised classification.

Learning outcomes and competences

The student should be able to:

- Recognize different supervised and unsupervised classification problems that can be solved with the data mining methods discussed, using R software.

- Prepare, solve and present computational data mining projects in which the various models introduced are discussed, validated and compared on real datasets.

- Solve computational and non-computational problems on the methodologies studied.

Working method

In person

Program

 

Introduction and illustration of a supervised and an unsupervised classification problem. Summary of random vectors. Multivariate normal distribution function. Principal component analysis. Clustering: hierarchical and non-hierarchical methods. Statistical decision theory. Linear and quadratic discriminant analysis. Logistic regression. Classification and regression trees; cost-complexity pruning. Reference to Random Forests, Bagging and Boosting. Neural networks. Non-parametric density estimation: kernel and k-NN methods. Recent developments in kernel methods: support vector machines.

 

Mandatory literature

000040415. ISBN: 0-471-05669-3
000040365. ISBN: 0-387-95284-5
Hand, David; Principles of Data Mining. ISBN: 9780262082907 (hardback)

Teaching methods and learning activities

The lessons are supported by materials provided by the teacher, including exercise sheets for each section of the program, and by the use of statistical software.

Software

Software R

Evaluation Type

Distributed evaluation with final exam

Assessment Components

Designation          Weight (%)
Exam                 40.00
Written assignment   60.00
Total                100.00

Amount of time allocated to each course unit

Designation         Time (hours)
Independent study   120.00
Class attendance    42.00
Total               162.00

Eligibility for exams

At least 35% in each computational project.

Calculation formula of final grade

The final grade combines the final exam and the computational projects. To pass, the student must obtain a positive final grade (exam and projects). The exam has a weight of 40% and the computational projects 60%. The student must obtain at least 35% in each component. Approval is subject to Score_of_exams being equal to or higher than 7.0 points (on a scale of 0 to 20).
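
As an illustration only, the grading rule can be sketched in Python. The weights are those listed under Assessment Components (exam 40%, written assignment/projects 60%). Several details are assumptions not stated on this page: that the projects are averaged with equal weight, that a "positive" final grade means at least 10 out of 20, and the names final_grade, MIN_COMPONENT and PASS_MARK, which are hypothetical.

```python
from statistics import mean

EXAM_WEIGHT = 0.40     # from the Assessment Components table
PROJECT_WEIGHT = 0.60  # from the Assessment Components table
MIN_COMPONENT = 7.0    # 35% of the 0-20 scale; also the stated exam minimum
PASS_MARK = 10.0       # assumption: "positive" means at least 10 out of 20


def final_grade(exam, projects):
    """Return (grade, approved) for an exam score and a list of project scores (0-20)."""
    project_score = mean(projects)  # assumption: projects are weighted equally
    grade = EXAM_WEIGHT * exam + PROJECT_WEIGHT * project_score
    # Every component (the exam and each project) must reach the minimum.
    meets_minimums = exam >= MIN_COMPONENT and all(p >= MIN_COMPONENT for p in projects)
    approved = meets_minimums and grade >= PASS_MARK
    return round(grade, 2), approved
```

Under these assumptions, a student with 12.0 in the exam and 14.0 and 16.0 in two projects would obtain 0.40 × 12.0 + 0.60 × 15.0 = 13.8, while an exam score below 7.0 would prevent approval regardless of the project scores.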

The practical work consists of analysing a real dataset with the methods taught, using software.
It should be carried out in groups of 2 students.

Classification improvement

Improvement of the final mark: students who have passed may sit the resit exam (“época de recurso”) in order to improve their final mark. The mark obtained in the written assignment/project cannot be improved in any evaluation period. The evaluation formula is the same (see above).
Copyright 1996-2024 © Faculdade de Ciências da Universidade do Porto
Page created on: 2024-08-15 at 22:22:36