Statistical Methods in Data Mining

Code: M4121     Acronym: M4121

Keywords
Classification: CNAEF
Keyword: Mathematics and statistics

Instance: 2020/2021 - 2S

Active? Yes
Web Page: http://moodle.up.pt/course/view.php?id=150
Responsible unit: Department of Mathematics
Course/CS Responsible: Computational Statistics and Data Analysis

Cycles of Study/Courses

Acronym: E:ECAD
No. of Students: 14
Study Plan: PE_Estatística Computacional e Análise de Dados
Curricular Years: 1
Credits UCN: -
Credits ECTS: 6
Contact hours: 42
Total Time: 162

Teaching language

Portuguese

Objectives

Introduce the main concepts and methods of supervised and unsupervised classification.

Learning outcomes and competences

The student should be able to:

- Recognize supervised and unsupervised classification problems that can be solved with the data mining methods covered in the course, using the R software.

- Prepare, solve and present computational data mining projects in which the models introduced are discussed, validated and compared on real datasets.

- Solve computational and non-computational problems on the methodologies studied.

Working method

In person

Program

 

Introduction, with an example of a supervised and of an unsupervised classification problem. Summary on random vectors. Multivariate normal distribution. Principal component analysis. Clustering: hierarchical and non-hierarchical methods. Statistical decision theory. Linear and quadratic discriminant analysis. Logistic regression. Classification and regression trees; cost-complexity pruning. Neural networks. Non-parametric density estimation: kernel and k-NN methods. Recent developments of kernel methods: support vector machines.

 

Mandatory literature

000040415. ISBN: 0-471-05669-3
000040365. ISBN: 0-387-95284-5
Hand, David (1950-); Principles of data mining. ISBN: 9780262082907 (hbk)

Teaching methods and learning activities

The lessons are accompanied by materials provided by the teacher, including exercise sheets for each section of the program, and make use of statistical software.

Software

Software R

Evaluation Type

Distributed evaluation with final exam

Assessment Components

Designation: Weight (%)
Exam: 40.00
Written assignment: 60.00
Total: 100.00

Amount of time allocated to each course unit

Designation: Time (hours)
Independent study: 120.00
Class attendance: 42.00
Total: 162.00

Eligibility for exams

At least 30% in each computational project.

Calculation formula of final grade

The final grade combines the final exam and the computational projects. To be approved, the student must obtain a positive final grade (exam and projects combined). The exam has a weight of 40% and the computational projects 60%, as listed under Assessment Components. The student must also obtain at least 30% in each component.
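
As an illustration only, the weighting can be sketched in Python using the weights listed under Assessment Components (exam 40%, projects 60%). Several details are assumptions not stated on this page: a 0-20 grading scale, a pass mark of 9.5, and equal weighting of the individual projects within the project component. The function name `final_grade` and its parameters are hypothetical.

```python
def final_grade(exam, projects, exam_weight=0.40, projects_weight=0.60,
                scale_max=20.0, min_fraction=0.30, pass_mark=9.5):
    """Return (grade, approved) for one student.

    Assumptions (not stated in the course page): grades are on a
    0..scale_max scale, projects count equally within their component,
    and pass_mark is the threshold for a positive final grade.
    """
    if not projects:
        raise ValueError("at least one project grade is required")

    # Project component: plain average of the individual projects (assumed).
    projects_mean = sum(projects) / len(projects)

    # Weighted final grade: 40% exam, 60% projects by default.
    grade = exam_weight * exam + projects_weight * projects_mean

    # Minimum requirement: at least 30% on the exam and on each project.
    floor = min_fraction * scale_max
    meets_minimums = exam >= floor and all(p >= floor for p in projects)

    approved = meets_minimums and grade >= pass_mark
    return grade, approved


# Example: exam 12, two projects graded 14 and 16 (hypothetical values).
grade, approved = final_grade(12, [14, 16])
print(round(grade, 2), approved)
```

With these hypothetical grades the project average is 15, so the weighted grade is 0.4 × 12 + 0.6 × 15 = 13.8. A student with a high exam grade but one project below 30% would not be approved under this reading of the rules.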

Copyright 1996-2024 © Faculdade de Ciências da Universidade do Porto
Page created on: 2024-11-09 at 03:54:15