Code | Acronym | Level |
---|---|---|
M4114 | M4114 | 400 |

Keywords
Classification | Keyword |
---|---|
Official | Mathematics |

Active? | Yes |
Responsible unit: | Department of Mathematics |
Course/CS Responsible: | Master in Data Science |
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time |
---|---|---|---|---|---|---|---|
M:DS | 32 | Official Study Plan since 2018_M:DS | 1 | - | 6 | 42 | 162 |
Teacher | Responsibility |
---|---|
Joaquim Fernando Pinto da Costa | |
Óscar António Louro Felgueiras | |
Theoretical and practical: | 3,23 |
Type | Teacher | Classes | Hours |
---|---|---|---|
Theoretical and practical | Totals | 1 | 3,231 |
 | Óscar António Louro Felgueiras | | 1,615 |
 | Joaquim Fernando Pinto da Costa | | 1,615 |
To train students in multivariate data analysis methods for extracting the essential information from potentially large datasets, with a focus on supervised and unsupervised learning methods.
1. Understanding the theoretical foundations of the methodologies taught.
2. Ability to extract the essential information from real datasets using the methodologies taught.
In particular:
- Recognize different multivariate data analysis problems and solve them with the methods covered, using the R software;
- Prepare, solve and present computational data mining projects in which the models presented are discussed, evaluated and compared on concrete cases;
- Solve computational and non-computational exercises on the methods covered.
Prior knowledge of random variables, probability distributions, sample statistics, confidence intervals and hypothesis tests is required. These topics are usually covered in an introductory Probability and Statistics course for undergraduate students.
Exploratory (preliminary) data analysis
Factorial Analysis:
Principal Component Analysis;
Simple Correspondence Analysis;
Multiple Correspondence Analysis;
Multidimensional Scaling.
Cluster Analysis:
Comparison measures;
Hierarchical Clustering;
Non-Hierarchical Clustering;
Model-Based Clustering.
Discriminant Analysis:
Discriminant Analysis in 2 groups;
Discriminant Analysis in K groups;
Decision Trees.
Classes are both theoretical and practical, with several application examples worked through using statistical software.
The software used is R, a free programming language.
Designation | Weight (%) |
---|---|
Practical or project work | 40,00 |
Exam | 60,00 |
Total: | 100,00 |
Designation | Time (hours) |
---|---|
Self-study | 80,00 |
Class attendance | 42,00 |
Presentation/discussion of a scientific work | 2,00 |
Written work | 38,00 |
Total: | 162,00 |
1. Assessment type: distributed assessment with a final exam. An exam is also held in the second assessment period ("época de recurso").
2. Grade improvement: students who wish to improve their exam grade may sit the second exam ("época de recurso"). The practical work grade cannot be improved.
Final score = 0.6 * Score_of_exam + 0.4 * Score_of_work.
The same formula also applies to the second exam (appeal season, "época de recurso") and to the special seasons.
The practical work and its oral presentation are compulsory in the normal, appeal ("recurso") and special seasons.
Approval requires Score_of_exam to be equal to or higher than 7.5 (on a scale of 0 to 20).
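The grading rule above can be sketched as a small calculation. This is a minimal Python illustration, not an official tool: the names `final_score` and `is_approved` are hypothetical, and the sketch assumes an overall pass mark of 10 out of 20 (the usual threshold on the Portuguese 0 to 20 scale, which the text does not state) and no rounding of the final score. Only the 0.6/0.4 weights and the 7.5 exam minimum come from the rules themselves.

```python
# Minimal sketch of the assessment rule (0 to 20 scale).
# Weights and the exam minimum come from the syllabus.
# ASSUMPTION (not stated in the syllabus): the overall pass mark is 10/20,
# and the weighted score is not rounded before comparison.

EXAM_WEIGHT = 0.6
WORK_WEIGHT = 0.4
MIN_EXAM_SCORE = 7.5  # minimum Score_of_exam required for approval
PASS_MARK = 10.0      # assumed overall pass mark (hypothetical here)


def final_score(exam: float, work: float) -> float:
    """Weighted final score: 0.6 * Score_of_exam + 0.4 * Score_of_work."""
    for value in (exam, work):
        if not 0.0 <= value <= 20.0:
            raise ValueError("scores must lie on the 0 to 20 scale")
    return EXAM_WEIGHT * exam + WORK_WEIGHT * work


def is_approved(exam: float, work: float) -> bool:
    """Approved only if the exam meets the minimum and the weighted
    score reaches the (assumed) pass mark."""
    return exam >= MIN_EXAM_SCORE and final_score(exam, work) >= PASS_MARK


if __name__ == "__main__":
    print(round(final_score(12, 16), 2))  # 13.6
    # Exam below 7.5, so not approved even though 0.6*7 + 0.4*18 = 11.4
    print(is_approved(7.0, 18))  # False
```

Keeping the exam minimum as a separate check mirrors the text: a strong practical work grade cannot compensate for an exam score below 7.5.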
The practical work consists of analysing a real dataset with the methods taught, using statistical software.
It is carried out in groups of two students.