Modelling and Data Analysis II
Keywords
Classification | Keyword
Official | Management Studies
Instance: 2022/2023, 1st semester
Teaching language
English
Objectives
The course aims to develop the skills needed to define and carry out data mining projects.
Learning outcomes and competences
Defining a data mining project requires the ability to:
- know the different data mining tasks;
- know the different methods and algorithms for each task;
- understand how these methods work;
- apply these methods to new data mining problems;
- evaluate and interpret the results.
Working method
In-person
Program
- Introduction to data mining
  - Knowledge, generalization and specialization; knowledge representation;
  - Data mining tasks;
  - Tools for data mining.
- Exploratory data analysis
  - Data collection;
  - Initial data exploration;
  - Data cleaning;
  - Data visualization;
  - Attribute selection;
  - Extreme values (outliers);
  - Missing values;
  - Exploratory data analysis;
  - Graphical visualization.
- Predictive modelling
  - Distance-based methods: k-NN;
  - Probabilistic methods: Bayesian classifiers;
  - Search-based methods: decision trees, rules;
  - Optimization methods: SVM, ANN;
  - Evaluation of classification and regression methods: metrics, costs, ROC analysis;
  - Multiple models;
  - Pre-processing, outliers, missing values, discretization.
- Descriptive modelling
  - Cluster analysis, frequent patterns, association analysis;
  - Group analysis.
- Concluding remarks
  - Data mining process methodologies;
  - Data mining and ethics.
Mandatory literature
Ian H. Witten. Data mining. ISBN: 1-55860-552-5
Jiawei Han. Data mining. ISBN: 978-0-12-381479-1
Complementary Bibliography
João Manuel Portela da Gama. Extração de conhecimento de dados [Knowledge extraction from data]. ISBN: 978-972-618-914-5
Teaching methods and learning activities
The course is organized into lab sessions grouped into modules. Each module follows the same teaching methodology:
- description of the financial problem to be solved;
- identification and explanation of the computational methods suitable for solving it;
- exercises to consolidate and apply the knowledge acquired.
Software
Jupyter
Python
R
Evaluation Type
Distributed evaluation without a final exam
Assessment Components
Designation | Weight (%)
Presentation/discussion of a scientific paper | 20.00
Test | 30.00
Practical or project work | 50.00
Total | 100.00
Amount of time allocated to each course unit
Designation | Time (hours)
Self-study | 60.00
Class attendance | 42.00
Written work | 20.00
Laboratory work | 40.00
Total | 162.00
Eligibility for exams
According to the General Regulation for the Assessment of First Degree and Master's Degree Students at the School of Economics and Management of the University of Porto, all students enrolled in a course unit fulfil the attendance requirements (article 10, point 5).
Calculation formula of final grade
30% individual assessment + 70% group work (two assignments of equal weight)
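The formula above can be read as: final grade = 0.30 × individual + 0.70 × (average of the two group works), so each group work effectively counts 35%. The minimal sketch below follows that line literally. It assumes grades on the Portuguese 0-20 scale (the scale is not stated here), and the function name `final_grade` is hypothetical; how the two group works map onto the items listed under Assessment Components is not specified in this sheet.

```python
def final_grade(individual: float, work_1: float, work_2: float,
                max_grade: float = 20.0) -> float:
    """Hypothetical helper: 30% individual assessment + 70% group work.

    The group-work share is the plain average of two equally weighted
    assignments (35% each). The 0-20 scale (max_grade) is an assumption.
    """
    for grade in (individual, work_1, work_2):
        if not 0 <= grade <= max_grade:
            raise ValueError(f"grade out of range: {grade}")
    group_work = (work_1 + work_2) / 2  # two works with the same weight
    return 0.30 * individual + 0.70 * group_work


# Example: 12 in the individual assessment, 15 and 17 in the group works
print(round(final_grade(12, 15, 17), 2))
```

With these example values the group-work average is 16, so the result is 0.30 × 12 + 0.70 × 16 ≈ 14.8.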