
Introduction to machine learning and data mining

Code: MECD02     Acronym: IACEC

Keywords
Classification Keyword
CNAEF Informatics Sciences

Instance: 2022/2023 - 1S

Active? Yes
Web Page: https://sigarra.up.pt/feup/en/UCURR_GERAL.FICHA_UC_VIEW?pv_ocorrencia_id=454762
Responsible unit: Department of Informatics Engineering
Course/CS Responsible: Master in Data Science and Engineering

Cycles of Study/Courses

Acronym: MECD
No. of Students: 35
Study Plan: Syllabus
Curricular Years: 1
Credits UCN: -
Credits ECTS: 6
Contact hours: 42
Total Time: 162

Teaching Staff - Responsibilities

Teacher Responsibility
João Pedro Carvalho Leal Mendes Moreira

Teaching - Hours

Recitations: 3,00
Type Teacher Classes Hour
Recitations Totals 1 3,00
João Pedro Carvalho Leal Mendes Moreira 3,00

Teaching language

English

Objectives

Background:
After a period in which companies and institutions invested heavily in data collection by computerizing their operations (e.g. sensors, GPS systems), and in which many and varied new data sources emerged (e.g. social networks), there is now a need to put such data at the service of those organizations. The goal is to extract knowledge from these data in order to improve efficiency and gain competitive advantage. From this need arises the Curricular Unit (UC) of Introduction to Machine Learning and Knowledge Extraction.

Objectives:
The student should be able to: (1) use descriptive statistics adequately to describe data; (2) describe the different stages of the CRISP-DM knowledge discovery process; (3) use and analyze the results of some of the main classification and regression methods; (4) use and interpret cluster analysis methods; (5) use and interpret association rule methods; (6) develop a Machine Learning or Knowledge Discovery project using the CRISP-DM methodology.

Learning outcomes and competences

By the end of the course, students are expected to:

  • Understand the different types of machine learning & data mining (ML&DM) tasks.
  • Identify decision support problems that can be represented as ML&DM tasks.
  • Understand the phases of an ML&DM project.
  • Know the main methods / algorithms for each ML&DM task type and understand the basics of their behavior.
  • Apply these methods to decision support problems.
  • Evaluate the results of an ML&DM project.

Working method

In person

Pre-requirements (prior knowledge) and co-requirements (common knowledge)

No specific prior course is required.

Program

Descriptive ML&DM

  • Descriptive data analysis techniques
  • Quality of data and preprocessing techniques: normalization, reduction and discretization.
  • Clustering: Partitioning, hierarchical, and density-based algorithms. Evaluation measures.
  • Association rules: APRIORI algorithm. Other algorithms. Evaluation measures.
  • Data Mining Methodologies: CRISP-DM methodology. Project management.

Predictive ML&DM

  • Overfitting and resampling techniques.
  • Regression: Multivariate linear regression and extensions. Evaluation measures.
  • Selection criteria for techniques and models.
  • Binary classification: evaluation measures.
  • Predictive techniques based on distances, probabilities, search and optimization.
  • Ensemble learning
  • Algorithm Bias
  • Non-binary classification tasks
  • Classification with imbalanced data
  • Semi-supervised classification and active learning

Brief introduction to text mining, recommender systems and social network analysis

Mandatory literature

Moreira, João; Carvalho, André de; Horvath, Tomás; A general introduction to data analytics, Wiley, 2018. ISBN: 978-1-119-29626-3
Matthew North; Data mining for the masses, 2012. ISBN: 0615684378

Complementary Bibliography

Aggarwal Charu C.; Data mining. ISBN: 978-3-319-14142-8

Comments from the literature

Supporting documents for the classes, authored by the course instructor, will be made available.

Teaching methods and learning activities

Theoretical classes are based on the presentation of course unit themes followed by practical exercises.

Software

Python
RapidMiner

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation Weight (%)
In-person participation 0,00
Test 60,00
Laboratory work 40,00
Total: 100,00

Amount of time allocated to each course unit

Designation Time (hours)
Autonomous study 60,00
Class attendance 42,00
Laboratory work 60,00
Total: 162,00

Eligibility for exams

N/A

Calculation formula of final grade

Final grade = 0.6*Test + 0.4*Assignment
Minimum grade: Test >= 7.0.
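
As a minimal sketch, the formula above can be written in Python (one of the tools listed under Software). The function name `final_grade`, the constant names, and the choice to return `None` when the minimum test grade is not met are assumptions made for illustration; the course sheet does not state how a failed minimum is recorded, and no rounding rule is applied here.

```python
from typing import Optional

TEST_WEIGHT = 0.6        # weight of the test, from the formula above
ASSIGNMENT_WEIGHT = 0.4  # weight of the assignment, from the formula above
MIN_TEST_GRADE = 7.0     # minimum test grade stated in the course sheet


def final_grade(test: float, assignment: float) -> Optional[float]:
    """Return 0.6*Test + 0.4*Assignment, or None if Test < 7.0.

    Returning None for a failed minimum is an assumption of this sketch,
    not something the course sheet specifies.
    """
    if test < MIN_TEST_GRADE:
        return None
    return TEST_WEIGHT * test + ASSIGNMENT_WEIGHT * assignment


if __name__ == "__main__":
    # Hypothetical grades, used only to exercise the function.
    print(final_grade(12.0, 15.0))
    print(final_grade(6.5, 18.0))
```

Keeping the weights and the minimum as named constants makes it easy to check them against the Assessment Components table, where the test counts for 60% and the laboratory work for 40%.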

Examinations or Special Assignments

The assignment is carried out in groups of two people. Grades may differ between the members of a group.

Special assessment (TE, DA, ...)

Students taking exams under special regimes are expected to submit the project required for this course beforehand, as ordinary students do. Students not attending classes must submit and present their work by the established deadlines. These students should take the initiative to arrange periodic meetings with the teacher to report on work progress.

Classification improvement

Grade improvement is carried out through a single individual assessment with two components: (1) an appeal exam; (2) an additional component that assesses the skills evaluated through the work developed in the distributed evaluation. Grade improvement takes place in the corresponding appeal period of the current edition of the course or of subsequent editions.

Copyright 1996-2024 © Faculdade de Engenharia da Universidade do Porto
Page generated on: 2024-04-25 at 09:59:45