Knowledge Extraction and Machine Learning
Keywords
Classification | Keyword
Official | Artificial Intelligence
Instance: 2020/2021 - 1S
Cycles of Study/Courses
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time
MIEIC | 65 | Syllabus since 2009/2010 | 5 | - | 6 | 42 | 162
Teaching language
Suitable for English-speaking students
Objectives
With the increasing digitization of their processes, organizations (companies, government, etc.) accumulate large amounts of data and now need to extract knowledge from it to improve the efficiency and effectiveness of those processes (e.g. to gain a competitive advantage). To this end, they need technical skills to develop solutions based on standard Machine Learning (ML) and Data Mining (DM) approaches, as well as scientific skills to develop innovative solutions for problems where no standard approach exists.
Thus, the goals of this course are:
- Motivate the use of ML / DM techniques in decision support.
- Develop the ability to properly utilize these techniques for automatic analysis of large amounts of data.
- Develop the ability to undertake scientific research to develop new approaches to ML / DM.
Percentage Distribution
- Scientific component: 70%
- Technological component: 30%
Learning outcomes and competences
Students should be able to
- Understand the different types of Data Mining (DM) tasks.
- Identify decision support problems that can be represented as DM tasks.
- Be aware of the main methods / algorithms for the most common DM tasks and understand the basics of their operation.
- Apply these methods correctly when conducting a DM project, following a proper methodology.
- Appropriately evaluate the results of a DM project.
- Identify opportunities for developing new approaches to ML / DM.
- Develop simple but appropriate scientific work to create new approaches to ML / DM.
Working method
In person
Pre-requirements (prior knowledge) and co-requirements (common knowledge)
Although no particular course is required, it is useful to have basic knowledge of:
- statistics
- algorithms
- artificial intelligence and machine learning.
Program
- Introduction to Machine Learning and Data Mining.
- DM Projects: DM methodologies and data preparation.
- Classification: introduction, evaluation (measures and methodologies) and algorithms (rule-, distance- and kernel-based methods; Bayesian methods). Scoring with classification models: approach and evaluation. Common classification issues (unbalanced class distribution and costs).
- Regression: introduction, evaluation (measures; bias-variance trade-off) and algorithms.
- Clustering: partitional (review of K-means, K-medoids), density-based and hierarchical algorithms. Evaluation measures.
- Frequent Pattern Discovery: frequent itemset algorithms (Apriori, Eclat, FP-Growth) and association rules. Evaluation measures (support, confidence, lift, ...). Other types of patterns: sequences and graphs.
- Recommendation systems: introduction, evaluation (measures and methodologies) and algorithms (content based, collaborative filtering, specialized systems).
- Ensemble learning: methods (Bagging, Random Forests, AdaBoost, Negative Correlation Learning). Characteristics of a good ensemble.
- Incremental learning: introduction, evaluation (measures and methodologies) and algorithms (very fast decision trees, incremental clustering algorithms).
- Automated machine learning (autoML) and meta learning.
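As a small illustration of the association rule evaluation measures listed under Frequent Pattern Discovery, the sketch below computes support, confidence and lift for a single rule. The transactions and the helper function names are invented for this example and are not course material.

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the baseline support of the consequent.

    Values above 1 suggest a positive association, values below 1 a negative one.
    """
    return confidence(antecedent, consequent) / support(consequent)


rule = ({"bread"}, {"milk"})
print(f"support    = {support(rule[0] | rule[1]):.3f}")
print(f"confidence = {confidence(*rule):.3f}")
print(f"lift       = {lift(*rule):.3f}")
```

In this toy data the rule bread -> milk has a lift below 1, so buying bread makes milk slightly less likely than its baseline frequency would suggest.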
Mandatory literature
João Moreira, Andre Carvalho, Tomás Horvath;
A General Introduction to Data Analytics, Wiley, 2018. ISBN: 978-1-119-29626-3 (https://www.wiley.com/en-aw/A+General+Introduction+to+Data+Analytics-p-9781119296263)
Complementary Bibliography
Ian H. Witten, Eibe Frank;
Data mining. ISBN: 1-55860-552-5
Peter Flach;
Machine Learning: The Art and Science of Algorithms that Make Sense of Data, Cambridge University Press, 2012. ISBN: 9781107422223 (http://www.cs.bris.ac.uk/~flach/mlbook/)
Mohammed Zaki and Wagner Meira Jr.;
Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, 2013. ISBN: 9780521766333 (http://www.dcc.ufmg.br/miningalgorithms/DokuWiki/doku.php)
Jiawei Han, Micheline Kamber;
Data mining. ISBN: 1-55860-489-8
Max Kuhn, Kjell Johnson;
Applied Predictive Modeling, Springer New York, 2013. ISBN: 9781461468493
Charu C. Aggarwal;
Data mining. ISBN: 978-3-319-14142-8
Teaching methods and learning activities
- Theoretical classes and individual study for exposition of concepts.
- Laboratory sessions and data mining project for practical application and consolidation of learned concepts.
- Research project and writing of scientific article for development of research skills.
Software
Python
RapidMiner
The R Project for Statistical Computing
Evaluation Type
Distributed evaluation without final exam
Assessment Components
Designation | Weight (%)
Exam | 35.00
Class participation | 0.00
Laboratory work | 65.00
Total: | 100.00
Amount of time allocated to each course unit
Designation | Time (hours)
Independent study | 60.00
Class attendance | 42.00
Laboratory work | 60.00
Total: | 162.00
Eligibility for exams
The distributed assessment consists of:
- DM project,
- mini test, and
- scientific project, including the writing of a scientific paper.
If a student misses one of the components of the distributed assessment, the grade for that component is 0 (zero).
Students with working-student or equivalent status who are exempt from class attendance should present the progress of their work at regular intervals, to be agreed with the teachers, and give the scheduled presentations together with the regular students.
Calculation formula of final grade
0.3 * DM project + 0.35 * mini-test + 0.35 * research project
Minimum grade in each component: 7.0 (out of 20)
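The formula and the minimum-grade rule can be sketched in Python as follows. The component keys, the function name and the example grades are hypothetical. The page does not say how rounding is done or what happens when a component falls below the minimum, so the sketch only reports which components do, without deciding the outcome.

```python
# Weights taken from the formula above; the dictionary keys are illustrative names.
WEIGHTS = {"dm_project": 0.30, "mini_test": 0.35, "research_project": 0.35}
MIN_COMPONENT_GRADE = 7.0  # minimum per component, on the 0-20 scale


def final_grade(grades):
    """Return the weighted grade and the components below the minimum.

    `grades` maps each component name in WEIGHTS to a grade out of 20.
    No rounding is applied, since the rounding rule is not stated.
    """
    below_minimum = [name for name in WEIGHTS if grades[name] < MIN_COMPONENT_GRADE]
    weighted = sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)
    return weighted, below_minimum


# Hypothetical grades, for illustration only.
grade, below = final_grade(
    {"dm_project": 15.0, "mini_test": 12.0, "research_project": 14.0}
)
print(f"weighted grade: {grade:.2f}, components below minimum: {below}")
```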
Examinations or Special Assignments
The DM project will be developed in groups of 2 students and consists of analyzing a data set and preparing a presentation that describes and discusses both the project and the results obtained.
The scientific project will be carried out individually or in groups of up to 3 students.
Special assessment (TE, DA, ...)
Students who are exempt from class attendance must complete all assessment components and should contact the teacher to make any necessary adjustments to the process.
Classification improvement
Grade improvement is possible for the mini-test and the scientific project in the special season (recurso) of the year in which the student passes the course.
For components for which no grade improvement was attempted in the year in which the student passes the course, improvement may be attempted for one or more of them in the following year, during the regular or special season.