Knowledge Extraction and Machine Learning
Keywords
Classification | Keyword
Official | Artificial Intelligence
Instance: 2014/2015 - 1S
Cycles of Study/Courses
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time
MIEIC | 22 | Syllabus since 2009/2010 | 5 | - | 6 | 56 | 162
Teaching language
Suitable for English-speaking students
Objectives
Background
After a period in which companies and institutions invested heavily in data collection as part of computerizing their operations, there is now a need to put this data at the service of those companies and institutions. The goal is to extract knowledge from data, improving efficiency and gaining competitive advantage. This need gives rise to the course unit (UC) Knowledge Extraction and Computational Learning (ECAC).
Objectives
- Motivate the use of knowledge extraction (EC) techniques, also known as data mining, in decision support.
- Develop the ability to properly utilize these techniques for automated analysis of large amounts of data.
Component distribution
- Scientific Component: 70%
- Technological Component: 30%
Learning outcomes and competences
Students should be able to
- Understand the different types of EC tasks.
- Identify decision support problems that can be represented as EC tasks.
- Understand the phases of an EC project.
- Know the main methods / algorithms for each EC task type and understand the basics of their behavior.
- Apply these methods to decision support problems.
- Evaluate the results of an EC project.
Working method
In person
Prerequisites (prior knowledge) and co-requisites (common knowledge)
- Although no specific UC is required, it is useful to have taken an introductory statistics UC;
- It is also important that the student has basic knowledge of algorithms.
Program
Descriptive Data Mining
- Introduction to knowledge extraction/data mining.
- Clustering: Partitional (review of K-means, K-medoids) and hierarchical algorithms. Other algorithms. Evaluation measures.
- Association Rules: Apriori algorithm. Other algorithms. Evaluation measures.
- Methodologies for Data Mining: The process of knowledge extraction. CRISP-DM. Project management.
- Pre-processing of data: Data cleansing and data transformation (normalization, reduction and discretization).
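As a rough illustration of two of the transformations named in the pre-processing topic, the sketch below shows min-max normalization and equal-width discretization in plain Python. The function names and the sample `ages` values are hypothetical and chosen only for this example. The syllabus does not say which variants are taught, so these are assumed to be representative ones.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: return zeros to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Assign each value a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20.0, 30.0, 40.0, 60.0]  # hypothetical attribute values
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 2)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
print(bins)        # [0, 0, 1, 1]
```

In the course's own tools (RapidMiner, R) these steps would normally be done with built-in operators or functions rather than hand-written code.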
Predictive Data Mining
- Evaluation of predictive models: Review of decision trees. Overfitting in decision trees. Evaluation methodologies.
- Classification: Classification algorithms (rule-, instance- and kernel-based methods, Bayesian methods). Common issues in classification (unbalanced distribution of classes and costs). Evaluation measures.
- Regression: Regression algorithms (linear and non-linear regression, regression trees, MARS). Evaluation measures.
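The syllabus does not list which evaluation measures are covered for regression. As an assumption, the sketch below uses two common ones, mean absolute error (MAE) and root mean squared error (RMSE), computed in plain Python on made-up values. The function names and data are hypothetical.

```python
import math
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the absolute differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the average squared difference; penalizes large errors more."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


actual = [3.0, 5.0, 7.0, 9.0]      # hypothetical target values
predicted = [2.0, 5.0, 8.0, 12.0]  # hypothetical model outputs
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae)             # 1.25
print(round(rmse, 3))  # 1.658
```

RMSE exceeds MAE here because the single large error (3.0) is squared before averaging, which is why the two measures are often reported together.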
Analysis of Complex Data
- Text mining: Representation of data for text mining. Evaluation measures.
Mandatory literature
Jiawei Han, Micheline Kamber; Data Mining. ISBN: 1-55860-489-8
Complementary Bibliography
Ian H. Witten, Eibe Frank; Data Mining. ISBN: 1-55860-552-5
Peter Flach; Machine Learning: The Art and Science of Algorithms that Make Sense of Data, Cambridge University Press, 2012. ISBN: 9781107422223 (http://www.cs.bris.ac.uk/~flach/mlbook/)
Mohammed Zaki and Wagner Meira Jr.; Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, 2013. ISBN: 9780521766333 (http://www.dcc.ufmg.br/miningalgorithms/DokuWiki/doku.php)
Max Kuhn, Kjell Johnson; Applied Predictive Modeling, Springer New York, 2013. ISBN: 9781461468493
Teaching methods and learning activities
- Theoretical presentation and discussion of the concepts.
- Laboratory sessions for practical application of the concepts learned.
Software
RapidMiner 5
The R Project for Statistical Computing
Evaluation Type
Distributed evaluation with final exam
Assessment Components
Designation | Weight (%)
Exam | 50.00
In-class participation | 0.00
Laboratory work | 50.00
Total: | 100.00
Amount of time allocated to each course unit
Designation | Time (hours)
Independent study | 60.00
Class attendance | 42.00
Laboratory work | 60.00
Total: | 162.00
Eligibility for exams
The distributed evaluation consists of developing a practical project. A student who misses a component of the distributed evaluation receives a grade of 0 (zero) for that component. Students with working-student status who do not attend classes regularly should report regularly on the progress of their work and give their presentation at the same time as the regular students.
Calculation formula of final grade
0.5 * Assignment Grade + 0.5 * Exam Grade
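A small worked example of the formula above, written as a Python sketch. The function name and the sample grades are hypothetical, and a 0 to 20 grading scale is assumed because the page does not state the scale.

```python
def final_grade(assignment_grade: float, exam_grade: float) -> float:
    """Equal-weight combination of the assignment and exam grades."""
    return 0.5 * assignment_grade + 0.5 * exam_grade


# Hypothetical grades on an assumed 0-20 scale.
both_components = final_grade(16.0, 13.0)
# A missed distributed-evaluation component is graded 0, per the eligibility rules.
missed_assignment = final_grade(0.0, 13.0)
print(both_components)    # 14.5
print(missed_assignment)  # 6.5
```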
Examinations or Special Assignments
The examination will be conducted without access to any materials.
The assignment will be carried out in groups of 2 students and consists of analyzing a dataset and preparing a final presentation that describes and discusses the project and its results.
Special assessment (TE, DA, ...)
Students with working-student status or equivalent must take the exam and carry out the project.
Classification improvement
The distributed evaluation grade can only be improved in the following year.