Knowledge Extraction and Machine Learning
Keywords
Classification | Keyword |
Official | Artificial Intelligence |
Instance: 2010/2011 - 1S
Cycles of Study/Courses
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time |
MIEIC | 20 | Syllabus since 2009/2010 | 5 | - | 6 | 56 | 162 |
Teaching language
Portuguese
Objectives
To give students the knowledge they need to apply techniques for analysing large volumes of data and extracting patterns from them.
Program
Introduction to knowledge extraction: the concept of Data Mining; the Data Mining and Knowledge Discovery process.
Data preparation: Data cleaning; Data Normalization, Reduction and Discretization.
Association Rules: Definition of the association rule mining problem. Quality measures for association rules. Some algorithms for mining association rules.
Clustering: Clustering Techniques. Partition clustering algorithms (K-means, K-medoids) and Hierarchical clustering quality. Other algorithms: BIRCH, CURE, DBSCAN.
Web Mining: Data Mining concepts on the Web; Information search on the Web; Mining usage patterns on the Web; Structure analysis and search on the Web.
Classification: Classification techniques for the analysis of large volumes of data; Decision Trees; Classification and Regression Trees (CART); Pruning principles; Bayesian Classification; Inductive Logic Programming.
Relational Data Mining using Inductive Logic Programming.
PKDD: Parallel Knowledge Discovery in Databases: parallel processing techniques for extracting patterns from large volumes of data.
KDD Applications.
Mandatory literature
Han, Jiawei; Data Mining. ISBN: 1-55860-489-8
Complementary Bibliography
Ian H. Witten and Eibe Frank; Data Mining: Practical Machine Learning Tools and Techniques, Elsevier, 2005. ISBN: 0120884070
Teaching methods and learning activities
Theoretical classes: Presentation of theoretical concepts.
Practical classes: Solving exercises, discussing topics presented in the theoretical classes, and support for the practical assignments.
Software
Weka 3: Data Mining Software in Java
The R Project for Statistical Computing
SPSS 17.0
RapidMiner 5
Evaluation Type
Distributed evaluation with final exam
Assessment Components
Description | Type | Time (hours) | Weight (%) | End date |
Attendance (estimated) | Classroom attendance | 39.00 | | |
Project | Written assignment | 60.00 | | |
Final Exam | Exam | 3.00 | | |
 | Total: | - | 0.00 | |
Amount of time allocated to each course unit
Description | Type | Time (hours) | End date |
Studying | Independent study | 60 | |
 | Total: | 60.00 | |
Eligibility for exams
To be admitted to the exam, students must obtain an average grade of at least 6 marks in the distributed evaluation component.
Calculation formula of final grade
Final Grade = 0.5 * Assignment Grade + 0.5 * Exam Grade
Examinations or Special Assignments
The assignment consists of analysing a dataset using the techniques learned in the course.
Students must prepare a progress report and a final report.
In the last class of the term, students present their work.
The progress report counts for 10% of the final course mark, and the final report and presentation count for 40%.
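As a worked illustration of the grade formula and the weights given above, the following minimal Python sketch computes a final grade. It rests on several assumptions that the course description does not state explicitly: that the assignment grade is made up only of the progress report (10% of the course mark) and the final report with presentation (40%), that the distributed evaluation component is the assignment, and that all grades are on a 0 to 20 scale. The function names are hypothetical.

```python
# Sketch only: weights taken from the course description, everything else assumed.
PROGRESS_WEIGHT = 0.10      # progress report, share of the final course mark
FINAL_REPORT_WEIGHT = 0.40  # final report and presentation, share of the final course mark
EXAM_WEIGHT = 0.50          # final exam, share of the final course mark
MIN_DISTRIBUTED = 6.0       # minimum distributed-evaluation average for exam eligibility


def assignment_grade(progress_report: float, final_report: float) -> float:
    """Assignment grade on the same scale as its parts (assumed 0 to 20)."""
    total_weight = PROGRESS_WEIGHT + FINAL_REPORT_WEIGHT
    weighted = PROGRESS_WEIGHT * progress_report + FINAL_REPORT_WEIGHT * final_report
    return weighted / total_weight


def is_eligible_for_exam(progress_report: float, final_report: float) -> bool:
    """Assumes the distributed evaluation component is the assignment grade."""
    return assignment_grade(progress_report, final_report) >= MIN_DISTRIBUTED


def final_grade(progress_report: float, final_report: float, exam: float) -> float:
    """Final Grade = 0.5 * Assignment Grade + 0.5 * Exam Grade."""
    assignment = assignment_grade(progress_report, final_report)
    return (1 - EXAM_WEIGHT) * assignment + EXAM_WEIGHT * exam


if __name__ == "__main__":
    # Progress report 14, final report and presentation 16, exam 12
    print(round(final_grade(14, 16, 12), 2))
```

With these inputs the assignment grade is 15.6, so the final grade is 0.5 * 15.6 + 0.5 * 12 = 13.8.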
Special assessment (TE, DA, ...)
Students exempted from attending the practical classes must still complete the practical assignment and sit the final exam.
Grade improvement
The grade for the distributed evaluation component can only be improved in the following academic year.