| Code | Acronym | Level |
|---|---|---|
| CC4072 | CC4072 | 400 |

Keywords

| Classification | Keyword |
|---|---|
| Official | Computer Science |


| Active? | Web page | Responsible unit | Course/CS responsible |
|---|---|---|---|
| Yes | https://moodle2526.up.pt/course/view.php?id=7126 | Department of Computer Science | Master in Computer Science |

| Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time |
|---|---|---|---|---|---|---|---|
| M:CC | 21 | Study plan since academic year 2025/2026 | 1 | - | 6 | 42 | 162 |
| M:ENSI | 4 | Official study plan since 2025/2026 | 1 | - | 6 | 42 | 162 |
| M:ERSI | 11 | Official Study Plan since 2021_M:ERSI | 1 | - | 6 | 42 | 162 |

| Teacher | Responsibility |
|---|---|
| Inês de Castro Dutra | |

Theoretical and practical: 3,23

| Type | Teacher | Classes | Hours |
|---|---|---|---|
| Theoretical and practical | Totals | 1 | 3,231 |
| | Inês de Castro Dutra | | 3,231 |

The main objectives of this unit are to provide an introduction to the main data science methodologies and to convey knowledge of programming and tools for data processing and analysis, such as the Python language.
This unit should provide the students with:
1. theoretical competences in several basic data science methodologies;
2. competences in developing software for data science tasks;
3. practical competences in applying data science techniques to specific problems.
1. Introduction to Data Science:
• the CRISP-DM model
• data, models and patterns
• data science tasks
2. Data Pre-Processing:
• importing data
• cleaning data
• transforming and creating variables
• dimensionality reduction techniques
3. Exploring and Visualizing Data:
• data summarization
• data visualization
4. Descriptive Models:
• clustering methods: partitional methods, hierarchical methods
• association rules
5. Predictive Models:
• classification and regression tasks
• evaluation metrics
• linear regression, naive Bayes, k-nearest neighbours
• tree-based models: classification and regression trees, pruning methods
• neural networks and deep learning
• support vector machines
• ensembles: bagging, random forests, boosting, AdaBoost, XGBoost
6. Methodologies for Evaluating and Comparing Models:
• evaluation measures
• estimation methods
• significance tests
Lectures consist of an oral presentation of the syllabus topics, illustrated with concrete data mining case studies.

| Designation | Weight (%) |
|---|---|
| Exam | 70,00 |
| Presentation/discussion of a scientific work | 30,00 |
| Total: | 100,00 |

| Designation | Time (hours) |
|---|---|
| Autonomous study | 84,00 |
| Presentation/discussion of a scientific work | 36,00 |
| Class attendance | 42,00 |
| Total: | 162,00 |

The course evaluation consists of a final exam and a practical assignment at the end of the semester. The assignment will be evaluated through an oral presentation.
The final grade (NF) is a weighted average of the theoretical (exam) and practical (assignment) grades:
NF = 0.7 * Exam + 0.3 * Assignment
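As an illustration of this weighting, the sketch below computes NF for hypothetical grades. It assumes both components are graded on the 0 to 20 scale commonly used in Portuguese universities, which this page does not state; the function name `final_grade` and its range check are illustrative assumptions, not part of the official regulations, and any rounding rules applied by the university are not modeled.

```python
def final_grade(exam: float, assignment: float) -> float:
    """Return NF = 0.7 * Exam + 0.3 * Assignment.

    Assumption: both grades are on a 0-20 scale (not stated on the course page).
    """
    for name, value in (("exam", exam), ("assignment", assignment)):
        if not 0 <= value <= 20:
            raise ValueError(f"{name} grade must be between 0 and 20, got {value}")
    # Weighted average: the exam counts for 70%, the assignment for 30%.
    return 0.7 * exam + 0.3 * assignment


if __name__ == "__main__":
    # Hypothetical grades, for illustration only: 0.7*14 + 0.3*17 = 9.8 + 5.1
    print(round(final_grade(14, 17), 2))  # prints 14.9
```

Because the weights sum to 1, NF always lies between the two component grades.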
The practical assignment will be announced at the beginning of the semester and should be completed by the end of the semester.
The practical assignment grade cannot be improved.
Students can improve their theoretical (exam) grade by taking the appeal (recurso) exam.
All provided materials (e.g. slides, recommended books) are in English, and classes will also be taught in English if foreign students are enrolled.
Course materials will be made available on the course's Moodle page.