Advanced Topics in Data Science
Keywords
| Classification | Keyword |
| Official | Computer Science |
Instance: 2025/2026 - 2S 
Cycles of Study/Courses
Teaching Staff - Responsibilities
Teaching language
Portuguese and English
Note: All course materials are provided in English.
Objectives
Identification and application of data science techniques for knowledge extraction from diverse data sources with a focus on NLP and Information Retrieval. We will see how to handle and explore text (natural language processing), interaction data (recommendation systems and association rules), sequences (sequence mining), and networks in a web and social media context (link analysis). We will also handle outlier detection and its application in this context.
Learning outcomes and competences
At the end of the course, the student should be able to:
- recognize different problems that can be solved with the techniques mentioned above;
- identify and specify tasks similar to those discussed;
- obtain and pre-process data for the algorithms and tasks addressed;
- understand and use the algorithms;
- obtain, interpret, evaluate and use models;
- implement some of the algorithms and propose changes to improve them.
Working method
In person
Pre-requirements (prior knowledge) and co-requirements (common knowledge)
The student should be familiar with the basic concepts of data science and machine learning and should know a programming language used in data mining tasks, such as Python.
Program
1. Natural Language Processing:
• Text representation
• Preprocessing
• NLP tasks
• Classical and deep learning approaches
• NLP applications
2. Web:
• Information retrieval
• Recommendation systems: collaborative filtering, matrix factorization, and deep learning approaches
• Link analysis
3. Frequent pattern extraction:
• Frequent itemsets and association rules
• Apriori and FP-Growth algorithms
• Itemset summarization and rule selection
• Deep learning approaches
4. Rare value discovery:
• Challenges
• Unsupervised techniques
• Semi-supervised techniques
• Applications in NLP, IR, and web
Mandatory literature
Daniel Jurafsky & James H. Martin; Speech and Language Processing (3rd edition), Prentice Hall / Pearson, 2025 (https://web.stanford.edu/~jurafsky/slp3/)
Emrul Hasan, Mizanur Rahman, Chen Ding, Jimmy Xiangji Huang, and Shaina Raza; Review-based Recommender Systems: A Survey of Approaches, Challenges and Future Perspectives, ACM, 2025 (https://doi.org/10.1145/3742421)
Petru Kallay, Tudor Dan Mihoc; Comparative Analysis of Frequent Pattern Mining Algorithms, 2025 (https://link.springer.com/article/10.1007/s44427-025-00008-1)
Teaching methods and learning activities
Theoretical-practical classes in which the program topics are presented together with practical application examples. The practical part consists of solving exercises and carrying out group work, with a final presentation and discussion of the results.
Software
R
RStudio
Python
JupyterLab
Evaluation Type
Distributed evaluation with final exam
Assessment Components
| Designation | Weight (%) |
| Practical or project work | 40.00 |
| Exam | 50.00 |
| Test | 10.00 |
| Total: | 100.00 |
Amount of time allocated to each course unit
| Designation | Time (hours) |
| Project development | 35.00 |
| Independent study | 84.00 |
| Presentation/discussion of a scientific work | 1.00 |
| Class attendance | 42.00 |
| Total: | 162.00 |
Eligibility for exams
Practical work is mandatory for all scheduled assignments.
At least 70% attendance is required for both theoretical and practical laboratory classes.
Calculation formula of final grade
The course assessment is distributed, consisting of a test, a final exam, and a practical assignment.
The combined grade is calculated by weighting the practical and theoretical grades using the formula:
NComb = 0.50 * NE + 0.10 * NT + 0.40 * NTP
where NE is the exam grade, NT is the test grade, and NTP is the grade of the practical assignment.
The final grade (NF) is limited to 30% above the individual grade NInd (test plus exam):
NF = min(1.3 * NInd, NComb)
If the exam grade is higher than the test grade, or if the student did not take the test for justified reasons, the exam will have a weight of 60% and the test will not be considered.
Students who do not obtain a minimum of 30% in each component (except the test) will not pass.
The resit exam counts for 60% (12 out of 20 points) of the final grade, or is combined with the test in the same proportions as in the regular exam period.
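The grading rules above can be sketched in Python. This is only an illustration under stated assumptions, not the official calculation: grades are taken on a 0-20 scale, the 30% minimum is read as 6 out of 20 in the exam and in the practical assignment, and NInd is assumed to be the weighted mean of exam and test (or the exam alone when the test is dropped). The function name `final_grade` and its parameters are hypothetical.

```python
from typing import Optional


def final_grade(ne: float, ntp: float, nt: Optional[float] = None) -> Optional[float]:
    """Return the final grade NF (0-20 scale), or None if a minimum is not met.

    ne  = exam grade (NE)
    ntp = practical assignment grade (NTP)
    nt  = test grade (NT), or None if the test was not taken
    """
    # Assumption: the 30% minimum means 6 out of 20 in the exam and the practical work.
    minimum = 0.30 * 20
    if ne < minimum or ntp < minimum:
        return None

    if nt is None or ne > nt:
        # Test not considered: the exam carries the full 60% theoretical weight.
        n_comb = 0.60 * ne + 0.40 * ntp
        n_ind = ne
    else:
        n_comb = 0.50 * ne + 0.10 * nt + 0.40 * ntp
        # Assumption: NInd is the weighted mean of exam and test, rescaled to 0-20.
        n_ind = (0.50 * ne + 0.10 * nt) / 0.60

    # The final grade cannot exceed the individual grade by more than 30%.
    return min(1.3 * n_ind, n_comb)
```

Under these assumptions, a student with an exam grade of 10, a practical grade of 20, and no test would have a combined grade of 14, but the cap of 1.3 times the individual grade limits the final grade to about 13.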
Examinations or Special Assignments
The practical assignment will be announced in the middle of the semester and should be completed and presented by the end of the semester.
Special assessment (TE, DA, ...)
The student can improve only the theoretical grade by taking the supplementary exam.
The requirement for minimum attendance in classes does not apply.
Classification improvement
The evaluation of the practical assignment is not subject to improvement.
The student can improve the theoretical grade by taking the supplementary exam.
Observations
All the provided material (slides, recommended books, assignments, exams, etc.) is in English.