Foundations and Applications of Machine Learning
Keywords
Classification | Keyword
Official | Informatics
Instance: 2020/2021 - 1S
Cycles of Study/Courses
Teaching language
Portuguese
Objectives
Modern Machine Learning (ML) techniques represent a powerful approach to the analysis of large-scale datasets, by deriving novel representations that augment domain knowledge and supporting informed decision-making processes.
The high demand for ML specialists in areas such as self-driving cars, genome analysis, cancer prediction and climate change calls for training the next generation of computer scientists in both the theory and practice of Machine Learning, so that they can develop projects that use the latest technologies and follow best implementation practices.
To address this shortage of professionals with a solid background in ML and big data analytics, we propose a curricular unit that teaches Machine Learning using state-of-the-art technologies, including the most recent software libraries and platforms.
Learning outcomes and competences
The main objective is to provide students with adequate knowledge and skills in the core principles and techniques of Machine Learning. Students completing this unit should:
• Learn the fundamentals of ML: regression, classification, clustering and deep learning.
• Understand the connection between learning and optimization, and build optimal data representation models.
• Learn how to implement and apply predictive, classification, clustering, information retrieval and deep learning algorithms to real datasets.
• Develop a critical view and be able to choose, apply and evaluate the most adequate problem-solving techniques in ML.
• Be able to design, specify, implement and validate advanced software tools for specific data analysis problems, and assess the quality of the models using the relevant error metrics.
• Be able to interact with domain professionals during software development and produce adequate reports.
Working method
In person
Prerequisites (prior knowledge) and co-requisites (common knowledge)
Programming and statistics.
Program
Brief program:
Fundamentals of ML. What is ML and what are the challenges?
Supervised versus Unsupervised ML
Data Pre-processing and exploration
Detection of outliers; Standardization; Transformation; Dimensionality reduction (Principal Component Analysis, Multi-Dimensional Scaling); Splitting data into training and test sets
Model evaluation and model selection methods
Cost (loss) function. Cost function convergence. Iterative gradient descent algorithm. Learning curves.
Performance measures (error, confusion matrix, sensitivity/specificity, ROC curves); Train and test paradigm; Cross validation; Bias and Variance; Overfitting. Regularization.
Classification methods
Decision Trees; K-NN; Linear and Nonlinear (Kernel) Support Vector Machines; Neural Networks; Logistic Regression; Ensemble approaches: Bagging, Boosting, Random Forests, Gradient Boosted Decision Trees
Regression
Univariate and multivariable linear regression; Performance measures: RMSE, R-squared; Interpretation of results (coefficients, residuals); Batch/mini-batch/stochastic gradient descent.
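As an illustration of the regression topics listed above, the following is a minimal sketch of univariate linear regression fitted by batch gradient descent, with RMSE and R-squared computed on the fit. The function names, the learning rate, the number of epochs and the toy data are assumptions chosen for the example and are not part of the course material.

```python
import math


def fit_line_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error.

    lr and epochs are illustrative values, not recommended defaults.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) over the full batch.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def rmse(ys, preds):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line_gd(xs, ys)
preds = [w * x + b for x in xs]
print(f"w={w:.4f} b={b:.4f} RMSE={rmse(ys, preds):.6f} R2={r_squared(ys, preds):.6f}")
```

On this noise-free data the fitted coefficients approach the slope 2 and intercept 1 used to generate it. A mini-batch or stochastic variant would compute the same gradients on a subset of the points at each step.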
Unsupervised learning - fundamentals
Distance (similarity) measures
K-means clustering; Hierarchical clustering (different measures and methods); t-SNE.
Data compression
Deep Learning fundamentals
Shallow and Deep Neural networks; Network architectures (feed-forward, convolutional, recurrent, auto-encoders); Hyper-parameter optimization; Input data transformation; Applications to classification and regression problems
Information visualization
Scatter plot; Heatmaps; Trees and Dendrograms
Introduction to the scikit-learn, NumPy, Matplotlib, pandas and Keras Python packages. Implementation and testing of ML pipelines. Creating notebooks and Docker containers for portability and predictability during development, testing, and deployment.
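The program items on pre-processing, model evaluation and classification can be tied together in a short scikit-learn pipeline. The sketch below is only one possible arrangement: the dataset (scikit-learn's bundled breast cancer data), the number of principal components, the classifier and the split ratio are all assumptions made for illustration, not choices prescribed by the course.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with scikit-learn (illustrative choice).
X, y = load_breast_cancer(return_X_y=True)

# Split the data into training and test sets, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardization -> dimensionality reduction (PCA) -> classifier.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),  # number of components is an arbitrary example value
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validation on the training set only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training set and evaluate on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
cm = confusion_matrix(y_test, model.predict(X_test))

print("CV accuracy per fold:", cv_scores)
print("Test accuracy:", test_accuracy)
print("Confusion matrix:\n", cm)
```

Wrapping the scaler and PCA inside the pipeline means they are refitted on each cross-validation training fold, so no information from the validation fold leaks into pre-processing.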
Mandatory literature
Sebastian Raschka; Python Machine Learning, Packt Publishing Ltd., 2019. ISBN: 1789955750
Tom Michael Mitchell; Machine learning. ISBN: 978-0-07-042807-2
Trevor Hastie; The elements of statistical learning. ISBN: 0-387-95284-5
Ian H. Witten; Data mining. ISBN: 0-12-088407-0
Peter Flach; Machine learning. ISBN: 978-1-107-42222-3
Teaching methods and learning activities
Presentation of theory and practical applications.
Demonstration of programming code for the practical examples.
Software
Python3
Pandas
Scikit-Learn
Keras
Keywords
Technological sciences > Engineering > Computer engineering
Evaluation Type
Distributed evaluation without final exam
Assessment Components
Designation | Weight (%)
Practical or project work | 70.00
Public defense of a dissertation, project or internship report, or thesis | 30.00
Total | 100.00
Amount of time allocated to each course unit
Designation | Time (hours)
Presentation/discussion of a scientific paper | 4.00
Class attendance | 28.00
Research work | 40.00
Total | 72.00
Eligibility for exams
Participation in classes. Presentation and defense of the practical assignment project.
Calculation formula of final grade
F = 0.70*TP + 0.30*P
where TP = grade of the practical assignment and P = grade of the presentation of the work.
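As a worked example of the formula, the short sketch below computes the final grade from the two components. The function name and the sample grades are hypothetical and serve only to illustrate the weighting.

```python
def final_grade(tp: float, p: float) -> float:
    """Weighted final grade: 70% practical assignment (TP), 30% presentation (P)."""
    return 0.70 * tp + 0.30 * p


# Hypothetical grades for illustration: TP = 16, P = 18.
example = final_grade(16.0, 18.0)
print(round(example, 2))  # 0.70*16 + 0.30*18 = 11.2 + 5.4 = 16.6
```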
To pass the FAML course, you must complete an application project. To this end, you will need to select a dataset, pose several analysis questions and apply the concepts introduced in class.
You will have to:
- i) write a report following the structure of a research paper;
- ii) prepare and defend a presentation of the results of your analysis, based on that report.
Classification improvement
The final grade cannot be improved after the project presentation.