Fraud Detection
Keywords
Classification | Keyword |
Official | Computer Science |
Instance: 2022/2023 - 2S 
Cycles of Study/Courses
Teaching language
Suitable for English-speaking students
Objectives
This course studies data analysis methodologies that are useful for detecting and forecasting fraudulent cases. As data collection spreads to practically all human activities, there is a growing need for techniques that automatically analyse such data in order to detect or predict situations that could be considered anomalous or potentially fraudulent.
Learning outcomes and competences
Students are expected to:
- Acquire theoretical knowledge of data analysis methodologies that are useful for the detection and prediction of fraud/anomalies;
- Acquire practical experience in developing and using software for the detection and prediction of fraud/anomalies;
- Acquire expertise in fraud detection by analysing practical case studies on this type of problem.
Working method
In person
Program
1) Introduction to Data Mining
- Data Mining applications and CRISP-DM methodology.
- A brief introduction to the R programming language, data import and basic manipulation.
2) Data Understanding
- Data Summarization
- Data Visualization
3) Data Preparation
- Data Quality Issues
- Data Pre-processing
4) Unsupervised Learning
- Descriptive Analytics
- Clustering Algorithms and Validation Methods
5) Supervised Learning
- Classification and Regression problems.
- Binary and multiclass classification.
- Evaluation metrics
- Algorithms: k-NN, Naive Bayes, Linear Regression, Ridge and Lasso Regression, CART, SVMs, ANNs.
6) Ensembles
- Motivation and Types of Ensembles
- Algorithms: Bagging, Random Forest, Boosting, AdaBoost, XGBoost.
7) Evaluation Methodologies
- Performance estimation and experimental methodologies.
- Comparison of Models: statistical significance, paired comparisons on single and multiple tasks.
8) Imbalanced Domain Learning and Anomaly Detection
- Challenges
- Approaches
- Open Research Questions
Mandatory literature
Barnett, Vic; Outliers in Statistical Data. ISBN: 0-471-99599-1
Torgo, Luís; Data Mining with R. ISBN: 9781439810187 (hardback)
Complementary Bibliography
Han, J.; Kamber, M.; Pei, J.; Data Mining: Concepts and Techniques (3rd edition)
Teaching methods and learning activities
Classes combine theory and practice: the exposition of theory is complemented by practical exercises on the computer.
Software
R statistical software
Evaluation Type
Distributed evaluation without final exam
Assessment Components
Designation | Weight (%) |
Test | 40.00 |
Practical or project work | 60.00 |
Total: | 100.00 |
Amount of time allocated to each course unit
Designation | Time (hours) |
Project development | 0.00 |
Independent study | 0.00 |
Class attendance | 0.00 |
Total: | 0.00 |
Eligibility for exams
A minimum score of 7 in the theoretical test is required.
Calculation formula of final grade
The final grade is given by the following formula:
NF = 0.4 * NT + 0.6 * NP
where NF is the final grade, NT is the grade of the theoretical test, and NP is the grade of the individual project.
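As a worked illustration of the formula, hypothetical grades of NT = 12 and NP = 15 give NF = 0.4 * 12 + 0.6 * 15 = 13.8. The short Python sketch below computes the same thing and checks the minimum test score stated under "Eligibility for exams". The function names and the sample grades are assumptions made for illustration and are not part of the course materials. The sketch also does not assume what happens when the minimum is not met, since the page does not say.

```python
TEST_WEIGHT = 0.4      # weight of NT in the formula
PROJECT_WEIGHT = 0.6   # weight of NP in the formula
MIN_TEST_SCORE = 7     # minimum theoretical test score required


def final_grade(nt: float, np_: float) -> float:
    """Return NF = 0.4 * NT + 0.6 * NP (hypothetical helper)."""
    return TEST_WEIGHT * nt + PROJECT_WEIGHT * np_


def meets_test_minimum(nt: float) -> bool:
    """Return True if the test grade reaches the stated minimum of 7."""
    return nt >= MIN_TEST_SCORE


if __name__ == "__main__":
    nt, np_ = 12.0, 15.0  # hypothetical grades, for illustration only
    print(f"NF = {final_grade(nt, np_):.1f}")
    print(f"Minimum test score met: {meets_test_minimum(nt)}")
```

Because the project carries the larger weight, a one-point change in NP moves NF by 0.6 points, while a one-point change in NT moves it by 0.4.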
Internship work/project
The practical assignment is individual and consists of the development and presentation of a project aimed at detecting fraud in a real data set.