Fraud Detection

Code:	CC4036
Acronym:	CC4036
Level:	400

Keywords
Classification	Keyword
Official	Computer Science

Instance: 2022/2023 - 2S

Active?	Yes
Responsible unit:	Department of Computer Science
Course/CS Responsible:	Master in Information Security

Cycles of Study/Courses

Acronym	No. of Students	Study Plan	Curricular Years	Credits UCN	Credits ECTS	Contact hours	Total Time
M:SI	20	Study plan since 2020/2021	1	-	6	42	162

Last updated on 2023-04-26.

Fields changed: Program, Calculation formula of final grade

Teaching language

Suitable for English-speaking students

Objectives

This course studies data analysis methodologies that are useful for detecting and forecasting fraudulent cases. As data collection spreads to practically all human activities, there is a growing need for techniques that automatically analyse such data in order to detect or predict situations that may be anomalous or potentially fraudulent.

Learning outcomes and competences

It is intended that the students:

Acquire theoretical knowledge of data analysis methodologies that are useful for the detection and prediction of fraud/anomalies;

Acquire practical experience in developing and using software for the detection and prediction of fraud/anomalies;

Acquire expertise in fraud detection by analysing practical case studies on this type of problem.

Working method

In person

Program

1) Introduction to Data Mining
- Data Mining applications and CRISP-DM methodology.
- A brief introduction to R programming language, data import and basic manipulation.

2) Data Understanding
- Data Summarization
- Data Visualization

3) Data Preparation
- Data Quality Issues
- Data Pre-processing

4) Unsupervised Learning
- Descriptive Analytics
- Clustering Algorithms and Validation Methods

5) Supervised Learning
- Classification and Regression problems.
- Binary and multiclass classification.
- Evaluation metrics
- Algorithms: k-NN, Naive Bayes, Linear Regression, Ridge and Lasso Regression, CART, SVMs, ANNs.

6) Ensembles
- Motivation and Types of Ensembles
- Algorithms: Bagging, Random Forest, Boosting, AdaBoost, XGBoost.

7) Evaluation Methodologies
- Performance estimation and experimental methodologies.
- Comparison of Models: statistical significance, paired comparisons on single and multiple tasks.

8) Imbalanced Domain Learning and Anomaly Detection
- Challenges
- Approaches
- Open Research Questions

Mandatory literature

Barnett, Vic; Outliers in Statistical Data. ISBN: 0-471-99599-1
Torgo, Luís; Data Mining with R. ISBN: 9781439810187 (hardback)

Complementary Bibliography

Han, J.; Kamber, M.; Pei, J.; Data Mining: Concepts and Techniques (3rd edition)

Teaching methods and learning activities

Classes will combine theory and practice: theoretical exposition is complemented with practical exercises on the computer.

Software

R statistical software

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation	Weight (%)
Test	40.00
Practical or project work	60.00
Total:	100.00

Amount of time allocated to each course unit

Designation	Time (hours)
Project development	0.00
Independent study	0.00
Class attendance	0.00
Total:	0.00

Eligibility for exams

It is required that you obtain a minimum score of 7 in the theoretical test.

Calculation formula of final grade

The following formula gives the final classification:

NF = 0.4 * NT + 0.6 * NP

where NT is the grade of the theoretical test, and NP is given by the individual project grade.
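
As a worked illustration of the formula above, the following minimal Python sketch computes NF from NT and NP and checks the minimum test score of 7 stated under "Eligibility for exams". The function names, the sample grades, and the idea of reporting the eligibility check separately are assumptions made for this example; the page does not say how a grade is recorded when the minimum is not met, so the sketch does not model that case.

```python
# Illustrative sketch only: function names and sample grades are hypothetical.

MIN_TEST_SCORE = 7.0  # minimum theoretical-test score stated under "Eligibility for exams"


def meets_minimum(nt: float, minimum: float = MIN_TEST_SCORE) -> bool:
    """Return True when the theoretical test grade NT reaches the required minimum."""
    return nt >= minimum


def final_grade(nt: float, np_: float) -> float:
    """Apply NF = 0.4 * NT + 0.6 * NP (test weight 40%, project weight 60%)."""
    return 0.4 * nt + 0.6 * np_


if __name__ == "__main__":
    nt, np_ = 12.0, 15.0  # hypothetical test and project grades
    print(f"meets minimum: {meets_minimum(nt)}")
    print(f"NF = {final_grade(nt, np_):.1f}")
```

With the hypothetical grades NT = 12 and NP = 15, the formula gives 0.4 * 12 + 0.6 * 15 = 13.8.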

Internship work/project

The practical assignment is individual and consists of the development and presentation of a project aimed at detecting fraud in a real data set.

Copyright 1996-2025 © Faculdade de Ciências da Universidade do Porto
Page created on: 2025-06-16 at 02:59:35