
Big Data Engineering

Code: MECD07     Acronym: EGD

Keywords
Classification: CNAEF
Keyword: Informatics

Instance: 2020/2021 - 2S

Active? Yes
Responsible unit: Department of Electrical and Computer Engineering
Course/CS Responsible: Master in Data Science and Engineering

Cycles of Study/Courses

Acronym: MECD
No. of Students: 24
Study Plan: Syllabus
Curricular Years: 1
Credits UCN: -
Credits ECTS: 6
Contact hours: 42
Total Time: 162
Last updated on 2021-02-21.

Fields changed: Calculation formula of final grade, Evaluation type, Assessment components and time allocation, Classification improvement

Teaching language

Suitable for English-speaking students

Objectives

Extracting information from large data sets, known as "big data", has been a driver for many large and small companies in recent years and poses a specific set of challenges, which this course addresses. The student should be able to: 1) distinguish the theoretical concepts that support parallel and distributed computing, including data processing; 2) understand how existing architectures and systems for storing and processing large data sets work; 3) acquire competences in developing big data applications and characterizing their performance, namely for data search and learning from data.

Learning outcomes and competences

We hope that discussing the fundamental concepts of parallel programming, the architectures and programming models, and existing big data applications, through scientific papers and active research by the students, will foster a critical view of these concepts and the ability to apply them appropriately. We also hope that, after developing a big data application with relevant and current technologies, the students will have gained the competences needed for further big data application development.

Working method

In person

Program


  1. Fundamental concepts of parallel computing: performance measurements, types of processors, memory management and data location, limitations of parallel computing, types of parallelism, stages of parallelization, parallel programming models, and data parallelism.

  2. Models for parallel programming with data: the CUDA/GPU model, organization in threads and mapping to multi-dimensional data; the MapReduce model, key-value data organization, execution stages, speculative execution, relation to the Hadoop Distributed File System and to resource management; the Spark model, resilient distributed datasets, broadcast variables, streaming mode.

  3. Application development and performance characterization: search (Hadoop Pig, Spark SQL) and learning (Spark MLlib, deep learning on GPU/TensorFlow); debugging, measurement, and tuning of tasks, jobs, and stages in Spark, Hadoop, and TensorFlow.
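As an illustration of the key-value organization and execution stages mentioned in the program, the following is a minimal single-process sketch of a word count in the map-reduce style. It uses only the Python standard library and does not call Hadoop or Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. A real framework would run the map and reduce steps in parallel on different machines and perform the grouping over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit one (word, 1) key-value pair per word in the input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group together all values that share the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three stages in sequence (illustrative, single process)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters process big data"]))
```

Because each call to `map_phase` depends only on its own input line, and each call to `reduce_phase` only on the values of one key, both stages can in principle be distributed across workers without coordination other than the shuffle.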

Mandatory literature

Tom White; Hadoop: The Definitive Guide -- Storage and Analysis at Internet Scale
Matei Zaharia, Holden Karau, Andy Konwinski, Patrick Wendell; Learning Spark -- Lightning-Fast Big Data Analysis
David B. Kirk, Wen-mei W. Hwu; Programming Massively Parallel Processors

Teaching methods and learning activities

The teaching methodology is based on: 1) discussion of the concepts of parallel programming with data, programming models, and big data system architectures and applications, using scientific papers, case studies, and Internet searches; 2) specification, development, testing, and performance characterization of big data applications using the technologies and concepts discussed in the course. Students are assessed through an exam, which checks their ability to distinguish theoretical concepts in parallel computing and in big data architectures and systems, and through a report on the development of the big data application; both components have the same weight in the final grade.

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation: Weight (%)
Exam: 50.00
Laboratory work: 50.00
Total: 100.00

Amount of time allocated to each course unit

Designation: Time (hours)
Project development: 60.00
Independent study: 60.00
Class attendance: 42.00
Total: 162.00

Eligibility for exams

Developing the project and attending class.

Calculation formula of final grade


CF = 0.5*T + 0.5*TP; if (T < 10.0 or TP < 8.0) then CF = MIN(CF, 9.0)


T - Test
TP - Project
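The formula above can be read as a small function. The following is a sketch only, transcribed directly from the stated rule; the function name `final_grade` and its parameter names are hypothetical, and no rounding is applied because the page does not specify any.

```python
def final_grade(test: float, project: float) -> float:
    """Compute CF from the test grade (T) and the project grade (TP)."""
    # Both components carry the same weight.
    cf = 0.5 * test + 0.5 * project
    # Minimum-grade rule: if T < 10.0 or TP < 8.0, CF is capped at 9.0.
    if test < 10.0 or project < 8.0:
        cf = min(cf, 9.0)
    return cf


if __name__ == "__main__":
    print(final_grade(12.0, 16.0))  # both thresholds met, plain average
    print(final_grade(9.0, 18.0))   # test below 10.0, so the cap applies
```

Note that the cap only lowers the grade: when the weighted average is already below 9.0, it is kept unchanged.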

Classification improvement

The project classification can be improved in the next edition of the course. The test grade can be improved in the re-sit exam.

Copyright 1996-2025 © Faculdade de Engenharia da Universidade do Porto