Introduction to data processing with Python and Spark

Code: EC_UFC031     Acronym: EC_UFC031

Keywords
Classification Keyword
CNAEF Informatics

Instance: 2023/2024 - SP (from 17-05-2024 to 08-06-2024)

Active? Yes
Web Page: https://www.dcc.fc.up.pt/~miguel-areias/teaching/2324/ps-ed1/index.html
Responsible unit: Department of Computer Science
Course/CS Responsible: Introduction to data processing with Python and Spark

Cycles of Study/Courses

Acronym: UFC:Spar
No. of Students: 24
Study Plan: PE_Introduction to data processing with Python and Spark
Curricular Years: 1
Credits UCN: -
Credits ECTS: 3
Contact hours: 21
Total Time: 81

Teaching Staff - Responsibilities

Teacher Responsibility
Miguel João Gonçalves Areias

Teaching - Hours

Theoretical and practical: 1.50
Type: Theoretical and practical
Classes (total): 1
Hours (total): 1.50
Miguel João Gonçalves Areias: 1.50
Vítor Daniel Peixoto de Sousa: 1.50

Teaching language

Portuguese

Objectives

The course is designed for people with basic Python programming knowledge who aim to develop skills in analyzing large volumes of data.

Learning outcomes and competences

By the end of the course, trainees should have acquired programming knowledge of Apache Spark and be able to implement data analysis algorithms. In particular, they should be able to:

- Understand the MapReduce programming model and build basic programs using transformations and actions;
- Understand the RDD and DataFrame models;
- Process structured data with SparkSQL;
- Read and write data files.

Working method

In person

Program

- Introduction to the MapReduce programming model;
- The HDFS storage model and RDD representation;
- Actions and transformations;
- Processing with Key-Value pairs;
- Definition of Lambda functions;
- Processing flow with DAGs;
- Configuration of the parallelism level;
- Introduction to SparkSQL;
- Reading and writing structured files;
- Handling missing and incorrect data;
- Structured operations.
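
As a rough illustration of the first topics in the program (the MapReduce model, key-value pairs and lambda functions), the sketch below counts words using only the Python standard library. It is not Spark code and is not taken from the course materials: the input lines and variable names are hypothetical. In PySpark, the same steps would typically be expressed with RDD transformations such as flatMap, map and reduceByKey.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input lines, for illustration only.
lines = ["spark makes big data simple", "big data needs spark"]

# Map phase: emit one (word, 1) key-value pair per word.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group the values by key.
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)

# Reduce phase: combine the values of each key with a lambda function.
word_counts = {
    key: reduce(lambda a, b: a + b, values)
    for key, values in groups.items()
}

print(word_counts)
```

Spark distributes these phases across partitions of the data, whereas this sketch runs them sequentially in a single process.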

Mandatory literature

Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee; Learning Spark: Lightning-Fast Data Analytics, 2nd edition, O'Reilly, 2020. ISBN: 978-1492050049

Teaching methods and learning activities

The course runs in person and comprises 21 hours of theoretical-practical contact.
The theoretical-practical sessions are supported by projected content and dedicated notes.

During the presentation classes, small PySpark programs are developed interactively from worked examples. Theoretical-practical classes take place in a computer laboratory, where trainees are encouraged to solve small sets of varied problems with support from the trainer.

Evaluation Type

Distributed evaluation with final exam

Assessment Components

Designation: Weight (%)
Exam: 80.00
Laboratory work: 20.00
Total: 100.00

Amount of time allocated to each course unit

Designation: Time (hours)
Independent study: 60.00
Class attendance: 21.00
Total: 81.00

Eligibility for exams

No requirements

Calculation formula of final grade

NCP - Grade of the Practical Component

NE - Grade of the Exam

Final Grade = 0.2 x NCP  + 0.8 x NE (between 0 and 20)
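
The weighting above can be checked with a short Python sketch. The function name and the sample grades are hypothetical, and the range check assumes that both components are graded on the same 0 to 20 scale as the final grade, which the formula does not state explicitly.

```python
def final_grade(ncp: float, ne: float) -> float:
    """Weighted final grade: 20% practical component (NCP), 80% exam (NE)."""
    # Assumption: each component is graded on the 0-20 scale.
    for name, value in (("NCP", ncp), ("NE", ne)):
        if not 0 <= value <= 20:
            raise ValueError(f"{name} must be between 0 and 20, got {value}")
    return 0.2 * ncp + 0.8 * ne


# Hypothetical grades, for illustration only:
# 0.2 * 16 + 0.8 * 12 = 3.2 + 9.6 = 12.8
print(round(final_grade(ncp=16.0, ne=12.0), 2))
```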
Copyright 1996-2024 © Faculdade de Ciências da Universidade do Porto
Page created on: 2024-07-16 at 23:59:13