
Data Structures for Bioinformatics

Code: CC4064     Acronym: CC4064     Level: 400

Keywords
Classification | Keyword
Official | Computer Science

Instance: 2022/2023 - 2S

Active? Yes
Web Page: https://github.com/hpacheco/progii
Responsible unit: Department of Computer Science
Course/CS Responsible: Master in Bioinformatics and Computational Biology

Cycles of Study/Courses

Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time
E:BBC | 1 | PE_Bioinformatics and Computational Biology | 1 | - | 6 | 42 | 162
M:BBC | 17 | The study plan since 2018 | 1 | - | 6 | 42 | 162

Teaching language

Suitable for English-speaking students

Objectives

The objective of this course unit is to develop the ability to use a programming language (Python) as an instrument for scientific analysis. After completing it, students will be able to develop complex programs and automate practical data exploration tasks. The course also offers an introduction to the extraction, processing, and visualization of data, in particular real-world biomedical data, i.e., data of non-trivial dimensionality, heterogeneity and volume. Special attention will be given to Python's data analysis ecosystem and its libraries for data manipulation (e.g. numpy, pandas), scientific and statistical processing (e.g. scipy), visualization (e.g. matplotlib), and more advanced data analysis using other domain-specific libraries.

Learning outcomes and competences

The student must be able to:

  • Confidently use basic Python data structures and advanced data structures that allow efficient manipulation of large volumes of data (e.g. pandas or numpy).
  • Program with the proper level of abstraction and encapsulation.
  • Produce correct, well-structured and well-documented code.
  • Extract and process data from different sources in different formats (e.g., textual, numerical, tabular or semi-structured), from generic formats to biomedical data formats.
  • Use external libraries to visualize numerical and biomedical data (e.g. demographic data, gene expression data).

Working method

In person

Prerequisites (prior knowledge) and co-requisites (common knowledge)

Some basic familiarity with the Python language and its development environment is preferred.

Program

Data types for collections: lists, tuples, dictionaries and sets. Definition of new types: notion of class, objects and methods. Code structuring using modules. Three basic programming principles: encapsulation, abstraction, and separation of concerns.
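As an illustration of these topics, the following minimal sketch shows the four built-in collection types and a small class that hides its internal state behind methods. The class name `GeneSet`, its methods, and all values are hypothetical and chosen only for this example; they are not taken from the course materials.

```python
# The four built-in collection types named in the program.
# All values are made up for illustration.
samples = ["s1", "s2", "s3"]            # list: ordered, mutable
position = ("chr1", 1000)               # tuple: ordered, immutable
read_counts = {"geneA": 120, "geneB": 85}  # dict: maps keys to values
seen = {"geneA", "geneB", "geneA"}      # set: unordered, duplicates collapse


class GeneSet:
    """A named collection of gene symbols (hypothetical example class)."""

    def __init__(self, name, genes=()):
        self.name = name
        self._genes = set()  # leading underscore marks internal state
        for gene in genes:
            self.add(gene)

    def add(self, symbol):
        # Normalise on entry so callers never depend on the stored form.
        self._genes.add(symbol.strip().upper())

    def __contains__(self, symbol):
        # Supports the `in` operator with the same normalisation.
        return symbol.strip().upper() in self._genes

    def __len__(self):
        return len(self._genes)

    def shared_with(self, other):
        # Set intersection, returned as a sorted list for stable output.
        return sorted(self._genes & other._genes)


panel_a = GeneSet("panel A", ["brca1", " TP53 "])
panel_b = GeneSet("panel B", ["tp53", "EGFR"])
common = panel_a.shared_with(panel_b)
```

Callers only use `add`, `in`, `len` and `shared_with`, so the internal `set` could later be replaced by another structure without changing their code, which is the point of encapsulation.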

Introduction to the Python scientific computing ecosystem.

Introduction to data creation, extraction and processing (e.g. using numpy). Reading data in different formats and converting it into Python data structures. Programmatic manipulation and processing of data (e.g. using pandas). Introduction to data visualization and plot generation (e.g. matplotlib). Visualization of numerical and biomedical data (e.g. boxplot generation to compare data distributions, heatmap visualization for genomic data).
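A minimal sketch of the read, reshape and process steps listed above, assuming numpy and pandas are installed. The gene names, conditions and expression values are invented for illustration, and `io.StringIO` stands in for a file on disk.

```python
import io

import numpy as np
import pandas as pd

# Made-up expression values for illustration only; not real measurements.
CSV_TEXT = """gene,condition,expression
geneA,control,2.0
geneA,treated,4.0
geneB,control,8.0
geneB,treated,8.0
geneC,control,1.0
geneC,treated,16.0
"""

# Read tabular text into a DataFrame.
long_df = pd.read_csv(io.StringIO(CSV_TEXT))

# Reshape to one row per gene and one column per condition.
wide_df = long_df.pivot(index="gene", columns="condition", values="expression")

# Vectorised numpy arithmetic: log2 fold change of treated over control.
wide_df["log2_fc"] = np.log2(wide_df["treated"] / wide_df["control"])

# Keep the genes whose expression at least doubled under treatment.
up_genes = wide_df[wide_df["log2_fc"] >= 1.0].index.tolist()
```

With matplotlib installed, a call such as `wide_df[["control", "treated"]].plot.box()` would be one way to compare the two distributions as boxplots.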

Mandatory literature

Allen Downey; How to think like a computer scientist. ISBN: 0-9716775-0-6

Complementary Bibliography

Daniel Y. Chen; Pandas for Everyone, Addison-Wesley
Jake VanderPlas; Python data science handbook: Essential tools for working with data, O'Reilly Media, Inc
Wes McKinney; Python for data analysis: Data wrangling with Pandas, NumPy, and IPython, O'Reilly Media, Inc
Christian Hill; Learning Scientific Programming with Python 2nd Edition, Cambridge University Press, 2020. ISBN: 1108745911 (https://scipython.com/book2/)
Martin Jones; Biological data exploration with Python, pandas and seaborn: Clean, filter, reshape and visualize complex biological datasets using the scientific Python stack, Independently published, 2020 (https://pythonforbiologists.com/)

Teaching methods and learning activities

- Lectures, with examples of problem solving.

- Practical sessions in the laboratory.

- Homework.

Software

Pycharm Community Edition

Keywords

Physical sciences > Computer science > Programming

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation | Weight (%)
Practical or project work | 70.00
Oral examination | 30.00
Total: | 100.00

Amount of time allocated to each course unit

Designation | Time (hours)
Independent study | 106.00
Class attendance | 56.00
Total: | 162.00

Eligibility for exams

All students who have submitted the practical project are considered to have attended the course.

Calculation formula of final grade

Grading will be done via a practical project (70%), to be developed during the practical sessions and including additional homework tasks. An oral defense of the practical project is also planned (30%).

The final grade will be the sum of the weighted grades of the practical project (70%) and the oral defense (30%).
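One possible reading of this formula, sketched in Python. It assumes that both components are graded on the same 0-20 scale and combined with the stated weights; the scale, the function name `final_grade` and the rounding to two decimals are assumptions made for this example, not rules stated on this page.

```python
PROJECT_WEIGHT = 0.70  # practical project, from the assessment components
ORAL_WEIGHT = 0.30     # oral defense, from the assessment components


def final_grade(project, oral):
    """Weighted sum of two component grades (assumed 0-20 scale)."""
    for grade in (project, oral):
        if not 0 <= grade <= 20:
            raise ValueError("grades are assumed to lie between 0 and 20")
    return round(PROJECT_WEIGHT * project + ORAL_WEIGHT * oral, 2)


# Example: 16 on the project and 14 on the oral defense.
example = final_grade(16, 14)  # 0.7 * 16 + 0.3 * 14
```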

Special assessment (TE, DA, ...)

Students with special circumstances should discuss their situation with the professor.

Classification improvement

The practical project may be improved and re-submitted during the appeal season; in that case, new functionality will be required.
Copyright 1996-2025 © Faculdade de Ciências da Universidade do Porto
Page created on: 2025-06-14 at 09:45:08