Data Structures for Bioinformatics
Keywords
Classification | Keyword
Official | Computer Science
Instance: 2022/2023 - 2S

Cycles of Study/Courses
Teaching language
Suitable for English-speaking students
Objectives
The objective of this course unit is to develop the ability to use a programming language (Python) as an instrument for scientific analysis. Students will learn to develop complex programs and automate practical data exploration tasks, and will receive an introduction to the extraction, processing, and visualization of data, in particular real-world biomedical data, i.e., data with non-trivial dimensionality, heterogeneity and volume. Special attention will be given to Python's data analysis ecosystem and its powerful libraries for data manipulation (e.g. numpy, pandas), scientific and statistical processing (e.g. scipy), visualization (e.g. matplotlib), and advanced data analysis using other domain-specific libraries.
Learning outcomes and competences
The student must be able to:
- Confidently use basic Python data structures and advanced data structures that allow efficient manipulation of large volumes of data (e.g. pandas or numpy).
- Program with the proper level of abstraction and encapsulation.
- Produce correct, well-structured and well-documented code.
- Extract and process data from different sources in different formats (e.g., textual, numerical, tabular or semi-structured), from generic formats to biomedical data formats.
- Use external libraries to visualize numerical and biomedical data (e.g. demographic data, gene expression data).
Working method
In person
Pre-requirements (prior knowledge) and co-requirements (common knowledge)
Some basic familiarity with the Python language and its development environment is recommended.
Program
Data types for collections: lists, tuples, dictionaries and sets. Definition of new types: notion of class, objects and methods. Code structuring using modules. Three basic programming principles: encapsulation, abstraction, and separation of concerns.
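As an illustration of these topics, the following minimal sketch combines the four collection types with a small class. The class `Sequence` and its methods are hypothetical names chosen for this example and are not taken from the course material.

```python
from collections import Counter


class Sequence:
    """A DNA sequence with an identifier (hypothetical example class)."""

    VALID_BASES = frozenset("ACGT")  # set: fast membership tests

    def __init__(self, identifier, bases):
        bases = bases.upper()
        invalid = set(bases) - self.VALID_BASES
        if invalid:
            raise ValueError(f"invalid bases: {sorted(invalid)}")
        self.identifier = identifier
        self._bases = bases  # leading underscore: internal detail (encapsulation)

    def base_counts(self):
        """Return a dictionary mapping each base to its number of occurrences."""
        return dict(Counter(self._bases))

    def gc_content(self):
        """Return the fraction of G and C bases (0.0 for an empty sequence)."""
        if not self._bases:
            return 0.0
        return sum(self._bases.count(b) for b in "GC") / len(self._bases)


# A list of objects, each built from a tuple (identifier, bases).
records = [("seq1", "ACGT"), ("seq2", "GGCC")]
sequences = [Sequence(identifier, bases) for identifier, bases in records]

# A dictionary keyed by identifier; callers never touch the raw string directly.
gc_by_id = {s.identifier: s.gc_content() for s in sequences}
print(gc_by_id)
```

Callers work only with `base_counts()` and `gc_content()`, so the internal representation of the sequence could change without affecting the rest of the program.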
Introduction to the Python Scientific computing ecosystem.
Introduction to data creation, extraction and processing (e.g. using numpy). Reading data in different formats and converting into Python data structures. Manipulation and processing of data programmatically (e.g. using pandas). Introduction to Data Visualization and graphic generation (e.g. matplotlib). Visualization of numerical and biomedical data (e.g. boxplot generation to compare data distributions, heatmap visualization for genomic data).
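A minimal sketch of reading tabular data and converting it into Python data structures is given below. It assumes a small tab-separated table and uses only the standard library (`csv`, `statistics`) as a stand-in for the pandas-based workflow named above. The gene labels and values are made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical tab-separated expression table; the values are invented.
raw = (
    "gene\tcontrol\ttreated\n"
    "geneA\t5.1\t7.4\n"
    "geneB\t3.2\t3.0\n"
    "geneC\t8.8\t12.1\n"
)

# Parse each row into a dictionary, converting the numeric columns to float.
reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
rows = [
    {"gene": r["gene"], "control": float(r["control"]), "treated": float(r["treated"])}
    for r in reader
]

# Simple processing: a summary statistic and a filter.
mean_treated = statistics.mean(row["treated"] for row in rows)
upregulated = [row["gene"] for row in rows if row["treated"] > row["control"]]
print(upregulated)
```

With a real file, `io.StringIO(raw)` would be replaced by an open file handle. The same list of dictionaries can later be handed to a library such as pandas for further manipulation.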
Mandatory literature
Allen Downey; How to think like a computer scientist. ISBN: 0-9716775-0-6
Complementary Bibliography
Daniel Y. Chen; Pandas for Everyone, Addison-Wesley
Jake VanderPlas; Python data science handbook: Essential tools for working with data, O'Reilly Media, Inc
Wes McKinney; Python for data analysis: Data wrangling with Pandas, NumPy, and IPython, O'Reilly Media, Inc
Christian Hill; Learning Scientific Programming with Python, 2nd Edition, Cambridge University Press, 2020. ISBN: 1108745911 (https://scipython.com/book2/)
Martin Jones; Biological data exploration with Python, pandas and seaborn: Clean, filter, reshape and visualize complex biological datasets using the scientific Python stack, Independently published, 2020 (https://pythonforbiologists.com/)
Teaching methods and learning activities
- Lectures, with examples of problem solving.
- Practical sessions in the laboratory.
- Homework.
Software
PyCharm Community Edition
keywords
Physical sciences > Computer science > Programming
Evaluation Type
Distributed evaluation without final exam
Assessment Components
Designation | Weight (%)
Practical or project work | 70.00
Oral exam | 30.00
Total: | 100.00
Amount of time allocated to each course unit
Designation | Time (hours)
Independent study | 106.00
Class attendance | 56.00
Total: | 162.00
Eligibility for exams
All students who have submitted the practical projects are considered to have attended the course.
Calculation formula of final grade
Grading will be done via a practical project (70%), to be developed during the practical sessions and including additional homework tasks. An oral defense of the practical project is also planned (30%).
The final grade will be the weighted sum of the grades of the practical project and the oral defense.
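The calculation can be sketched as follows, assuming that both components are graded on the same scale. The function name and the example grades are hypothetical.

```python
def final_grade(project, oral_defense):
    """Weighted final grade: 70% practical project, 30% oral defense.

    Assumes both grades are expressed on the same scale.
    """
    return 0.70 * project + 0.30 * oral_defense


# Example with invented grades for the two components.
print(round(final_grade(16, 14), 2))
```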
Special assessment (TE, DA, ...)
Students with special circumstances should discuss their situation with the professor.
Classification improvement
The practical project may be improved and re-submitted during the appeal season, in which case new functionality will be required.