Algorithms for Bioinformatics
Keywords |
Classification |
Keyword |
OFICIAL |
Computer Science |
Instance: 2022/2023 - 2S
Cycles of Study/Courses
Teaching language
Suitable for English-speaking students
Objectives
Bioinformatics is an interdisciplinary field that combines the fields of computer science, biology and biomedical science and statistics. Bioinformatics is devoted to the application and development of new computational methods for expanding the use of biological, biomedical or epidemiological data. Recent developments in high-throughput technologies have led to a real revolution in the biological and biomedical research with bioinformatics playing a central role in the analysis of massive amounts of data. This course will focus on the main algorithms developed to address Bioinformatics tasks. An emphasis will be made on algorithms for sequence processing and analysis, both on nucleic-acids or amino-acid sequences.
Our goal is that students will be able to understand how these algorithms work and how this can be developed and applied to address new computational tasks in biological sequence analysis.
Learning outcomes and competences
By the end of this course it is expected that the student:
- Is familiarized with the main concepts of Bioinformatics including the main concepts on Computational Molecular Biology;
- Identifies the main sources of biological sequence data (e.g. nucleotide or amino-acid sequences; motifs and domains) and associated types and how can they be represented from a computational point of view.
- Understands different problems related to sequence analysis and which are the most adequate algorithms and data structures to solve these problems.
- Identify the advantages and disadvantages of each method from an algorithmic point of view. Emphasis will be given to methods on basic sequence processing tasks, transcription and translation, pairwise and multiple sequence alignment, sequence pattern finding, phylogenetic analysis from sequences and graphs and biological networks.
- Have a perspective of Bioinformatics as a field of critical importance to leverage biological, biomedical and health research and as a field of constant and fast-paced development.
Working method
Presencial
Pre-requirements (prior knowledge) and co-requirements (common knowledge)
Programming in Python.
Program
- Overview of Molecular Biology Fundamental Concepts;
- Introduction to Python programming language;
- Basic Processing of Biological Sequence;
- Finding Patterns in Sequences;
- Pairwise Sequence Alignment;
- Searching Similar Sequences in Databases;
- Multiple Sequence Alignment;
- Clustering and Trees;
- Probabilistic Motifs;
- Graphs and Biological Networks;
Mandatory literature
Miguel Rocha and Pedro G. Ferreira; Bioinformatics Algorithms(1st Edition): Design and Implementation in Python., Elsevier, 2018. ISBN: 9780128125205 (https://www.elsevier.com/books/bioinformatics-algorithms/rocha/978-0-12-812520-5)
Complementary Bibliography
Neil C. Jones and Pavel A. Pevzner ; An Introduction to Bioinformatics Algorithms (Computational Molecular Biology) 1st Edition. . ISBN: 0262101068 (https://www.amazon.com/Introduction-Bioinformatics-Algorithms-Computational-Molecular/dp/0262101068)
Sebastian Bassi; Python for Bioinformatics, CRC Press, 2016
R. Durbin, S. Eddy, A. Krogh, G. Mitchison; iological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998
B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, P. Walter; Molecular Biology of the Cell, 4th edition, Garland Science, 2002
The official home of the Python Programming Language; The python language website (http://www.python.org/)
Philip Guo; Python tutor, visualization of code execution (http://pythontutor.com/)
Dan Gusfield; Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997
Python Software Foundation; The python tutorial (https://docs.python.org/3/tutorial/)
Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, David J. Lipman; Basic local alignment search tool, Journal of Molecular Biology 215 (3) (1990) 403–410.
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, David J. Lipman; Gapped blast and psi-blast: a new generation of protein database search programs, Nucleic Acids Research 25 (17) (1997) 3389–3402.
T.L. Bailey, C. Elkan; itting a mixture model by expectation maximization to discover motifs in biopolymers, Proceedings. International Conference on Intelligent Systems for Molecular Biology 2 (1994) 28–36.
Robert S. Boyer, J. Strother Moore; A fast string searching algorithm, Communications of the ACM 20 (10) (October 1977) 762–772.
Humberto Carrillo, David Lipman; The multiple sequence alignment problem in biology, SIAM Journal on Applied Mathematics 48 (5) (1988) 1073–1082.
M.K. Das, H.K. Dai; A survey of DNA motif finding algorithms, BMC Bioinformatics 8 (Suppl 7) (Nov 2007) S21.
Desmond G. Higgins, Paul M. Sharp; Clustal: a package for performing multiple sequence alignment on a microcomputer, Gene 73 (1) (1988) 237–244.
P. D’haeseleer; What are DNA sequence motifs? Nature Biotechnology 24 (4) (Apr 2006) 423–425.
Teaching methods and learning activities
- Theoretical classes: expository, accompanied by examples.
- Practical classes: implementation (in Python) of algorithms and practical assignments support.
Software
Python3
Evaluation Type
Distributed evaluation with final exam
Assessment Components
designation |
Weight (%) |
Exame |
60,00 |
Trabalho prático ou de projeto |
40,00 |
Total: |
100,00 |
Amount of time allocated to each course unit
designation |
Time (hours) |
Estudo autónomo |
120,00 |
Frequência das aulas |
42,00 |
Total: |
162,00 |
Eligibility for exams
- Active attendance at least half of classes (unless an exception is granted).
- A minimum grade of 7 values is required in the exam and in the pratical assignments or project.
Calculation formula of final grade
Grading criteria:
The UC grade is determined based on satisfactory progress (
minimum grade of 7) in weekly assignments and final exam.
The breakdown of these components is as follows:
E: Final exam: 60%
P: Assignments: 40% (practical and theoretical assignments). To perform individually or in group. To be submitted during the week of the class.
The P component is calculated as follows:
P = media(Assignments) + Bonus[sum(Active_presence)/12 * 2]
Final Grade = P x 0.40 + 0.6 x EOnly component
E can be improved.
Special assessment (TE, DA, ...)
Students with special needs should discuss their situation with the responsible of the course.
Classification improvement
Only to the Exam part of the grade.
Observations
The exam will be realized in presential mode.
The UC materials are in moodle.
The materials will all be in English, including the exam statements. Classes will be taught in English only if justified. Students can participate/respond using Portuguese or English.