Dynamic Programming and Learning for Decision and Control
Keywords
Classification | Keyword
Official | Automation and Control
Instance: 2021/2022 - 1S
Cycles of Study/Courses
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time
M.EEC | 14 | Syllabus | 2 | - | 6 | 39 |
Teaching language
Portuguese and English
Objectives
This curricular unit (UC) aims to carry the foundations
previously acquired in control, optimization, and dynamic
systems (described by differential equations or by
discrete events, deterministic or stochastic) over to their
operational side, in order to deal with the computational
complexity inherent to optimization and exploration
processes.
Learning outcomes and competences
Students acquire the fundamental knowledge needed to design
and develop support systems for the management and control
of dynamic systems, with dynamic programming as the central
element, together with the various approximate approaches,
generically called "reinforcement learning", that strike
different trade-offs between exploration and optimization.
The sub-objectives include, on the one hand, establishing
links with previously taught curricular units (essentially
dynamic systems, control, optimization, systems with random
variables, and Markov chains) and, on the other hand,
connecting with neural networks as an efficient way to
implement the presented methods computationally.
Working method
In person
Prerequisites (prior knowledge) and co-requisites (common knowledge)
Linear Algebra, Calculus, Signal Theory, Control Theory
Program
1. Introduction.
Clarification, through examples, of how the contents of this UC make it possible to put into practice knowledge from
previous UCs, namely Control and Optimization and Discrete Event Systems.
2. Review and extension of knowledge on controlled Markov chains.
Definition as timed stochastic automata. Transition probability matrix. Transient and steady-state regimes.
Applications to the control and optimization of queues. Markov decision processes.
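As a minimal sketch of the transition-matrix and transient/steady-state ideas, the code below propagates a state distribution through a discrete-time Markov chain (the sequence of iterates is the transient regime) until it settles on the steady-state distribution. The queue, its capacity of 2, and the probabilities 0.3 and 0.5 are hypothetical values chosen for illustration, and power iteration is just one simple way to reach the stationary distribution.

```python
def step_distribution(dist, P):
    """One transition of the chain: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Power iteration pi <- pi P starting from the uniform distribution.

    For an irreducible, aperiodic chain the iterates (transient regime)
    converge to the unique steady-state distribution.
    """
    for row in P:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of P must sum to 1")
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step_distribution(pi, P)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical discrete-time queue holding at most 2 customers; in each slot at
# most one event occurs: an arrival w.p. 0.3 (if not full) or a departure
# w.p. 0.5 (if not empty). State = number of customers in the system.
P = [
    [0.7, 0.3, 0.0],
    [0.5, 0.2, 0.3],
    [0.0, 0.5, 0.5],
]
pi = stationary_distribution(P)
print([round(p, 4) for p in pi])  # [0.5102, 0.3061, 0.1837]
```

For this birth-death chain the result can be checked by hand with detailed balance: pi1 = 0.6 pi0 and pi2 = 0.6 pi1.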
3. Dynamic Programming
General basic concepts in discrete and continuous time: cost-to-go function and principle of optimality.
Methods for solving the Hamilton-Jacobi-Bellman equation. Basic dynamic programming algorithms for discrete
problems. Example: the linear-quadratic problem and its relationship with Pontryagin's Maximum Principle in this
case. Types of dynamic programming problems: stochastic shortest path and discounted cost.
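A basic dynamic programming algorithm for a discrete, discounted-cost problem is value iteration, which repeatedly applies the Bellman equation J(s) = min_a [ g(s,a) + alpha * sum_s' p(s'|s,a) J(s') ] until the cost-to-go stops changing. The sketch below assumes a small, hypothetical machine-replacement problem (states "good"/"worn", actions "keep"/"replace", made-up costs and probabilities, discount alpha = 0.9); it is an illustration, not the course's reference implementation.

```python
def q_value(s, a, J, P, g, alpha):
    """Expected cost of applying action a in state s, then following cost-to-go J."""
    return g[s][a] + alpha * sum(p * J[t] for t, p in P[s][a])


def value_iteration(P, g, alpha, tol=1e-10, max_iter=100_000):
    """Discounted-cost value iteration.

    P[s][a] is a list of (next_state, probability); g[s][a] is the stage cost.
    Returns the (approximate) optimal cost-to-go and a greedy policy.
    """
    states = list(P)
    J = {s: 0.0 for s in states}
    for _ in range(max_iter):
        J_new = {s: min(q_value(s, a, J, P, g, alpha) for a in P[s]) for s in states}
        converged = max(abs(J_new[s] - J[s]) for s in states) < tol
        J = J_new
        if converged:
            break
    policy = {s: min(P[s], key=lambda a: q_value(s, a, J, P, g, alpha)) for s in states}
    return J, policy


# Hypothetical machine-replacement example.
P = {
    "good": {"keep": [("good", 0.9), ("worn", 0.1)], "replace": [("good", 1.0)]},
    "worn": {"keep": [("worn", 1.0)], "replace": [("good", 1.0)]},
}
g = {"good": {"keep": 0.0, "replace": 2.0}, "worn": {"keep": 3.0, "replace": 2.0}}

J, policy = value_iteration(P, g, alpha=0.9)
print(policy)  # {'good': 'keep', 'worn': 'replace'}
print({s: round(v, 4) for s, v in J.items()})
```

Because the Bellman operator is a contraction with modulus alpha when alpha < 1, the iterates converge from any starting J; the greedy policy extracted at the end satisfies the principle of optimality for the converged cost-to-go.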
4. Neural network architectures and training methods.
Architectures for approximating the value function with multilayer neural networks. Training methods for
neural networks.
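To make "approximating the value function with a multilayer network" concrete, the sketch below fits a one-hidden-layer tanh network to a hypothetical cost-to-go J(x) = x^2 sampled on [-1, 1], training it by stochastic gradient descent with backpropagation. The target function, network size, learning rate, and epoch count are all assumptions for illustration; in the course this would typically be done in OCTAVE/MATLAB, and plain SGD is only one of the possible training methods.

```python
import math
import random


def init_net(n_hidden, seed=0):
    """Random initial weights for x -> sum_j w2[j] * tanh(w1[j] * x + b1[j]) + b2."""
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return the network output and the hidden activations."""
    h = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    y = sum(v * hj for v, hj in zip(net["w2"], h)) + net["b2"]
    return y, h


def mse(net, data):
    return sum((forward(net, x)[0] - t) ** 2 for x, t in data) / len(data)


def train(net, data, lr=0.05, epochs=2000, seed=0):
    """Per-sample gradient descent on 0.5 * (y - t)^2 (backpropagation)."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            y, h = forward(net, x)
            err = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
            for j in range(len(h)):
                # gradient at the hidden pre-activation, using w2[j] before its update
                grad_pre = err * net["w2"][j] * (1 - h[j] ** 2)
                net["w2"][j] -= lr * err * h[j]
                net["w1"][j] -= lr * grad_pre * x
                net["b1"][j] -= lr * grad_pre
            net["b2"] -= lr * err
    return net


# Hypothetical cost-to-go samples on a grid of states in [-1, 1].
data = [(k / 10, (k / 10) ** 2) for k in range(-10, 11)]
net = init_net(n_hidden=8)
before = mse(net, data)
train(net, data)
after = mse(net, data)
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```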
5. Iterative stochastic algorithms.
Basic model. Convergence based on a smooth potential function. Convergence via contraction and monotonicity properties.
The ordinary differential equation (ODE) approach.
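One standard form of an iterative stochastic algorithm is the Robbins-Monro-type update r <- (1 - g_k) r + g_k (H(r) + w_k), where H is a contraction observed only through zero-mean noise w_k and the step sizes satisfy sum g_k = infinity and sum g_k^2 < infinity (here g_k = 1/k). The sketch below assumes a hypothetical H(r) = 0.5 r + 1, whose fixed point is r* = 2, and uniform noise on [-1, 1]; it illustrates convergence via the contraction property, not a specific algorithm from the course.

```python
import random


def stochastic_fixed_point(H, r0=0.0, n_iter=100_000, noise=1.0, seed=0):
    """Iterate r <- (1 - g) r + g (H(r) + w) with g = 1/k and w ~ Uniform(-noise, noise)."""
    rng = random.Random(seed)
    r = r0
    for k in range(1, n_iter + 1):
        w = rng.uniform(-noise, noise)  # zero-mean noise on each evaluation of H
        g = 1.0 / k                     # diminishing step size
        r = (1 - g) * r + g * (H(r) + w)
    return r


def H(r):
    """Hypothetical contraction with modulus 0.5 and fixed point r* = 2."""
    return 0.5 * r + 1.0


r_hat = stochastic_fixed_point(H)
print(round(r_hat, 3))
```

The noise is averaged out by the diminishing steps, while the contraction pulls the iterate toward r*; with a constant step size the iterate would instead keep fluctuating around r*.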
6. Simulation methods.
Policy evaluation by Monte Carlo simulation. Temporal-difference methods. Optimistic policy iteration.
Simulation-based value iteration. Q-learning.
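Q-learning estimates the optimal Q-factors Q(s, a) from simulated transitions alone, without using the transition probabilities. The sketch below applies tabular Q-learning with epsilon-greedy exploration to a hypothetical 4-state corridor (states 0 to 3, state 3 terminal, unit cost per move, discount 0.9); the environment and all parameters (alpha = 0.5, epsilon = 0.2, 500 episodes) are assumptions chosen for illustration.

```python
import random

N_STATES = 4        # states 0, 1, 2, 3; state 3 is the terminal goal (hypothetical)
ACTIONS = (-1, +1)  # move left, move right


def step(s, a):
    """Deterministic move costing 1; moving left from state 0 stays in state 0."""
    return min(max(s + a, 0), N_STATES - 1), 1.0


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    goal = N_STATES - 1
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if s == goal:
                break
            # epsilon-greedy: we minimize cost, so the greedy action is the argmin
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = min(ACTIONS, key=lambda b: Q[(s, b)])
            s_next, cost = step(s, a)
            # the terminal state has zero cost-to-go
            future = 0.0 if s_next == goal else min(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (cost + gamma * future - Q[(s, a)])
            s = s_next
    return Q


Q = q_learning()
policy = {s: min(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)}
print(policy)
```

Starting from Q = 0 is optimistic for a cost-minimization problem, which by itself encourages trying untested actions; the epsilon-greedy rule keeps every action visited so the estimates can converge.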
Mandatory literature
Bertsekas, D. P., & Tsitsiklis, J. N.; Neuro-Dynamic Programming, Athena Scientific, 1996
Bertsekas, D. P.; Dynamic Programming and Optimal Control (3rd ed.), Athena Scientific, 2005
Cassandras, C. G., & Lafortune, S.; Introduction to Discrete Event Systems (2nd ed.), Springer, 2008
Teaching methods and learning activities
Lectures: presentation and discussion of the various topics of the curricular unit, with detailed explanation of examples that apply the concepts and methods.
Exercise-solving classes: students solve practical exercises with the support of the teacher, who clarifies any questions they raise. Follow-up of the mini-project work, supported by OCTAVE/MATLAB.
Software
Octave, MATLAB
keywords
Physical sciences > Mathematics > Applied mathematics
Technological sciences > Engineering > Electrical engineering
Technological sciences > Engineering > Systems engineering > Systems theory
Technological sciences > Engineering > Control engineering > Automation
Evaluation Type
Distributed evaluation with final exam
Assessment Components
Designation | Weight (%)
Exam | 70.00
Practical or project work | 30.00
Total | 100.00
Amount of time allocated to each course unit
Designation | Time (hours)
Project development | 20.00
Independent study | 103.00
Class attendance | 39.00
Total | 162.00
Eligibility for exams
Attendance credit is obtained by participating remotely in
at least 75% of the PL (practical-laboratory) classes and
by taking part in the mini-project. Remote participation in
a PL class is assessed by submitting a solved exercise from
that class by the following weekend.
In each PL class, 2 students are drawn at random to present
that exercise, each individually for 5 minutes.
Calculation formula of final grade
The final evaluation has two components:
EF: final exam grade, on a scale of 0 to 20, with a weight
of 70%
CC: continuous component grade, on a scale of 0 to 20,
with a weight of 30%
Final grade = 0.7 EF + 0.3 CC
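The weighted formula can be written as a small function. This is only a sketch: the function name `final_grade`, the input check, and the example grades (EF = 15, CC = 12) are hypothetical, and no rounding rule is applied because the text does not state one.

```python
def final_grade(ef: float, cc: float) -> float:
    """Final grade = 0.7 EF + 0.3 CC, with EF and CC both on the 0 to 20 scale."""
    if not (0 <= ef <= 20 and 0 <= cc <= 20):
        raise ValueError("EF and CC must be on the 0 to 20 scale")
    return 0.7 * ef + 0.3 * cc


print(round(final_grade(15, 12), 2))  # 14.1
```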
The continuous component is assessed on performance in the
group project and on remote participation in the PL
classes, measured by submitting a solved exercise from
each class by the following weekend.
Remote participation in classes contributes up to 2 points
(10%) to the final grade.
Project performance is worth up to 6 points (30%).
The combined points for the project and class participation
cannot exceed 6 points.
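One reading of these rules, stated here as an assumption, is that the continuous component enters the final grade as participation points (0 to 2) plus project points (0 to 6), with the sum capped at 6 points, which matches 0.3 x 20 in the final-grade formula. The hypothetical helper below encodes that interpretation only.

```python
def continuous_points(participation: float, project: float) -> float:
    """Points (out of 20) contributed by the continuous component.

    Assumed interpretation: participation is worth 0 to 2 points, the project
    0 to 6 points, and their sum is capped at 6 points (= 0.3 * 20).
    """
    if not (0 <= participation <= 2 and 0 <= project <= 6):
        raise ValueError("participation must be in [0, 2] and project in [0, 6]")
    return min(participation + project, 6.0)


print(continuous_points(2, 5.5))  # 6.0 (cap reached)
print(continuous_points(1.5, 3))  # 4.5
```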
The appeal exam (exame de recurso) has two purposes:
1. For students who did not pass, the appeal exam grade may
be combined with the continuous component or used on its
own, whichever alternative benefits the student more.
2. For students who passed, the appeal exam serves for
grade improvement.
Examinations or Special Assignments
Mini-project: design a control system using OCTAVE/MATLAB
Internship work/project
NA
Classification improvement
There are two options:
1. Taking the appeal exam, graded up to 20 points.
2. Taking the appeal exam graded up to 14 points, when
counting the continuous component is more advantageous
for the student.
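The choice between the two options amounts to taking the better of the exam alone (up to 20 points) and the exam weighted at 70% (up to 14 points) plus the continuous component. A minimal sketch, with a hypothetical function name and example grades:

```python
def improvement_grade(exam: float, cc: float) -> float:
    """Better of the two options, with exam and cc on the 0 to 20 scale."""
    exam_only = exam                 # option 1: exam graded up to 20 points
    with_cc = 0.7 * exam + 0.3 * cc  # option 2: exam up to 14 points plus continuous component
    return max(exam_only, with_cc)


print(round(improvement_grade(10, 18), 2))  # 12.4 (continuous component helps)
print(round(improvement_grade(16, 10), 2))  # 16 (exam alone is better)
```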