Abstract (EN):
The objective of this work is to determine whether, and how, learning agents can benefit from exchanging information during learning in problems where each team uses a different learning algorithm. Recent studies exposed several problems, such as lack of coordination, exchange of useless information, and difficulty in choosing adequate advisors. In this work we propose new solutions and test them in two domains (predator-prey and traffic control). Our solutions involve hybrid algorithms derived from Q-Learning and Evolutionary Algorithms. Results indicate that some combinations of learning algorithms are better suited to the use of external information than others, and that the difference in results achieved with and without communication is problem-dependent. The results also show that, where communication is useful, the gain in quality and learning time can be significant if the right combination of techniques is used to process external information.
Language:
English
Type (Professor's evaluation):
Scientific
Contact:
Eugénio Oliveira