Interpolation assisted deep reinforcement learning
Reinforcement Learning is a field of Machine Learning that, in contrast to supervised and unsupervised learning, generates its training data at runtime through direct interaction with an environment. A sample of this training data is called an experience and represents a state transition caused by an action, together with the corresponding reward signal. Experiences can be seen as a form of knowledge about the underlying dynamics of the environment, and a common technique in Reinforcement Learning is the so-called Experience Replay, which stores experiences observed during training and replays them later. This increases sample efficiency, as each experience is used many times for training instead of being discarded after a single update. Because experiences are generated at runtime, the learner has to explore the state space, and it can only learn the dynamics of a specific region once it has visited that region.
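The Experience Replay mechanism described above can be sketched as a small Python class. This is a minimal illustrative implementation, not the exact buffer used in this work; the class and method names are chosen for illustration:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of experiences (state, action, reward, next_state, done)."""

    def __init__(self, capacity=10000):
        # deque with maxlen evicts the oldest experience once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        """Append one observed state transition with its reward signal."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a random minibatch; an experience may be replayed many times."""
        return random.sample(self.buffer, batch_size)
```

The learner stores every observed transition via `store` and repeatedly trains on random minibatches from `sample`, which is what makes each experience reusable across many updates.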
As mentioned earlier, experiences can be seen as knowledge about the underlying problem, and it is possible to generate synthetic experiences for states that have not yet been visited, based on stored real experiences of neighbouring states. Such synthetic experiences can be generated by means of interpolation and can then be used to assist the learner with exploration. Sample efficiency increases even further, as real experiences are additionally used to generate synthetic ones. In this work, two techniques are presented that make use of synthetic experiences to assist the learner. The first approach stores generated synthetic experiences in the buffer alongside real ones; during training, both real and synthetic experiences are drawn at random from the buffer. This mechanism is called Interpolated Experience Replay. The second approach leverages the architectural design of the Deep Q-Network and uses synthetic experiences to enable training updates that take the full action space into account. This second algorithm is called Full-Update DQN. As methods that combine interpolation with a replay buffer and a model-free learning algorithm fit the definition of neither model-free nor model-based learning, the new class Semi-Model-Based is introduced to cover them.
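The idea of interpolating a synthetic experience from neighbouring real experiences can be illustrated with a simple inverse-distance-weighted sketch. The function name, the choice of k-nearest-neighbour inverse-distance weighting, and the experience layout `(state, action, reward, next_state)` are assumptions made for this example; the actual interpolation scheme used by Interpolated Experience Replay may differ:

```python
import numpy as np

def synthesize_experience(query_state, action, real_experiences, k=2):
    """Sketch: build a synthetic experience for an unvisited query_state by
    inverse-distance interpolation over the k nearest stored real experiences
    that used the same action. Experiences are (state, action, reward, next_state)."""
    matching = [e for e in real_experiences if e[1] == action]
    # Euclidean distances from the query state to the stored neighbouring states
    dists = np.array([np.linalg.norm(np.asarray(e[0]) - np.asarray(query_state))
                      for e in matching])
    idx = np.argsort(dists)[:k]
    # inverse-distance weights, normalized to sum to 1 (epsilon avoids div-by-zero)
    w = 1.0 / (dists[idx] + 1e-8)
    w /= w.sum()
    reward = sum(wi * matching[i][2] for wi, i in zip(w, idx))
    next_state = sum(wi * np.asarray(matching[i][3]) for wi, i in zip(w, idx))
    return (query_state, action, reward, next_state)
```

A synthetic experience produced this way could then be stored in the replay buffer alongside real experiences (as in Interpolated Experience Replay) or used to fill in the untaken actions of an update (as in Full-Update DQN).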