Research Seminar on AI: Local Policy Search with Bayesian Gradient Estimates
Tuesday, 29 November 2022, 4:00 pm
Reinforcement learning aims to find an optimal policy through interaction with an environment. Learning complex behavior therefore requires a vast number of samples, which can be prohibitive in practice. Yet instead of systematically reasoning about and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high-variance estimates and are therefore sub-optimal in terms of data-efficiency. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples in order to reason about informative subsequent ones. We are developing algorithms that join both worlds, utilizing a probabilistic model of the objective function and its gradient. Based on this model, the algorithm decides where to query a noisy zeroth-order oracle so as to improve the gradient estimate. The result is a novel type of policy search method with increased data-efficiency compared to existing black-box algorithms.
Alexander is currently a PhD student at RWTH Aachen University. His research focuses on the data-efficient adaptation of controllers (a.k.a. policies) to their environments.
The first part of his PhD took place at the Max Planck Institute for Intelligent Systems. In 2020 he moved, together with his research group, to RWTH Aachen University to build up the new Institute for Data Science in Mechanical Engineering. He is also an associated student of the International Max Planck Research School for Intelligent Systems (IMPRS-IS).
Before joining the Max Planck Institute, he received an M.Sc. in Computer Science from the University of Lübeck and a B.Eng. in Electrical Engineering from the Berlin University of Applied Sciences and Technology.