Abstract
We investigate Actor-Critic algorithms from the non-convex optimisation perspective. In recent years, powerful Deep Reinforcement Learning algorithms, such as Deep Deterministic Policy Gradients, have been observed to struggle even on tiny toy problems. Yet, only the critic training has been subject to intensive research. To close this gap, we conduct a critical point analysis for the actor training. First, we find that the reward function must satisfy additional conditions beyond those for Deterministic Policy Gradients for the critic to be a proper loss for the actor. Second, we address the impact of using over-parametrised Neural Networks in the actor part. If there are more parameters than samples, a Q-function has fewer sources of critical points with respect to its action input, leading to better actor training. Additionally, critical points of the actor loss are only those where the Q-function is extremal. Third, we outline challenges in the formulation of a sound optimisation task. They arise from conflicting requirements between Reinforcement Learning and Neural Network architectures.
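The actor training analysed in the abstract follows the deterministic actor-critic setup. A minimal sketch of the standard actor objective and its gradient (standard Deterministic Policy Gradient notation, not taken verbatim from the paper; \(\mu_\theta\) denotes the actor, \(Q_\phi\) the critic, \(\rho\) the state distribution):

```latex
% Deterministic actor-critic: the actor mu_theta is trained to maximise the
% critic's value estimate, so the (negated) critic serves as the actor's loss.
J(\theta) = \mathbb{E}_{s \sim \rho}\left[ Q_\phi\big(s, \mu_\theta(s)\big) \right],
\qquad
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim \rho}\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q_\phi(s, a)\big|_{a = \mu_\theta(s)}
    \right].
```

By this chain rule, the actor's gradient can vanish either because \(\nabla_a Q_\phi\) is zero (the Q-function is extremal in its action input) or because the Jacobian \(\nabla_\theta \mu_\theta(s)\) is rank-deficient; the abstract's second finding concerns the over-parametrised regime in which only the former source of critical points remains.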
| Original language | English |
|---|---|
| Pages (from-to) | 27-32 |
| Number of pages | 6 |
| Journal | IFAC Proceedings Volumes (IFAC-PapersOnline) |
| Volume | 55 |
| Issue number | 15 |
| DOIs | |
| State | Published - 1 Jul 2022 |
| Event | 6th IFAC Conference on Intelligent Control and Automation Sciences, ICONS 2022 - Cluj-Napoca, Romania; Duration: 13 Jul 2022 → 15 Jul 2022 |
Keywords
- Critical Points
- Dynamic Programming
- Function Approximation
- Markov Decision Process
- Optimisation