Special Issue on Adaptive Dynamic Programming and Reinforcement Learning
Abstract
THE PAST decade has witnessed a surge in research activities related to adaptive dynamic programming (ADP) and reinforcement learning (RL), particularly for control applications. Several books [items 1)–5) in the Appendix] and survey papers [items 6)–10) in the Appendix] have been published on the subject. Both ADP and RL provide approximate solutions to dynamic programming problems. In a 1995 article [item 11) in the Appendix], Barto et al. introduced the so-called “adaptive real-time dynamic programming,” which applies ADP specifically to real-time control. Later, in 2002, Murray et al. [item 12) in the Appendix] developed an ADP algorithm for the optimal control of continuous-time affine nonlinear systems. On the other hand, the most famous algorithms in RL are the temporal difference algorithm [item 13) in the Appendix] and the Q-learning algorithm [items 14) and 15) in the Appendix]. It all started more than 40 years ago, when Werbos outlined an approach [item 16) in the Appendix] that was later named “adaptive critic designs” (ACD) [items 17)–20) in the Appendix]. By the time he published the 1992 book chapter [item 18) in the Appendix], Werbos had used the three terms “ACD,” “approximate dynamic programming (ADP),” and “RL” interchangeably, and later also interchangeably with “adaptive dynamic programming (ADP)” [items 21)–23) in the Appendix].

The main focus of ADP/RL research is to solve the Hamilton–Jacobi–Bellman (HJB) equation with a manageable amount of computation, thereby avoiding the “curse of dimensionality.” Many direct and iterative methods have been proposed for solving the HJB equation, but new ideas are still needed. A very important research direction is to find novel and efficient approaches for solving the HJB equation. Dynamic programming is an important subject in optimal control [item 24) in the Appendix].
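For reference, the HJB equation for the class of continuous-time affine nonlinear systems mentioned above can be sketched in its conventional form (the notation below is the standard one from the optimal control literature, not taken from this editorial): for dynamics $\dot{x} = f(x) + g(x)u$ with cost functional $V(x(0)) = \int_0^{\infty} r\big(x(t), u(t)\big)\,dt$, the optimal value function $V^{*}$ satisfies

$$ 0 = \min_{u} \Big[\, r(x, u) + \big(\nabla V^{*}(x)\big)^{\top} \big( f(x) + g(x)\,u \big) \Big]. $$

ADP methods approximate $V^{*}$ (the “critic”) iteratively rather than solving this partial differential equation exactly, which is what keeps the computation manageable.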
It is therefore expected that ADP/RL will play an important role in solving optimal control problems for complex nonlinear systems. Furthermore, the most famous application of ADP/RL, combined with deep learning, is the AlphaGo program [item 25) in the Appendix] for the game of Go. Applications of ADP/RL to real-world problems that can bring about more significant social and economic impact are expected.
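To make the RL algorithms cited in this editorial concrete, the following is a minimal, self-contained sketch of tabular Q-learning; the toy chain environment, hyperparameters, and all names are illustrative assumptions, not from the editorial or its references:

```python
import random

def q_learning(n_states=5, n_episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a toy deterministic chain MDP:
    states 0..n_states-1, actions 0 = left and 1 = right;
    reward 1.0 on reaching the rightmost (terminal) state, else 0."""
    rng = random.Random(seed)
    goal = n_states - 1
    # Optimistic initialization encourages systematic exploration.
    Q = [[1.0, 1.0] for _ in range(n_states)]
    for _ in range(n_episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            # Temporal-difference (Q-learning) update toward the
            # bootstrapped target; the goal state is terminal.
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

The quantity `target - Q[s][a]` is the temporal-difference error, the same bootstrapping idea that underlies the temporal difference algorithm mentioned above; after training, the greedy policy moves right toward the rewarding state everywhere on the chain.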
