In this paper, we present a simulation-based dynamic programming method that learns the 'cost-to-go' function iteratively. The method is intended to address two important drawbacks of the conventional Model Predictive Control (MPC) formulation: its potentially exorbitant online computational requirement, and its inability to account for the future interplay between uncertainty and estimation in the optimal control calculation. We use a nonlinear Van de Vusse reactor to investigate the efficacy of the proposed approach and to identify further research issues.
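The core idea of learning a cost-to-go function iteratively from simulation can be illustrated with a minimal sketch. The toy scalar system, grid discretization, and Monte Carlo backup below are illustrative assumptions, not the paper's actual reactor model or function approximator:

```python
import numpy as np

# Hypothetical scalar system standing in for the reactor example:
# x' = 0.9*x + u + noise, with a quadratic stage cost.
rng = np.random.default_rng(0)

def step(x, u):
    return 0.9 * x + u + 0.1 * rng.standard_normal()

def stage_cost(x, u):
    return x**2 + 0.1 * u**2

# Tabular cost-to-go approximation on a state grid.
grid = np.linspace(-3.0, 3.0, 61)
actions = np.linspace(-1.0, 1.0, 11)
J = np.zeros_like(grid)
gamma = 0.95

def J_hat(x):
    # Nearest-neighbor lookup of the current cost-to-go estimate.
    return J[np.argmin(np.abs(grid - x))]

# Iterative, simulation-based value iteration: at each grid state,
# estimate the Bellman backup by averaging over sampled next states,
# then minimize over the candidate control actions.
for sweep in range(200):
    J_new = np.empty_like(J)
    for i, x in enumerate(grid):
        q = [np.mean([stage_cost(x, u) + gamma * J_hat(step(x, u))
                      for _ in range(5)]) for u in actions]
        J_new[i] = min(q)
    J = J_new
```

Once the cost-to-go estimate has converged, the online controller only has to minimize a one-step lookahead over `J`, rather than solve a long-horizon optimization at every sampling instant, which is the computational advantage the abstract refers to.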
Crites RH, Barto AG, "Improving Elevator Performance Using Reinforcement Learning," Advances in Neural Information Processing Systems 8, Touretzky DS, Mozer MC, Hasselmo ME, eds., MIT Press, Cambridge, MA, 1017, 1996
Sutton RS, Barto AG, "Reinforcement Learning: An Introduction," MIT Press, Cambridge, MA, 1998
Tesauro GJ, "Practical Issues in Temporal Difference Learning," Machine Learning, 8, 257, 1992
Van de Vusse JG, "Plug-flow Type Reactor Versus Tank Reactor," Chem. Eng. Sci., 19, 964, 1964
Zhang W, Dietterich TG, "A Reinforcement Learning Approach to Job Shop Scheduling," Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1114, 1995