
Combining Q-learning and Deterministic Policy Gradient for Learning-Based MPC

Abstract

This paper considers adjusting a fully parametrized model predictive control (MPC) scheme to approximate the optimal policy for a system as accurately as possible. By adopting MPC as a function approximator in reinforcement learning (RL), the MPC parameters can be adjusted using Q-learning or policy gradient methods. However, each method has its own specific shortcomings when used alone. Indeed, Q-learning does not exploit information about the policy gradient and therefore may fail to capture the optimal policy, while policy gradient methods miss any cost function corrections that do not affect the policy directly. The former is a general problem, whereas the latter is an issue specifically when dealing with economic problems. Moreover, it is notoriously difficult to perform second-order steps in the context of policy gradient methods, whereas it is straightforward in the context of Q-learning. This calls for an organic combination of these learning algorithms, in order to fully exploit the MPC parametrization and to speed up convergence in learning.
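The abstract refers to two update rules that can act on the same set of parameters: a Q-learning (temporal-difference) update of a parametric value function and a deterministic policy gradient update of a parametric policy. The sketch below is not the paper's algorithm; it replaces the parametrized MPC scheme with a simple linear policy and a quadratic critic on a toy linear-quadratic problem, and all names, features, and step sizes (e.g. `alpha_q`, `alpha_pi`, `phi`) are hypothetical choices made for illustration only.

```python
# Minimal sketch: combining a Q-learning-style critic update with a
# deterministic policy gradient update on a toy linear-quadratic problem.
# A linear gain and a quadratic critic stand in for the parametrized MPC.
import numpy as np

rng = np.random.default_rng(0)

# Toy system x' = A x + B u + noise, with stage cost c = x^2 + 0.1 u^2
A, B = 0.9, 0.5
gamma = 0.95

def cost(x, u):
    return x**2 + 0.1 * u**2

def step(x, u):
    return A * x + B * u + 0.05 * rng.standard_normal()

# Stand-ins for the parametrized MPC:
#   policy  pi_k(x)  = -k * x            (policy parameter k)
#   critic  Q_w(x,u) = w . phi(x, u)     (value parameters w)
def phi(x, u):
    return np.array([x**2, x * u, u**2, 1.0])

def policy(k, x, explore=0.0):
    return -k * x + explore * rng.standard_normal()

def q_value(w, x, u):
    return w @ phi(x, u)

k = 0.1                      # initial policy gain
w = np.zeros(4)              # initial critic weights
alpha_q, alpha_pi = 0.05, 0.01

x = 1.0
for t in range(20000):
    u = policy(k, x, explore=0.3)
    c = cost(x, u)
    x_next = step(x, u)

    # Q-learning-style update: semi-gradient TD step on the value parameters
    u_next = policy(k, x_next)
    td_error = c + gamma * q_value(w, x_next, u_next) - q_value(w, x, u)
    w += alpha_q * td_error * phi(x, u)

    # Deterministic policy gradient update on the policy parameter:
    # dQ/du at u = pi(x), for Q = w0 x^2 + w1 x u + w2 u^2 + w3
    u_pi = policy(k, x)
    dq_du = w[1] * x + 2.0 * w[2] * u_pi
    dpi_dk = -x
    k -= alpha_pi * dq_du * dpi_dk   # descend the cost along the policy gradient

    x = x_next
    if abs(x) > 50:                  # crude reset if exploration drives the state away
        x = 1.0

print(f"learned gain k = {k:.3f}, critic weights = {np.round(w, 3)}")
```

In this simplified setting the value and policy parameters are kept in separate blocks; the point of the paper is precisely that an MPC parametrization couples them, which is why it argues for applying both kinds of updates to the shared parameters rather than either one alone.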

Category

Academic article

Language

English

Author(s)

  • Katrine Seel
  • Sebastien Nicolas Gros
  • Jan Tommy Gravdahl

Affiliation

  • SINTEF Digital / Mathematics and Cybernetics
  • Norwegian University of Science and Technology

Year

2023

Published in

IEEE Conference on Decision and Control. Proceedings

ISSN

0743-1546

Volume

62

View this publication at Norwegian Research Information Repository