
Combining Q-learning and Deterministic Policy Gradient for Learning-Based MPC

Abstract

This paper considers adjusting a fully parametrized model predictive control (MPC) scheme to approximate the optimal policy for a system as accurately as possible. By adopting MPC as a function approximator in reinforcement learning (RL), the MPC parameters can be adjusted using Q-learning or policy gradient methods. However, each method has its own specific shortcomings when used alone. Indeed, Q-learning does not exploit information about the policy gradient and therefore may fail to capture the optimal policy, while policy gradient methods miss any cost function corrections that do not affect the policy directly. The former is a general problem, whereas the latter is an issue specifically when dealing with economic problems. Moreover, it is notoriously difficult to perform second-order steps in the context of policy gradient methods, whereas it is straightforward in the context of Q-learning. This calls for an organic combination of these learning algorithms, in order to fully exploit the MPC parametrization and to speed up convergence in learning.
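The abstract refers to two update rules that can act on the same set of parameters: a Q-learning (temporal-difference) update of a parametric value function and a deterministic policy gradient update of a parametric policy. The sketch below is not the paper's algorithm; it replaces the parametrized MPC scheme with a simple linear policy and a quadratic critic on a toy linear-quadratic problem, and all names, features, and step sizes (e.g. `alpha_q`, `alpha_pi`, `phi`) are hypothetical choices made for illustration only.

```python
# Minimal sketch: combining a Q-learning-style critic update with a
# deterministic policy gradient update on a toy linear-quadratic problem.
# A linear gain and a quadratic critic stand in for the parametrized MPC.
import numpy as np

rng = np.random.default_rng(0)

# Toy system x' = A x + B u + noise, with stage cost c = x^2 + 0.1 u^2
A, B = 0.9, 0.5
gamma = 0.95

def cost(x, u):
    return x**2 + 0.1 * u**2

def step(x, u):
    return A * x + B * u + 0.05 * rng.standard_normal()

# Stand-ins for the parametrized MPC:
#   policy  pi_k(x)  = -k * x            (policy parameter k)
#   critic  Q_w(x,u) = w . phi(x, u)     (value parameters w)
def phi(x, u):
    return np.array([x**2, x * u, u**2, 1.0])

def policy(k, x, explore=0.0):
    return -k * x + explore * rng.standard_normal()

def q_value(w, x, u):
    return w @ phi(x, u)

k = 0.1                      # initial policy gain
w = np.zeros(4)              # initial critic weights
alpha_q, alpha_pi = 0.05, 0.01

x = 1.0
for t in range(20000):
    u = policy(k, x, explore=0.3)
    c = cost(x, u)
    x_next = step(x, u)

    # Q-learning-style update: semi-gradient TD step on the value parameters
    u_next = policy(k, x_next)
    td_error = c + gamma * q_value(w, x_next, u_next) - q_value(w, x, u)
    w += alpha_q * td_error * phi(x, u)

    # Deterministic policy gradient update on the policy parameter:
    # dQ/du at u = pi(x), for Q = w0 x^2 + w1 x u + w2 u^2 + w3
    u_pi = policy(k, x)
    dq_du = w[1] * x + 2.0 * w[2] * u_pi
    dpi_dk = -x
    k -= alpha_pi * dq_du * dpi_dk   # descend the cost along the policy gradient

    x = x_next
    if abs(x) > 50:                  # crude reset if exploration drives the state away
        x = 1.0

print(f"learned gain k = {k:.3f}, critic weights = {np.round(w, 3)}")
```

In this simplified setting the value and policy parameters are kept in separate blocks; the point of the paper is precisely that an MPC parametrization couples them, which is why it argues for applying both kinds of updates to the shared parameters rather than either one alone.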

Category

Academic article

Language

English

Author(s)

  • Katrine Seel
  • Sebastien Nicolas Gros
  • Jan Tommy Gravdahl

Affiliation

  • SINTEF Digital / Mathematics and Cybernetics
  • Norwegian University of Science and Technology

Year

2023

Published in

IEEE Conference on Decision and Control. Proceedings

ISSN

0743-1546

Volume

62

View this publication at Norwegian Research Information Repository