Variance-Based Exploration for Learning Model Predictive Control

Abstract

The combination of model predictive control (MPC) and learning methods has been gaining increasing attention as a tool for controlling systems that may be difficult to model. Using MPC as a function approximator in reinforcement learning (RL) is one approach to reducing the reliance on accurate models. RL depends on exploration to learn, and currently, simple heuristics based on random perturbations are most common. This paper considers variance-based exploration in RL, tailored to using MPC as the function approximator. We propose to use a non-probabilistic measure of the uncertainty of the value function approximator in value-based RL methods. Uncertainty is measured by a variance estimate based on inverse distance weighting (IDW). The IDW framework reuses already-sampled state transitions and rewards and is computationally cheap to evaluate, making it well suited to an online setting. The gradient of the variance estimate is then used to perturb the policy parameters in a direction in which the variance of the value function estimate increases. The proposed method is verified on two simulation examples, considering both linear and nonlinear system dynamics, and is compared to standard exploration methods that use random perturbations.
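
As a concrete illustration (a minimal sketch, not code from the paper), the snippet below shows an IDW variance estimate and its gradient in JAX. It assumes the classic inverse-squared-distance weights; the arrays X and y, the step size alpha, and all names are hypothetical and may differ from the paper's formulation.

    # Minimal IDW variance sketch, assuming inverse-squared-distance weights;
    # names, data, and step size are illustrative, not the authors' code.
    import jax
    import jax.numpy as jnp

    def idw_variance(x, X, y, eps=1e-8):
        """Non-probabilistic IDW variance of values y sampled at rows of X,
        evaluated at query point x (eps guards against division by zero)."""
        d2 = jnp.sum((X - x) ** 2, axis=1)   # squared distances to samples
        w = 1.0 / (d2 + eps)                 # inverse-distance weights
        v = w / jnp.sum(w)                   # normalized weights
        mu = jnp.dot(v, y)                   # IDW prediction at x
        return jnp.dot(v, (y - mu) ** 2)     # weighted spread around mu

    # Exploration direction: gradient of the variance estimate w.r.t. the
    # query point, used to perturb parameters toward higher uncertainty.
    variance_grad = jax.grad(idw_variance, argnums=0)

    # Toy usage (hypothetical numbers): step uphill in estimated variance.
    X = jnp.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    y = jnp.array([0.5, 1.2, -0.3])
    x = jnp.array([0.4, 0.4])
    alpha = 0.1                              # hypothetical step size
    x_explored = x + alpha * variance_grad(x, X, y)

In the paper, the object of interest is the variance of the value function estimate, and its gradient with respect to the policy parameters drives the exploratory perturbation; the query point above merely stands in for that role.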

Category

Academic article

Language

English

Author(s)

  • Katrine Seel
  • Alberto Bemporad
  • Sebastien Nicolas Gros
  • Jan Tommy Gravdahl

Affiliation

  • SINTEF Digital / Mathematics and Cybernetics
  • IMT School for Advanced Studies Lucca, Italy
  • Norwegian University of Science and Technology

Year

2023

Published in

IEEE Access

Volume

11

Page(s)

60724–60736

View this publication at Norwegian Research Information Repository