Variance-Based Exploration for Learning Model Predictive Control

Abstract

The combination of model predictive control (MPC) and learning methods has been gaining increasing attention as a tool for controlling systems that may be difficult to model. Using MPC as a function approximator in reinforcement learning (RL) is one approach to reducing the reliance on accurate models. RL depends on exploration to learn, and currently, simple heuristics based on random perturbations are most common. This paper considers variance-based exploration in RL, tailored to using MPC as the function approximator. We propose to use a non-probabilistic measure of the uncertainty of the value function approximator in value-based RL methods. Uncertainty is measured by a variance estimate based on inverse distance weighting (IDW). The IDW framework reuses already-sampled state transitions and rewards and is computationally cheap to evaluate, making it well suited to an online setting. The gradient of the variance estimate is then used to perturb the policy parameters in a direction in which the variance of the value function estimate increases. The proposed method is verified on two simulation examples, considering both linear and nonlinear system dynamics, and is compared to standard exploration methods that use random perturbations.
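
As a concrete illustration (a minimal sketch, not code from the paper), the snippet below shows an IDW variance estimate and its gradient in JAX. It assumes the classic inverse-squared-distance weights; the arrays X and y, the step size alpha, and all names are hypothetical and may differ from the paper's formulation.

    # Minimal IDW variance sketch, assuming inverse-squared-distance weights;
    # names, data, and step size are illustrative, not the authors' code.
    import jax
    import jax.numpy as jnp

    def idw_variance(x, X, y, eps=1e-8):
        """Non-probabilistic IDW variance of values y sampled at rows of X,
        evaluated at query point x (eps guards against division by zero)."""
        d2 = jnp.sum((X - x) ** 2, axis=1)   # squared distances to samples
        w = 1.0 / (d2 + eps)                 # inverse-distance weights
        v = w / jnp.sum(w)                   # normalized weights
        mu = jnp.dot(v, y)                   # IDW prediction at x
        return jnp.dot(v, (y - mu) ** 2)     # weighted spread around mu

    # Exploration direction: gradient of the variance estimate w.r.t. the
    # query point, used to perturb parameters toward higher uncertainty.
    variance_grad = jax.grad(idw_variance, argnums=0)

    # Toy usage (hypothetical numbers): step uphill in estimated variance.
    X = jnp.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    y = jnp.array([0.5, 1.2, -0.3])
    x = jnp.array([0.4, 0.4])
    alpha = 0.1                              # hypothetical step size
    x_explored = x + alpha * variance_grad(x, X, y)

In the paper, the object of interest is the variance of the value function estimate, and its gradient with respect to the policy parameters drives the exploratory perturbation; the query point above merely stands in for that role.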

Category

Academic article

Language

English

Author(s)

  • Katrine Seel
  • Alberto Bemporad
  • Sebastien Nicolas Gros
  • Jan Tommy Gravdahl

Affiliation

  • SINTEF Digital / Mathematics and Cybernetics
  • IMT School for Advanced Studies Lucca, Italy
  • Norwegian University of Science and Technology

Year

2023

Published in

IEEE Access

Volume

11

Page(s)

60724–60736

View this publication at Norwegian Research Information Repository