Comparing Deep Reinforcement Learning Algorithms’ Ability to Safely Navigate Challenging Waters

Abstract

Reinforcement Learning (RL) controllers have proven effective at tackling the dual objectives of path following and collision avoidance. However, determining which RL algorithm setup optimally trades off these two tasks is not straightforward. This work proposes a methodology for exploring this question by analyzing the performance and task-specific behavioral characteristics of a range of RL algorithms applied to path following and collision avoidance for underactuated surface vehicles in environments of increasing complexity. The results show that, compared to the other RL algorithms introduced, the Proximal Policy Optimization (PPO) algorithm exhibits superior robustness to changes in environment complexity and the reward function, as well as when generalizing to environments with a considerable domain gap from the training environment. Whereas the proposed reward function significantly improves the competing algorithms' ability to solve the training environment, an unexpected consequence of the dimensionality reduction in the sensor suite, combined with the domain gap, is identified as the source of their impaired generalization performance.
Category

Academic article

Language

English

Author(s)

  • Thomas Nakken Larsen
  • Halvor Ødegård Teigen
  • Torkel Laache
  • Damiano Varagnolo
  • Adil Rasheed

Affiliation

  • SINTEF Digital / Mathematics and Cybernetics
  • Norwegian University of Science and Technology

Year

2021

Published in

Frontiers in Robotics and AI

Volume

8
