Leveraging learned monocular depth prediction for pose estimation and mapping on unmanned underwater vehicles

Abstract

This paper presents a general framework that integrates visual and acoustic sensor data to enhance localization and mapping in complex, highly dynamic underwater environments, with a particular focus on fish farming. The pipeline enables net-relative pose estimation for Unmanned Underwater Vehicles (UUVs) and depth prediction within net pens solely from visual data by combining deep learning-based monocular depth prediction with sparse depth priors derived from a classical Fast Fourier Transform (FFT)-based method. We further introduce a method to estimate a UUV’s global pose by fusing these net-relative estimates with acoustic measurements, and demonstrate how the predicted depth images can be integrated into the wavemap mapping framework to generate detailed 3D maps in real time. Extensive evaluations on datasets collected in industrial-scale fish farms confirm that the presented framework can accurately estimate a UUV’s net-relative and global position in real time and provide 3D maps suitable for autonomous navigation and inspection.

Category

Academic article

Language

English

Author(s)

  • Marco Job
  • David Botta
  • Victor Reijgwart
  • Luca Ebner
  • Andrej Studer
  • Roland Siegwart
  • Eleni Kelasidi

Affiliation

  • SINTEF Ocean / Aquaculture
  • Switzerland
  • ETH Zurich
  • Norwegian University of Science and Technology

Year

2025

Published in

Frontiers in Robotics and AI

Volume

12

Page(s)

1 - 22
