Leveraging learned monocular depth prediction for pose estimation and mapping on unmanned underwater vehicles

Abstract

This paper presents a general framework that integrates visual and acoustic sensor data to enhance localization and mapping in complex, highly dynamic underwater environments, with a particular focus on fish farming. The pipeline enables net-relative pose estimation for Unmanned Underwater Vehicles (UUVs) and depth prediction within net pens solely from visual data by combining deep learning-based monocular depth prediction with sparse depth priors derived from a classical Fast Fourier Transform (FFT)-based method. We further introduce a method to estimate a UUV’s global pose by fusing these net-relative estimates with acoustic measurements, and demonstrate how the predicted depth images can be integrated into the wavemap mapping framework to generate detailed 3D maps in real time. Extensive evaluations on datasets collected in industrial-scale fish farms confirm that the presented framework can accurately estimate a UUV’s net-relative and global position in real time and provide 3D maps suitable for autonomous navigation and inspection.


Language

English

Author(s)

Marco Job
David Botta
Victor Reijgwart
Luca Ebner
Andrej Studer
Roland Siegwart
Eleni Kelasidi

Affiliation

SINTEF Ocean / Aquaculture
Switzerland
ETH Zurich
Norwegian University of Science and Technology

Year

2025

Published in

Frontiers in Robotics and AI

Volume

Page(s)

1 - 22

DOI

https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2025.1609765/full

