Abstract
Owing to their utility for replacing workers in tasks unsuitable for humans, unmanned underwater vehicles (UUVs) have become increasingly common tools in the fish farming industry. However, earlier studies and anecdotal evidence from farmers suggest that farmed fish tend to move away from and avoid intrusive objects, such as vehicles deployed and operated inside net pens. Such responses could indicate discomfort associated with the intrusive objects, which, in turn, can lead to stress and impaired welfare in the fish. To prevent this, vehicles and their control systems should be designed to automatically adjust operations when they perceive that they are repelling the fish. A necessary first step in this direction is to develop on-vehicle observation systems that can assess object/vehicle–fish distances in real time and provide inputs to the control algorithms. Owing to their small size and low weight, modern cameras are ideal for this purpose. Moreover, the ongoing rapid developments in deep learning are enabling the use of increasingly sophisticated methods for analyzing camera footage. To explore this potential, we developed three new pipelines for the automated assessment of fish–camera distances in video and images. These were complemented with a recently published method, yielding four pipelines in total: SegmentDepth, BBoxDepth, and SuperGlue, which are based on stereo vision, and DepthAnything, which is monocular. The overall performance was evaluated on field data by comparing the fish–object distances obtained from the methods with those measured using sonar. The four methods were then benchmarked by comparing the number of objects detected and the quality and overall accuracy of the stereo matches (stereo-based methods only). SegmentDepth, DepthAnything, and SuperGlue agreed well with the sonar data, yielding mean absolute errors (MAEs) of 0.205 m (95% CI: 0.050–0.360), 0.412 m (95% CI: 0.148–0.676), and 0.187 m (95% CI: 0.073–0.300), respectively, and were integrated into the Robot Operating System 2 (ROS2) framework to enable real-time application in fish behavior identification and the control of robotic vehicles such as UUVs.
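As a minimal illustrative sketch, the Python snippet below shows one way agreement between camera-derived and sonar-measured distances could be summarized as an MAE with a 95% confidence interval; the function name, the bootstrap percentile procedure, and the paired-distance inputs are assumptions made for illustration, since the abstract does not specify how the reported intervals were computed.

```python
import numpy as np

def mae_with_ci(method_dist, sonar_dist, n_boot=10000, alpha=0.05, seed=0):
    """Mean absolute error between method- and sonar-derived fish-object
    distances, with a bootstrap percentile confidence interval.

    Note: the bootstrap CI is an illustrative choice, not necessarily the
    procedure used in the paper."""
    errors = np.abs(np.asarray(method_dist) - np.asarray(sonar_dist))
    mae = errors.mean()
    rng = np.random.default_rng(seed)
    # Resample the absolute errors with replacement to estimate MAE variability.
    boot = np.array([
        rng.choice(errors, size=errors.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return mae, (lo, hi)
```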