Collaborative navigation is the most promising technique for infrastructure-free indoor navigation for a group of pedestrians, such as rescue personnel. Infrastructure-free navigation means that the system localizes itself independently of any equipment pre-installed in the building, using sensors that monitor the motion of the user. The most feasible navigation sensors are inertial sensors and a camera, which provides motion information when processed with a computer vision method called visual odometry. Collaborative indoor navigation poses challenges for computer vision: the navigation environment often offers few trackable features, other pedestrians in front of the camera interfere with motion detection, and size and cost constraints prevent the use of the highest-quality cameras, which results in measurement errors. We have developed an improved computer-vision-based collaborative navigation method that addresses these challenges by using a depth (RGB-D) camera and a deep-learning-based detector to avoid using features found on other pedestrians and to control the inconsistency of object depth detection, which would otherwise degrade the accuracy of the visual odometry solution. We have compared our visual odometry solution to one obtained using the same low-cost RGB-D camera without corrections, and find that the solution is much improved. Finally, we show results for a solution that fuses visual odometry and inertial sensors for individual navigation and uses ultra-wideband (UWB) ranging for collaborative navigation.
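
The feature-rejection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `filter_features`, the bounding-box representation of detected pedestrians, and the depth limits `min_depth_m` and `max_depth_m` are all assumptions made for illustration, and a simple depth-range check stands in for whatever depth-consistency control the method actually applies.

```python
import math
from typing import List, Tuple

# Hypothetical types, chosen only for this sketch.
Feature = Tuple[float, float, float]     # (u, v, depth_m): pixel column, pixel row, depth in metres
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def inside(box: Box, u: float, v: float) -> bool:
    """Return True if pixel (u, v) lies inside the bounding box (edges included)."""
    x_min, y_min, x_max, y_max = box
    return x_min <= u <= x_max and y_min <= v <= y_max


def filter_features(
    features: List[Feature],
    pedestrian_boxes: List[Box],
    min_depth_m: float = 0.3,   # assumed lower limit of trusted depth
    max_depth_m: float = 5.0,   # assumed upper limit of trusted depth
) -> List[Feature]:
    """Keep only features that are off detected pedestrians and have a usable depth."""
    kept = []
    for u, v, depth_m in features:
        # Features on a moving pedestrian would bias the motion estimate.
        if any(inside(box, u, v) for box in pedestrian_boxes):
            continue
        # Reject missing (NaN/inf) or out-of-range depth readings.
        if not math.isfinite(depth_m) or not (min_depth_m <= depth_m <= max_depth_m):
            continue
        kept.append((u, v, depth_m))
    return kept
```

In such a pipeline, the boxes would come from the pedestrian detector for each frame, and only the surviving features would be passed on to the visual odometry step.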