Learning Incipient Slip with GelSight Sensors: Attention Classification with Video Vision Transformers

Abstract

An important aspect of robotic grasping is the ability to detect incipient slip from real-time tactile sensor data. In this paper, we propose using Video Vision Transformers to detect the onset of slip in grasping scenarios. The dynamic nature of slip makes Video Vision Transformers well suited for capturing temporal correlations even with relatively small datasets. The training data is acquired through two GelSight tactile sensors attached to the gripper fingers of a Franka Emika Panda robot arm that grasps, lifts, and shakes 30 everyday objects to induce slip. We further conduct an ablation study considering 5, 4, 3, and 2 frames prior to slip onset and observe consistent prediction accuracy. Our approach can predict slip well in advance, up to the 5th frame before onset, making it a valuable tool for preemptive corrective actions such as a more secure gripper closure. We evaluate our approach on 10 previously unseen objects and achieve a zero-shot mean prediction accuracy of 99%.
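The publication itself is linked below; as a rough illustration of the kind of model the abstract describes, the following is a minimal sketch of a factorised Video Vision Transformer that classifies a short clip of tactile frames as slip or no-slip. This is not the authors' implementation: the class name, hyperparameters (patch size, embedding width, depth), and the mean-pooled readout are illustrative assumptions, with the 5-frame clip length chosen to match the ablation in the abstract.

```python
# Minimal sketch (not the paper's code) of a factorised Video Vision
# Transformer binary classifier for short tactile clips, assuming each
# sample is a window of N GelSight RGB frames preceding slip onset.
import torch
import torch.nn as nn


class ViViTSlipClassifier(nn.Module):
    def __init__(self, image_size=224, patch_size=16, num_frames=5,
                 dim=192, depth=4, heads=3, num_classes=2):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2

        # Per-frame patch embedding (tubelet depth 1 for simplicity).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.spatial_pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.temporal_pos = nn.Parameter(torch.zeros(1, num_frames, dim))

        def make_encoder():
            layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                batch_first=True, norm_first=True)
            return nn.TransformerEncoder(layer, num_layers=depth)

        self.spatial_encoder = make_encoder()   # attends over patches per frame
        self.temporal_encoder = make_encoder()  # attends over frames per clip
        self.head = nn.Linear(dim, num_classes)

    def forward(self, clips):                   # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        x = self.patch_embed(clips.flatten(0, 1))   # (B*T, dim, h, w)
        x = x.flatten(2).transpose(1, 2)            # (B*T, patches, dim)
        x = self.spatial_encoder(x + self.spatial_pos)
        x = x.mean(dim=1).view(b, t, -1)            # one token per frame
        x = self.temporal_encoder(x + self.temporal_pos)
        return self.head(x.mean(dim=1))             # slip / no-slip logits


model = ViViTSlipClassifier(num_frames=5)
clip = torch.randn(2, 5, 3, 224, 224)  # batch of two 5-frame tactile clips
logits = model(clip)                   # shape (2, 2)
```

The factorised design (a spatial encoder per frame followed by a temporal encoder over frame tokens) is one of the standard ViViT variants and keeps the parameter count small, which fits the abstract's point about working with relatively small datasets; the abstract does not specify which variant the authors used.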
Category

Academic article

Language

English

Author(s)

Affiliation

  • SINTEF Ocean / Fisheries and New Biomarine Industry

Year

2024

Published in

Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

ISSN

2153-0858

View this publication at Norwegian Research Information Repository