We conduct large-scale grasping experiments on three different robotic hands, tested on over 150 unseen objects in cluttered environments. The grasping process for each robotic hand lasts over 3 hours, and the total time across all three hands exceeds 10 hours. In the video, the grasp execution phase is played at 2x speed and the collision detection phase at 20x. Presenting the complete process of our robotic grasping experiments offers insight into where the grasping system can still improve, and running the system for such an extended period demonstrates its robustness.
Object categories (shown for each of the three robotic hands): Toy | Textile | Household | Hardware | Food | Adversarial
We introduce an efficient approach for learning dexterous grasping with minimal data, advancing robotic manipulation capabilities across different robotic hands. Unlike traditional methods that require millions of grasp labels for each robotic hand, our method achieves high performance with human-level learning efficiency: only hundreds of grasp attempts on 40 training objects. The approach separates the grasping process into two stages: first, a universal model maps scene geometry to intermediate contact-centric grasp representations, independent of specific robotic hands. Next, a unique grasp decision model is trained for each robotic hand through real-world trial and error, translating these representations into final grasp poses. Our results show a grasp success rate of 75-95% across three different robotic hands in real-world cluttered environments with over 150 novel objects, improving to 80-98% with increased training objects. This adaptability shows promise for humanoid robots, prosthetics, and other domains that require robust, versatile robotic manipulation.
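The two-stage pipeline described above can be summarized in a minimal, runnable Python sketch. All class names, interfaces, and the toy proposal/scoring logic below are hypothetical placeholders for illustration; the actual models are learned from data rather than these hand-written stand-ins.

```python
# Minimal sketch of the two-stage pipeline (hypothetical names and interfaces).
import numpy as np

class ContactRepresentationModel:
    """Stage 1 (hand-agnostic): map scene geometry to contact-centric grasp
    candidates. Toy stand-in: propose contacts near the top of the point cloud."""
    def predict(self, points: np.ndarray, num_candidates: int = 8):
        order = np.argsort(points[:, 2])[::-1]           # highest points first
        contacts = points[order[:num_candidates]]
        # Each candidate: a contact point plus an approach direction.
        return [{"contact": c, "approach": np.array([0.0, 0.0, -1.0])}
                for c in contacts]

class GraspDecisionModel:
    """Stage 2 (hand-specific): score candidates with parameters tuned for one
    robotic hand via real-world trial and error (toy linear scorer here)."""
    def __init__(self, weights: np.ndarray):
        self.weights = weights                            # learned per hand
    def select(self, candidates):
        scores = [float(self.weights @ c["contact"]) for c in candidates]
        return candidates[int(np.argmax(scores))]         # final grasp choice

# Usage: cluttered-scene point cloud -> shared representation -> hand-specific grasp.
scene = np.random.rand(500, 3)                            # placeholder point cloud
candidates = ContactRepresentationModel().predict(scene)
grasp = GraspDecisionModel(weights=np.array([0.1, 0.1, 1.0])).select(candidates)
print(grasp["contact"], grasp["approach"])
```

The key design choice this sketch reflects is that only Stage 2 depends on the specific hand, so a new hand requires retraining only the small decision model, not the representation model.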
Using the contact-centric grasp representations produced by the trained representation model, we first train the grasp decision model on a set of 144 training objects. The substantial improvements over the baseline method across the extensive test set of over 150 unseen objects demonstrate the effectiveness of our proposed representation and approach.
Our experiments show that an appropriate learning method can significantly reduce the number of training objects required: instead of thousands, only 40 objects are sufficient. Moreover, our method's high learning efficiency allows it to converge with as few as 100 grasp attempts per grasp type.
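To make the trial-and-error stage concrete, here is a hypothetical sketch in which the hand-specific decision model is a simple logistic scorer updated after each real grasp attempt from its binary success/failure outcome. The feature construction, scorer, and update rule are assumptions for illustration, not the paper's actual learning rule.

```python
# Hypothetical trial-and-error loop for one robotic hand and one grasp type.
import numpy as np

rng = np.random.default_rng(0)
dim = 8                                      # contact-centric feature dimension (assumed)
weights = np.zeros(dim)                      # one scorer per robotic hand
lr = 0.1

def success_prob(f: np.ndarray) -> float:
    """Predicted success probability for one grasp candidate."""
    return 1.0 / (1.0 + np.exp(-f @ weights))

for attempt in range(100):                   # on the order of 100 attempts per grasp type
    candidates = rng.normal(size=(16, dim))  # features of contact-centric candidates (placeholder)
    best = candidates[np.argmax([success_prob(c) for c in candidates])]
    # Execute the chosen grasp on the real robot and record the outcome.
    outcome = float(rng.random() < 0.7)      # placeholder for the observed success label
    # Logistic-regression step toward the observed outcome.
    weights += lr * (outcome - success_prob(best)) * best
```

Because each attempt yields a label for the candidate that was actually executed, the decision model improves directly from real-world feedback without requiring any per-hand grasp annotations.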
Scaling up data along the right dimension is key to improving the model's generalization ability. Although a training object and a testing object may have very different overall shapes, we can often find local geometries on them that are quite similar. The results demonstrate that increasing the sample density for each training object is far more impactful than increasing the number of objects.
The authors would like to thank Xiaolin Fang for helpful revisions, and Antonia Bronars and Jiang Zou for helpful discussions.
This website is licensed under the GNU General Public License 3.0 and is modified from the original DexCap website.