GraspNet

Unseen Object 6D Pose Benchmark: In the current task setting of 6D pose estimation, the same object set is shared by the training and testing phases. Under the assumption that every test object is available during training, current state-of-the-art 6D pose estimation algorithms directly model each object's texture and geometry features within the neural network. Prior knowledge of the object models, such as keypoint locations or voting offsets, is also encoded by the network. As a result, these methods can only estimate the poses of objects seen during training. In real-world applications such as flexible robotic assembly, novel objects appear frequently. To estimate their poses, the data collection process, including keypoint assignment and synthetic image generation, has to be repeated and the network retrained. This is labor-intensive and prevents 6D pose estimation algorithms from being deployed rapidly.
In this project, we reconsider this problem and explore a new direction. We propose a new task that is similar to the original pose estimation problem, except that the mesh models of the objects in the test set are not available during training.
To support this task, we propose a new benchmark that contains a training set with over 1000 objects and 1500 scenes and a test set with 48 novel objects and 90 real-world captured scenes, built on top of GraspNet and BlenderProc.

Download

unseenp1.zip [14.0 GB] [Baidu (code: 6qse)]
unseenp2.zip [14.0 GB] [Baidu (code: gfej)]
unseenp3.zip [14.0 GB] [Baidu (code: lior)]
unseenp4.zip [14.0 GB] [Baidu (code: hdny)]
unseenp5.zip [14.0 GB] [Baidu (code: qyky)]
unseenp6.zip [14.0 GB] [Baidu (code: lvkv)]
code for label generation [GitHub]
network training code coming soon

1. Place all the data in the following structure, and there are 130 scenes in total.

|-- google1000
    |-- scenes/
    |   |--- scene_0000/
    |   |--- scene_0001/
    |   |--- ... ...
    |   |--- scene_1499/
    |
    |-- models
    |   |-- 000/                  # Details of model of object 0
    |   |-- ... ...
    |   `-- 1029/
    |
    |-- models_down
    |   |-- 000.ply               # Downsampled model point cloud of object 0
    |   |-- ... ...
    |   `-- 1029.ply
    |
    |-- camera.json               # Camera intrinsics
    |
    `-- graspnet_labels_v3        # correspondence labels between scenes and object models in ./models_down/; initially empty, need further labelling

2. Detailed structure of each scene (take scene_0000 as an example)

|-- scene_0000
    |-- blender_proc
    |   |-- rgb
    |   |   |-- 0000.jpg to 049.jpg   # 50 rgb images
    |   |-- depth
    |   |   |-- 0000.png to 049.png   # 50 depth images
    |   |-- label
    |   |   |-- 0000.png to 049.png   # 50 object mask images, 0 is background, 1-88 denotes each object (1-indexed), same format as YCB-Video dataset
    |   |-- annotations
    |   |   |-- 0000.xml to 049.xml   # 50 object 6d pose annotations. 'pos_in_world' and 'ori_in_world' denote position and orientation w.r.t. the camera frame.
    |   |-- meta
    |   |   |-- 0000.mat to 049.mat   # 50 object 6d pose annotations, same format as YCB-Video dataset for easy usage
    |   |-- camK.npy                  # camera intrinsic, shape: 3x3, [[f_x,0,c_x], [0,f_y,c_y], [0,0,1]]
    |   |-- camera_poses.npy          # 50 camera poses with respect to the first frame, shape: 256x(4x4)
    |   `-- cam0_wrt_table.npy        # first frame's camera pose with respect to the table, shape: 4x4
    |
    `-- object_id_list.txt            # ids of objects that appear in this scene
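
The per-scene layout above can be navigated with a few small helpers. The following is a minimal sketch, not part of the released code: the function names are hypothetical, the 4-digit zero-padded frame names are an assumption based on the listing, and the pose composition assumes each 4x4 matrix maps points from the camera frame into its reference frame (first frame or table). Check these assumptions against the actual files before relying on them.

```python
from pathlib import Path

import numpy as np


def frame_paths(scene_dir, frame_idx):
    """Build per-frame file paths under <scene>/blender_proc.

    Assumes 4-digit zero-padded frame names (e.g. 0005.jpg); hypothetical helper.
    """
    root = Path(scene_dir) / "blender_proc"
    name = f"{frame_idx:04d}"
    return {
        "rgb": root / "rgb" / f"{name}.jpg",
        "depth": root / "depth" / f"{name}.png",
        "label": root / "label" / f"{name}.png",
        "annotation": root / "annotations" / f"{name}.xml",
        "meta": root / "meta" / f"{name}.mat",
    }


def camera_pose_wrt_table(cam0_wrt_table, camera_poses, frame_idx):
    """Camera pose of frame `frame_idx` expressed in the table frame.

    cam0_wrt_table: (4, 4) array, as stored in cam0_wrt_table.npy.
    camera_poses:   (N, 4, 4) array, as stored in camera_poses.npy.
    Assumes T_table_cam_i = T_table_cam0 @ T_cam0_cam_i.
    """
    return cam0_wrt_table @ camera_poses[frame_idx]


def object_ids_in_mask(label):
    """Sorted object ids present in a label mask; 0 (background) is excluded."""
    return [int(i) for i in np.unique(label) if i != 0]
```

With the dataset on disk, the arrays would typically be read with `np.load(root / "camK.npy")` and similar calls, and the resulting paths passed to an image reader of your choice.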
