1. Download and unzip all the files, then place them in the following structure. The train and test splits together contain 190 scenes. (A layout-check sketch is given after the trees below.)

|-- graspnet
    |-- scenes
    |   |-- scene_0000/
    |   |-- scene_0001/
    |   |-- ... ...
    |   `-- scene_0189/
    |
    |-- models
    |   |-- 000/
    |   |-- 001/
    |   |-- ...
    |   `-- 087/
    |
    |-- dex_models    (optional but strongly recommended for accelerating evaluation)
    |   |-- 000.pkl
    |   |-- 001.pkl
    |   |-- ...
    |   `-- 087.pkl
    |
    |-- grasp_label
    |   |-- 000_labels.npz
    |   |-- 001_labels.npz
    |   |-- ...
    |   `-- 087_labels.npz
    |
    `-- collision_label
        |-- scene_0000/
        |-- scene_0001/
        |-- ... ...
        `-- scene_0189/

2. Detailed structure of each scene (a minimal frame-loading sketch follows the tree)

|-- scenes
    |-- scene_0000
    |   |-- object_id_list.txt    # IDs of the objects that appear in this scene, 0-indexed
    |   |-- rs_wrt_kn.npy         # realsense camera poses with respect to kinect, shape: 256x(4x4)
    |   |-- kinect                # data of the kinect camera
    |   |   |-- rgb
    |   |   |   `-- 0000.png to 0255.png    # 256 rgb images
    |   |   |-- depth
    |   |   |   `-- 0000.png to 0255.png    # 256 depth images
    |   |   |-- label
    |   |   |   `-- 0000.png to 0255.png    # 256 object mask images; 0 is background, 1-88 denote the objects (1-indexed), same format as the YCB-Video dataset
    |   |   |-- annotations
    |   |   |   `-- 0000.xml to 0255.xml    # 256 object 6d pose annotations; 'pos_in_world' and 'ori_in_world' denote the position and orientation w.r.t. the camera frame
    |   |   |-- meta
    |   |   |   `-- 0000.mat to 0255.mat    # 256 object 6d pose annotations, same format as the YCB-Video dataset for easy usage
    |   |   |-- rect
    |   |   |   `-- 0000.npy to 0255.npy    # 256 2D planar grasp labels
    |   |   |
    |   |   |-- camK.npy                    # camera intrinsic matrix, shape: 3x3, [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
    |   |   |-- camera_poses.npy            # 256 camera poses with respect to the first frame, shape: 256x(4x4)
    |   |   `-- cam0_wrt_table.npy          # first frame's camera pose with respect to the table, shape: 4x4
    |   |
    |   `-- realsense
    |       `-- same structure as kinect
    |
    |-- scene_0001
    |-- ... ...
    `-- scene_0189
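After unzipping, you can sanity-check the top-level layout with a short script. This is a minimal sketch, not part of the dataset tooling; GRASPNET_ROOT is a hypothetical path you would replace with your own, and the optional dex_models folder is deliberately not checked.

    import os

    # Hypothetical local path to the dataset root; adjust to your setup.
    GRASPNET_ROOT = "/data/graspnet"

    def check_layout(root):
        """Sanity-check the top-level layout described above."""
        expected = {
            "scenes": ["scene_%04d" % i for i in range(190)],             # 190 scenes
            "models": ["%03d" % i for i in range(88)],                    # 88 object models
            "grasp_label": ["%03d_labels.npz" % i for i in range(88)],
            "collision_label": ["scene_%04d" % i for i in range(190)],
        }
        for folder, entries in expected.items():
            path = os.path.join(root, folder)
            if not os.path.isdir(path):
                print("missing folder:", path)
                continue
            missing = [e for e in entries
                       if not os.path.exists(os.path.join(path, e))]
            print("%s: %d/%d entries present"
                  % (folder, len(entries) - len(missing), len(entries)))

    check_layout(GRASPNET_ROOT)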
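To illustrate how the per-frame files fit together, the sketch below back-projects one depth image into a point cloud using camK.npy, then chains camera_poses.npy and cam0_wrt_table.npy to express it in the table frame. It is a minimal example under two assumptions: the depth PNGs store uint16 depth in millimeters (converted to meters here), and camera_poses[i] maps points from frame i to frame 0, as the comments in the tree above suggest. The scene, camera, and frame indices are arbitrary.

    import os
    import numpy as np
    from PIL import Image

    # Hypothetical example paths; pick any scene/camera/frame.
    scene_dir = "/data/graspnet/scenes/scene_0000/kinect"
    frame = "0000"

    # Intrinsics: 3x3 matrix [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]].
    K = np.load(os.path.join(scene_dir, "camK.npy"))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Depth image (assumed uint16 millimeters, converted to meters).
    depth = np.array(Image.open(os.path.join(scene_dir, "depth", frame + ".png")))
    z = depth.astype(np.float32) / 1000.0

    # Back-project every pixel into the camera frame with the pinhole model.
    v, u = np.indices(z.shape)                 # v: row index, u: column index
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points_cam = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Chain the stored poses: camera_poses[i] maps frame i -> frame 0,
    # cam0_wrt_table maps frame 0 -> table.
    camera_poses = np.load(os.path.join(scene_dir, "camera_poses.npy"))       # (256, 4, 4)
    cam0_wrt_table = np.load(os.path.join(scene_dir, "cam0_wrt_table.npy"))   # (4, 4)
    cam_wrt_table = cam0_wrt_table @ camera_poses[int(frame)]
    points_table = points_cam @ cam_wrt_table[:3, :3].T + cam_wrt_table[:3, 3]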
Copyright © 2021 Machine Vision and Intelligence Group, Shanghai Jiao Tong University.