1. Place all the data in the following structure; there are 130 scenes in total.

    |-- google1000
        |-- scenes/
        |   |--- scene_0000/
        |   |--- scene_0001/
        |   |--- ... ...
        |   |--- scene_1499/
        |
        |-- models
        |   |-- 000/                 # Details of the model of object 0
        |   |-- ... ...
        |   `-- 1029/
        |
        |-- models_down
        |   |-- 000.ply              # Downsampled model point cloud of object 0
        |   |-- ... ...
        |   `-- 1029.ply
        |
        |-- camera.json              # Camera intrinsics
        |
        `-- graspnet_labels_v3       # Correspondence labels between scenes and object models in ./models_down/; initially empty, needs further labelling

2. Detailed structure of each scene (taking scene_0000 as an example)

    |-- scene_0000
        |-- blender_proc
        |   |-- rgb
        |   |   |-- 0000.jpg to 0049.jpg     # 50 RGB images
        |   `-- depth
        |   |   |-- 0000.png to 0049.png     # 50 depth images
        |   `-- label
        |   |   |-- 0000.png to 0049.png     # 50 object mask images; 0 is background, 1-88 denote the objects (1-indexed), same format as the YCB-Video dataset
        |   `-- annotations
        |   |   |-- 0000.xml to 0049.xml     # 50 object 6D pose annotations; 'pos_in_world' and 'ori_in_world' denote position and orientation w.r.t. the camera frame
        |   `-- meta
        |   |   |-- 0000.mat to 0049.mat     # 50 object 6D pose annotations, same format as the YCB-Video dataset for easy usage
        |   `-- camK.npy                     # Camera intrinsics, shape: 3x3, [[f_x,0,c_x], [0,f_y,c_y], [0,0,1]]
        |   `-- camera_poses.npy             # 50 camera poses with respect to the first frame, shape: 50x(4x4)
        |   `-- cam0_wrt_table.npy           # First frame's camera pose with respect to the table, shape: 4x4
        `-- object_id_list.txt               # IDs of the objects that appear in this scene
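
As a rough illustration of how the layout above maps to file paths, the following Python sketch builds the per-frame paths for one scene. The helper name `frame_paths` is hypothetical and not part of any released API, and the sketch assumes scene and frame indices are zero-padded to four digits (as in `scene_0000` and `0000.png`).

```python
from pathlib import Path


def frame_paths(root, scene_id, frame_id):
    """Return the expected per-frame files of one scene as a dict of Paths.

    Hypothetical helper: it only mirrors the directory layout described
    above and assumes four-digit, zero-padded scene and frame indices.
    """
    scene_dir = Path(root) / "scenes" / f"scene_{scene_id:04d}"
    cam_dir = scene_dir / "blender_proc"
    stem = f"{frame_id:04d}"
    return {
        "rgb": cam_dir / "rgb" / f"{stem}.jpg",
        "depth": cam_dir / "depth" / f"{stem}.png",
        "label": cam_dir / "label" / f"{stem}.png",
        "annotation": cam_dir / "annotations" / f"{stem}.xml",
        "meta": cam_dir / "meta" / f"{stem}.mat",
        # Files shared by all frames of the scene
        "camK": cam_dir / "camK.npy",
        "camera_poses": cam_dir / "camera_poses.npy",
        "cam0_wrt_table": cam_dir / "cam0_wrt_table.npy",
        "object_id_list": scene_dir / "object_id_list.txt",
    }


if __name__ == "__main__":
    # Print the expected paths for frame 3 of scene 0
    for name, path in frame_paths("google1000", 0, 3).items():
        print(f"{name}: {path}")
```

The `.npy` files can then be read with `numpy.load`, for example to pick the 4x4 pose of a given frame from `camera_poses.npy`.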
Copyright © 2021 Machine Vision and Intelligence Group, Shanghai Jiao Tong University.