Format

1. Place all the data in the following structure. The train and test images contain 190 scenes in total.
The folders 'scenes' and 'models' are identical to those in the GraspNet dataset,
so you can simply extract the labels and dense_point_clouds into your pre-downloaded GraspNet dataset.

suctionnet
|-- scenes
|   |-- scene_0000/
|   |-- scene_0001/
|   |-- ...
|   `-- scene_0189/
|
|
|-- models
|   |-- 000/
|   |-- 001/
|   |-- ...
|   `-- 087/
|
|
|-- dense_point_clouds
|   |-- 000.npz
|   |-- 001.npz
|   |-- ...
|   `-- 087.npz   
|
|
|-- seal_label
|   |-- 000_seal.npz
|   |-- 001_seal.npz
|   |-- ...
|   `-- 087_seal.npz
|
|
|-- wrench_label
|   |-- 0000_wrench.npz
|   |-- 0001_wrench.npz
|   |-- ...
|   `-- 0189_wrench.npz
|
`-- suction_collision_label
    |-- 0000_collision.npz
    |-- 0001_collision.npz
    |-- ...
    `-- 0189_collision.npz
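The seal, wrench, and collision labels are stored as NumPy `.npz` archives and can be opened with `np.load`. A minimal sketch of the loading pattern; the key name `arr_0`, the array shape, and the stand-in file created here are assumptions for illustration only, so inspect `.files` on your actual downloads to see the real keys:

```python
import os
import tempfile

import numpy as np

# Stand-in label file to demonstrate the loading pattern
# (the key name 'arr_0' and the array shape are assumptions,
# not the real SuctionNet keys).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "0000_wrench.npz")
np.savez(path, arr_0=np.random.rand(1024, 3).astype(np.float32))

# np.load on an .npz returns a lazy archive; `.files` lists the
# stored array names, and indexing by name reads the array.
with np.load(path) as data:
    keys = data.files
    scores = data["arr_0"]

print(keys, scores.shape)
```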


2. Detailed structure of each scene
scenes
|-- scene_0000
|   |-- object_id_list.txt              # IDs of the objects that appear in this scene, 0-indexed
|   |-- rs_wrt_kn.npy                   # realsense camera pose with respect to kinect, shape: 256x(4x4)
|   |-- kinect                          # data of kinect camera
|   |   |-- rgb                         
|   |   |   |-- 0000.png to 0255.png    # 256 rgb images
|   |   |-- depth
|   |   |   |-- 0000.png to 0255.png    # 256 depth images
|   |   |-- label
|   |   |   |-- 0000.png to 0255.png    # 256 object mask images; 0 is the background, 1-88 denote the objects (1-indexed), same format as the YCB-Video dataset
|   |   |-- annotations
|   |   |   |-- 0000.xml to 0255.xml    # 256 object 6D pose annotations; 'pos_in_world' and 'ori_in_world' denote position and orientation w.r.t. the camera frame
|   |   |-- meta
|   |   |   |-- 0000.mat to 0255.mat    # 256 object 6D pose annotations, same format as the YCB-Video dataset for easy usage
|   |   |-- rect
|   |   |   |-- 0000.npy to 0255.npy    # 256 2D planar grasp labels
|   |   |
|   |   |-- camK.npy                    # camera intrinsics, shape: 3x3, [[f_x,0,c_x], [0,f_y,c_y], [0,0,1]]
|   |   |-- camera_poses.npy            # 256 camera poses with respect to the first frame, shape: 256x(4x4)
|   |   `-- cam0_wrt_table.npy          # first frame's camera pose with respect to the table, shape: 4x4
|   |
|   `-- realsense
|       |-- same structure as kinect
|
|
|-- scene_0001
|
|-- ...
|
`-- scene_0189
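The per-frame metadata above is enough to lift a depth image into a table-frame point cloud: back-project with camK.npy, then chain camera_poses.npy and cam0_wrt_table.npy. A sketch under the shapes stated above (camK is 3x3, camera_poses is 256x4x4, cam0_wrt_table is 4x4); the depth scale of 1000 (millimeters to meters) is an assumption, and synthetic arrays stand in for the real files:

```python
import numpy as np

# Synthetic stand-ins matching the documented shapes.
camK = np.array([[600.0, 0.0, 320.0],
                 [0.0, 600.0, 240.0],
                 [0.0, 0.0, 1.0]])                  # camK.npy, 3x3
camera_poses = np.tile(np.eye(4), (256, 1, 1))      # camera_poses.npy, 256x(4x4)
cam0_wrt_table = np.eye(4)                          # cam0_wrt_table.npy, 4x4
depth = np.full((480, 640), 1000, dtype=np.uint16)  # stand-in depth image

def depth_to_points(depth, camK, depth_scale=1000.0):
    """Back-project a depth image into camera-frame 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32) / depth_scale  # assumed mm -> m
    x = (u - camK[0, 2]) * z / camK[0, 0]
    y = (v - camK[1, 2]) * z / camK[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

frame = 0
points_cam = depth_to_points(depth, camK)

# Chain the documented poses: frame i -> frame 0 -> table.
cam_wrt_table = cam0_wrt_table @ camera_poses[frame]
points_h = np.concatenate([points_cam, np.ones((len(points_cam), 1))], axis=1)
points_table = (cam_wrt_table @ points_h.T).T[:, :3]
print(points_table.shape)
```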
	


License

Copyright © 2021 Machine Vision and Intelligence Group, Shanghai Jiao Tong University.