Figure 4. Visualization of automatically generated labels. Each edge of one grasping point proposal is visualized with a
different color to show the orientation of the box. Our method allows dense labeling of the object but only four grasping
point proposals are visualized in each image to guarantee the clarity of the visualization. Note that only one object per
image is labeled, which implicitly adds expert knowledge about the optimal order of object removal.
Figure 5. Architecture of the grasping point prediction network. The network takes RGB images as input, and predicts
multiple grasping candidates. The grasping candidates are defined as an oriented rectangular bounding box. The output
bounding boxes are drawn with different colors, where the red edges denote the parallel plates of the gripper and the
black lines indicate the opening width of the gripper. Figure was taken from [4].
Figure 6. Data Augmentation. (Left) RGB input image,
(others) randomly shifted and rotated versions of the input image.
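As a rough illustration of this augmentation step, the following sketch applies a random shift and rotation to an RGB image. The shift range, rotation range, and the use of OpenCV are our own assumptions and are not taken from the paper.

```python
import cv2
import numpy as np

def random_shift_rotate(image, max_shift=20, max_angle=15, rng=None):
    """Apply a random shift (pixels) and rotation (degrees) to an RGB image.

    The parameter ranges are illustrative assumptions, not values from the paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    dx, dy = rng.uniform(-max_shift, max_shift, size=2)
    angle = rng.uniform(-max_angle, max_angle)
    # Rotation about the image center, combined with a translation.
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    m[0, 2] += dx
    m[1, 2] += dy
    return cv2.warpAffine(image, m, (w, h), borderMode=cv2.BORDER_REFLECT)
```

When such a transform is applied during training, the grasping point labels have to be transformed with the same parameters so that the oriented boxes stay aligned with the object.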
network for 50000 iterations with an initial learning rate α = 0.0001. The anchor sizes for the bounding box proposals are chosen according to the size of the objects in our dataset as [8, 16, 24, 28] px, with anchor ratios of [0.5, 1, 2]. All other hyperparameters were taken from [4]. Note that the goal of these experiments was to show the practical benefit of our method for automatic label generation, rather than to compete for the best possible performance for grasping point prediction. We believe that a more careful selection of hyperparameters, combined with an optimized training schedule, could further boost the results.
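For reference, the hyperparameters stated above can be collected as follows; the dictionary keys are our own naming and are not tied to any particular detection framework.

```python
# Hyperparameters reported in the text; key names are illustrative only.
TRAIN_CONFIG = {
    "iterations": 50000,
    "initial_learning_rate": 1e-4,        # alpha
    "anchor_sizes_px": [8, 16, 24, 28],   # matched to the object sizes in the dataset
    "anchor_ratios": [0.5, 1, 2],
    # All remaining hyperparameters are taken from [4].
}
```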
5. Experiments and Evaluation
We trained the previously described prediction network two times separately, once with automatically annotated data and once with the same data labeled by hand. Both networks were evaluated using a test set containing 22 images which are independent from the training data (different camera position, random placement of objects) to verify the generalization capabilities of our network. We used the same training schedule for both methods, as well as the same parameters for non-maximum suppression for both experiments, to ensure a fair comparison. The evaluation of our predicted grasping candidates is divided into two parts:

1. Quantitative evaluation of the predicted grasping points by calculating the ratio of graspable / non-graspable candidates.

2. Qualitative evaluation by visualizing the predicted grasping candidates.
5.1. Quantitative Evaluation
For quantitative evaluation we calculate the relative number of predicted grasp candidates that are non-graspable, for both the network trained with manually labeled data and the one trained with automatically labeled data. We consider a prediction non-graspable if 1) the size of the predicted bounding box is unsuitable (either too big or too small) or 2) grasping is not feasible due to partial occlusion of the object. Figure 8 shows examples of non-graspable candidates. Table 1 shows the quantitative results, indicating that a deep network trained with automatically labeled data can achieve similar performance compared to the same network trained with manually labeled data.
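A minimal sketch of this measure is given below: given per-prediction graspability judgments, it computes the fraction of non-graspable candidates. The data structure and the boolean `graspable` flag are assumptions for illustration, not part of the original evaluation code.

```python
def non_graspable_ratio(predictions):
    """Fraction of predicted grasp candidates judged non-graspable.

    `predictions` is assumed to be a list of dicts carrying a boolean
    'graspable' flag assigned during inspection; a candidate is marked
    False if the box size is unsuitable or the object is partially occluded.
    """
    if not predictions:
        return 0.0
    non_graspable = sum(1 for p in predictions if not p["graspable"])
    return non_graspable / len(predictions)
```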