Figure 4. Visualization of automatically generated labels. Each edge of one grasping point proposal is visualized with a
different color to show the orientation of the box. Our method allows dense labeling of the object but only four grasping
point proposals are visualized in each image to guarantee the clarity of the visualization. Note that only one object per
image is labeled, which implicitly adds expert knowledge about the optimal order of object removal.
Figure 5. Architecture of the grasping point prediction network. The network takes RGB images as input, and predicts
multiple grasping candidates. The grasping candidates are defined as an oriented rectangular bounding box. The output
bounding boxes are drawn with different colors, where the red edges denote the parallel plates of the gripper and the
black lines indicate the opening width of the gripper. Figure was taken from [4].
Figure 6. Data Augmentation. (Left) RGB input image,
(others) randomly shifted and rotated versions of the input image.
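As a rough illustration of this augmentation step, the following sketch applies a random shift and rotation to an RGB image. The shift range, rotation range, and the use of OpenCV are our own assumptions and are not taken from the paper.

```python
import cv2
import numpy as np

def random_shift_rotate(image, max_shift=20, max_angle=15, rng=None):
    """Apply a random shift (pixels) and rotation (degrees) to an RGB image.

    The parameter ranges are illustrative assumptions, not values from the paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    dx, dy = rng.uniform(-max_shift, max_shift, size=2)
    angle = rng.uniform(-max_angle, max_angle)
    # Rotation about the image center, combined with a translation.
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    m[0, 2] += dx
    m[1, 2] += dy
    return cv2.warpAffine(image, m, (w, h), borderMode=cv2.BORDER_REFLECT)
```

When such a transform is applied during training, the grasping point labels have to be transformed with the same parameters so that the oriented boxes stay aligned with the object.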
network for 50000 iterations with an initial learning rate α = 0.0001. The anchor sizes for the bounding box proposals are chosen according to the size of the objects in our dataset as [8, 16, 24, 28] px, with anchor ratios of [0.5, 1, 2]. All other hyperparameters were taken from [4]. Note that the goal of these experiments was to show the practical benefit of our method for automatic label generation, rather than to compete for the best possible performance for grasping point prediction. We believe that a more careful selection of hyperparameters, combined with an optimized training schedule, could further boost the results.
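For reference, the hyperparameters stated above can be collected as follows; the dictionary keys are our own naming and are not tied to any particular detection framework.

```python
# Hyperparameters reported in the text; key names are illustrative only.
TRAIN_CONFIG = {
    "iterations": 50000,
    "initial_learning_rate": 1e-4,        # alpha
    "anchor_sizes_px": [8, 16, 24, 28],   # matched to the object sizes in the dataset
    "anchor_ratios": [0.5, 1, 2],
    # All remaining hyperparameters are taken from [4].
}
```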
5. Experiments and Evaluation
We trained the previously described prediction network two times separately, once with automatically annotated data and once with the same data labeled by hand. Both networks were evaluated using a test set containing 22 images which are independent from the training data (different camera position, random placement of objects) to verify the generalization capabilities of our network. We used the same training schedule for both methods, as well as the same parameters for non-maximum suppression for both experiments, to ensure a fair comparison. The evaluation of our predicted grasping candidates is divided into two parts:

1. Quantitative evaluation of the predicted grasping points by calculating the ratio of graspable / non-graspable candidates.

2. Qualitative evaluation by visualizing the predicted grasping candidates.
5.1. Quantitative Evaluation
For quantitative evaluation we calculate the relative number of predicted grasp candidates that are non-graspable, for both the network trained with manually labeled data and the one trained with automatically labeled data. We consider a prediction non-graspable if 1) the size of the predicted bounding box is unsuitable (either too big or too small) or 2) grasping is not feasible due to partial occlusion of the object. Figure 8 shows examples of non-graspable candidates. Table 1 shows the quantitative results, indicating that a deep network trained with automatically labeled data can achieve similar performance compared to the same network trained with manually labeled data.
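A minimal sketch of this measure is given below: given per-prediction graspability judgments, it computes the fraction of non-graspable candidates. The data structure and the boolean `graspable` flag are assumptions for illustration, not part of the original evaluation code.

```python
def non_graspable_ratio(predictions):
    """Fraction of predicted grasp candidates judged non-graspable.

    `predictions` is assumed to be a list of dicts carrying a boolean
    'graspable' flag assigned during inspection; a candidate is marked
    False if the box size is unsuitable or the object is partially occluded.
    """
    if not predictions:
        return 0.0
    non_graspable = sum(1 for p in predictions if not p["graspable"])
    return non_graspable / len(predictions)
```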