Figure 1. Overall workflow of our method, comprising data acquisition, automatic grasping point annotation using depth images, and training a deep network for grasping point prediction. (Left) Our dataset is constructed by recording sequences of RGBD images while a human expert removes wooden logs from the scene. (Middle) The sequence of captured depth images is used to automatically annotate grasping points in every corresponding RGB image. (Right) These automatically annotated data are then used to train a deep neural network to predict grasping points.
to predict multiple grasping points for multiple
objects in an image. Zeng et al. [18] showed that
they are able to grasp unseen objects with their
winning contribution for the Amazon Robotics
Challenge in 2017. Other approaches [12, 10] use
Reinforcement Learning (RL) on a real or simulated
robot to perform thousands of grasp attempts and
use the feedback to improve the grasping point
predictions. RL has the advantage that no labeled
data are necessary for training, but, on the other hand, it is very time- and hardware-consuming.
Representations of grasping points in 2D. Sax-
ena et al. [16] described a grasping point as g =
{x, y}, where x and y define the center of the grasp-
ing point proposal. This representation lacks infor-
mation about the opening width of the gripper. Red-
mon and Angelova [13] overcame this limitation by
using a rectangular representation for the grasping
point. This is very similar to the bounding box rep-
resentation of objects in the field of object detec-
tion, with the addition of a rotation angle θ, which
describes the orientation of the bounding box. An
overview of other common representations can be found in [3].
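To make the two representations concrete, the following minimal Python sketch encodes them as simple data structures; the field names and pixel units are our own illustrative choices, not definitions taken from [16] or [13].

from dataclasses import dataclass

@dataclass
class PointGrasp:
    """Grasping point as a single image location, in the spirit of Saxena et al. [16]."""
    x: float  # horizontal center of the grasp proposal (pixels)
    y: float  # vertical center of the grasp proposal (pixels)

@dataclass
class RectangleGrasp:
    """Oriented-rectangle grasp, in the spirit of Redmon and Angelova [13]."""
    x: float       # rectangle center, horizontal (pixels)
    y: float       # rectangle center, vertical (pixels)
    width: float   # gripper opening width (pixels)
    height: float  # extent of the gripper jaws (pixels)
    theta: float   # rotation angle of the rectangle in the image plane (radians)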
Automatic label generation. Datasets used for
deep learning are often hand-annotated, which is time-consuming and can be error-prone due to the involve-
ment of human annotators. In the domain of ob-
ject segmentation, modern tools like DeepExtreme-
Cut [11] or GrabCut [15] significantly reduce the
amount of work for labeling RGB data to a small
number of clicks. However, they are not fully auto-
matic and are not able to work with depth data. Zeng
et al. [19] showed that they are able to use back-
ground subtraction to generate segmentation masks
of new objects in the scene. Suchi et al. [17], most
similar to our approach, use sequences of depth images to predict segmentation masks of the objects in
the scene. However, in contrast to all previously mentioned approaches, our method does not only compute the segmentation mask but directly infers grasping proposals. Furthermore, segmentation masks do not provide any information about the order in which the objects should be removed, which can be crucial for grasp success in cluttered environments.
3. Data Acquisition and Automatic Annotation
This section describes our simple strategy to au-
tomatically label grasping points for scenes with ob-
jects in a cluttered environment.
3.1. Data Acquisition Protocol
The process requires a statically mounted RGBD
camera which records color and depth information
from the scene. We then ask human experts to re-
move one object after the other from the scene. Af-
ter each successful grasp, we capture depth and color
images. Figure 2 shows a sequence of recorded RGB
images. This method provides us not only with con-
secutive RGBD images of the picking procedure, but
also gives implicit information about the optimal or-
der of object removal according to a human expert.
This information is highly important because not all
objects are equally easy to grasp due to their random
placement (e.g., objects on top of one another).
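As a rough illustration of this protocol, the following Python sketch records one RGBD frame of the initial scene and one after every successful grasp. The function grab_rgbd() is a hypothetical placeholder for the SDK of the statically mounted RGBD camera, and triggering each capture from the keyboard is our own assumption; the paper does not specify these details.

import os
import cv2  # used here only to write the captured images to disk

def grab_rgbd():
    """Hypothetical placeholder for the RGBD camera SDK.
    Should return (color_bgr, depth) as numpy arrays, e.g. depth as 16-bit millimeters."""
    raise NotImplementedError("replace with the capture call of your RGBD sensor")

def record_picking_sequence(out_dir="sequence"):
    os.makedirs(out_dir, exist_ok=True)
    idx = 0
    # Capture the initial cluttered scene first, then one frame after each grasp.
    while input(f"Frame {idx}: press Enter to capture (q to quit) ").strip() != "q":
        color, depth = grab_rgbd()
        cv2.imwrite(os.path.join(out_dir, f"color_{idx:03d}.png"), color)
        cv2.imwrite(os.path.join(out_dir, f"depth_{idx:03d}.png"), depth)
        idx += 1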
3.2. Automatic Label Generation
As illustrated in Figure 3, we perform auto-
matic grasping point annotation through a 3-stage
pipeline. Our algorithm takes two consecutive depth
images from the scene as input and calculates grasp
proposals for the object which was removed. A grasp
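The first part of this idea, locating the removed object by differencing two consecutive depth images, can be sketched in Python as follows. The threshold value, the choice of the largest connected component, and the use of an oriented minimum-area rectangle as the grasp proposal are illustrative assumptions on our part, not necessarily the exact steps of the full 3-stage pipeline.

import numpy as np
import cv2

def grasp_proposal_from_depth_pair(depth_before, depth_after, min_diff_mm=5.0):
    """Sketch: locate the region that disappeared between two consecutive depth
    images and return a grasp proposal (x, y, theta) at its center, or None."""
    # Where the object was removed, the measured depth increases (the camera
    # now sees the surface behind or below it), assuming a roughly top-down view.
    diff = depth_after.astype(np.float32) - depth_before.astype(np.float32)
    mask = (diff > min_diff_mm).astype(np.uint8)

    # Keep the largest connected component as the removed object's footprint.
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if num < 2:
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    footprint = (labels == largest).astype(np.uint8)

    # Fit an oriented rectangle to the footprint; its center and angle give a
    # simple grasp proposal in the rectangle representation discussed above.
    contours, _ = cv2.findContours(footprint, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    (cx, cy), _, angle_deg = cv2.minAreaRect(contours[0])
    return cx, cy, np.deg2rad(angle_deg)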