Figure 1. Overall workflow of our method, comprising data acquisition, automatic grasping point annotation using depth images, and training a deep network for grasping point prediction. (Left) Our dataset is constructed by recording sequences of RGBD images while a human expert removes wooden logs from the scene. (Middle) The sequence of captured depth images is used to automatically annotate grasping points in every corresponding RGB image. (Right) These automatically annotated data are then used to train a deep neural network to predict grasping points.
to predict multiple grasping points for multiple
objects in an image. Zeng et al. [18] showed that
they are able to grasp unseen objects with their
winning contribution for the Amazon Robotics
Challenge in 2017. Other approaches [12, 10] use
Reinforcement Learning (RL) on a real or simulated
robot to perform thousands of grasp attempts and
use the feedback to improve the grasping point
predictions. RL has the advantage that no labeled
data are necessary for training, but, on the other hand, it is very time- and hardware-consuming.
Representations of grasping points in 2D. Sax-
ena et al. [16] described a grasping point as g =
{x, y}, where x and y define the center of the grasp-
ing point proposal. This representation lacks infor-
mation about the opening width of the gripper. Red-
mon and Angelova [13] overcame this limitation by
using a rectangular representation for the grasping
point. This is very similar to the bounding box rep-
resentation of objects in the field of object detec-
tion, with the addition of a rotation angle θ, which
describes the orientation of the bounding box. An
overview of other common representations can be found in [3].
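To make the two representations concrete, the following minimal Python sketch encodes them as simple data structures; the field names and pixel units are our own illustrative choices, not definitions taken from [16] or [13].

from dataclasses import dataclass

@dataclass
class PointGrasp:
    """Grasping point as a single image location, in the spirit of Saxena et al. [16]."""
    x: float  # horizontal center of the grasp proposal (pixels)
    y: float  # vertical center of the grasp proposal (pixels)

@dataclass
class RectangleGrasp:
    """Oriented-rectangle grasp, in the spirit of Redmon and Angelova [13]."""
    x: float       # rectangle center, horizontal (pixels)
    y: float       # rectangle center, vertical (pixels)
    width: float   # gripper opening width (pixels)
    height: float  # extent of the gripper jaws (pixels)
    theta: float   # rotation angle of the rectangle in the image plane (radians)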
Automatic label generation. Datasets used for
deep learning are often hand-annotated, which is time-consuming and can be error-prone due to the involve-
ment of human annotators. In the domain of ob-
ject segmentation, modern tools like DeepExtreme-
Cut [11] or GrabCut [15] significantly reduce the
amount of work for labeling RGB data to a small
number of clicks. However, they are not fully auto-
matic and are not able to work with depth data. Zeng
et al. [19] showed that they are able to use back-
ground subtraction to generate segmentation masks
of new objects in the scene. Suchi et al. [17], most
similar to our approach, use sequences of depth images to predict segmentation masks of the objects in
the scene. However, in contrast to all previously mentioned approaches, our method does not only compute the segmentation mask but directly infers grasping proposals. Furthermore, segmentation masks do not provide any information about the order in which the objects should be removed, which can be crucial for grasp success in cluttered environments.
3. Data Acquisition and Automatic Annotation
This section describes our simple strategy to au-
tomatically label grasping points for scenes with ob-
jects in a cluttered environment.
3.1. Data Acquisition Protocol
The process requires a statically mounted RGBD
camera which records color and depth information
from the scene. We then ask human experts to re-
move one object after the other from the scene. Af-
ter each successful grasp, we capture depth and color
images. Figure 2 shows a sequence of recorded RGB
images. This method provides us not only with con-
secutive RGBD images of the picking procedure, but
also gives implicit information about the optimal or-
der of object removal according to a human expert.
This information is highly important because not all
objects are equally easy to grasp due to their random
placement (e.g., objects on top of one another).
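As a rough illustration of this protocol, the following Python sketch records one RGBD frame of the initial scene and one after every successful grasp. The function grab_rgbd() is a hypothetical placeholder for the SDK of the statically mounted RGBD camera, and triggering each capture from the keyboard is our own assumption; the paper does not specify these details.

import os
import cv2  # used here only to write the captured images to disk

def grab_rgbd():
    """Hypothetical placeholder for the RGBD camera SDK.
    Should return (color_bgr, depth) as numpy arrays, e.g. depth as 16-bit millimeters."""
    raise NotImplementedError("replace with the capture call of your RGBD sensor")

def record_picking_sequence(out_dir="sequence"):
    os.makedirs(out_dir, exist_ok=True)
    idx = 0
    # Capture the initial cluttered scene first, then one frame after each grasp.
    while input(f"Frame {idx}: press Enter to capture (q to quit) ").strip() != "q":
        color, depth = grab_rgbd()
        cv2.imwrite(os.path.join(out_dir, f"color_{idx:03d}.png"), color)
        cv2.imwrite(os.path.join(out_dir, f"depth_{idx:03d}.png"), depth)
        idx += 1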
3.2. Automatic Label Generation
As illustrated in Figure 3, we perform auto-
matic grasping point annotation through a 3-stage
pipeline. Our algorithm takes two consecutive depth
images from the scene as input and calculates grasp
proposals for the object which was removed. A grasp
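The first part of this idea, locating the removed object by differencing two consecutive depth images, can be sketched in Python as follows. The threshold value, the choice of the largest connected component, and the use of an oriented minimum-area rectangle as the grasp proposal are illustrative assumptions on our part, not necessarily the exact steps of the full 3-stage pipeline.

import numpy as np
import cv2

def grasp_proposal_from_depth_pair(depth_before, depth_after, min_diff_mm=5.0):
    """Sketch: locate the region that disappeared between two consecutive depth
    images and return a grasp proposal (x, y, theta) at its center, or None."""
    # Where the object was removed, the measured depth increases (the camera
    # now sees the surface behind or below it), assuming a roughly top-down view.
    diff = depth_after.astype(np.float32) - depth_before.astype(np.float32)
    mask = (diff > min_diff_mm).astype(np.uint8)

    # Keep the largest connected component as the removed object's footprint.
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if num < 2:
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    footprint = (labels == largest).astype(np.uint8)

    # Fit an oriented rectangle to the footprint; its center and angle give a
    # simple grasp proposal in the rectangle representation discussed above.
    contours, _ = cv2.findContours(footprint, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    (cx, cy), _, angle_deg = cv2.minAreaRect(contours[0])
    return cx, cy, np.deg2rad(angle_deg)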