Page 127 of Joint Austrian Computer Vision and Robotics Workshop 2020

Figure 3. Our automatic annotation pipeline. a) Two consecutive depth images with one object removed (marked in red). Calculating the difference of the depth images gives a rough segmentation mask of the removed object. b) Refinement of the mask using morphological operations and Gaussian filtering. c) Geometric features (object edges, skeleton, center of mass) are calculated using the refined segmentation mask and are used afterwards to calculate the final position of the grasping point proposals. The last step transfers the proposed bounding boxes to the corresponding RGB image.

the scene, without any additional costs. The only time humans are involved is when checking all the predicted labels via manual inspection to find images which contain erroneous labels. In this process we drop roughly 10% of the images to avoid inaccurately labeled training data. Figure 4 shows results of our automatically labeled dataset.

3.3. Human-based Data Annotation

In addition to our automatic labeling approach, we also labeled the whole dataset manually. The idea is to train a grasp prediction network on both types of labels independently, and then compare the performance of both approaches. All hand-labeled data were checked by human experts with domain knowledge to verify the correctness of the annotations.

4. Grasping Point Prediction in a Cluttered Environment

Chu et al. [4] proposed a deep neural network to predict multiple grasping points for multiple objects in the scene. We adapted their approach and retrained the network with our specific dataset.

4.1. Network Architecture and Loss Function

The network architecture is based on the Faster R-CNN object detection framework [14] using a ResNet-50 [6] as backbone. It takes a three-channel RGB image as input and predicts a number of grasping point candidates, where one candidate g is defined as described in Equation 1.
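The depth-difference and refinement steps from Figure 3 (a and b) can be sketched as follows. This is a minimal sketch using `scipy.ndimage`; the depth threshold, structuring-element sizes, and Gaussian sigma are illustrative assumptions, not the authors' parameters:

```python
import numpy as np
from scipy import ndimage

def rough_mask(depth_a, depth_b, thresh=0.01):
    """Step a): the difference of two consecutive depth images gives a
    rough segmentation mask of the removed object.
    `thresh` (in metres) is an assumed value."""
    diff = np.abs(depth_a - depth_b)
    return diff > thresh

def refine_mask(mask):
    """Step b): clean the rough mask with morphological operations,
    then smooth it with a Gaussian filter and re-threshold."""
    mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
    mask = ndimage.binary_closing(mask, structure=np.ones((5, 5)))
    smooth = ndimage.gaussian_filter(mask.astype(float), sigma=1.0)
    return smooth > 0.5

def mask_center_of_mass(mask):
    """Part of step c): the center of mass is one of the geometric
    features used to place the grasping point proposals."""
    return ndimage.center_of_mass(mask)

# Toy example: a flat scene where a 15 cm tall box disappears between frames.
before = np.zeros((32, 32)); before[10:20, 12:22] = 0.15
after = np.zeros((32, 32))
mask = refine_mask(rough_mask(before, after))
cy, cx = mask_center_of_mass(mask)
```

The edge and skeleton features of step c) could be derived from the same refined mask (e.g. via a contour or skeletonization routine), before the proposals are transferred to the RGB image.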
Note that the rotation angle θ is quantized into R = 19 intervals, which makes the prediction of this parameter a classification problem. All other parameters (see Equation 1) are predicted using regression. During training, the composite loss function L_total is defined as

L_total = L_gpn + L_gcr, (4)

where L_gpn describes the loss according to the grasp proposal net and L_gcr is the grasp configuration prediction loss. The loss term L_gpn is used to define initial rectangular bounding box proposals without orientation ({x, y, w, h}), whereas L_gcr is used to define the orientation and the refined bounding box prediction {x, y, θ, w, h}. Figure 5 shows the structure of the prediction network and indicates how the loss parts L_gpn and L_gcr are calculated. Further information about the network architecture and the loss function can be found in [4].

4.2. Data Preprocessing and Augmentation

Our dataset for training the prediction network consists of only 52 images. Therefore, data augmentation is used to increase the size of the training data by a factor of 100. Figure 6 shows examples of the augmented data. This increases the variation in the training data and decreases the possibility of overfitting during training. After augmentation, each image was resized to 227 × 227 px to fit the input dimension of the network.

4.3. Training Schedule

Pre-trained ImageNet [5] weights are used as initialization for the ResNet-50 backbone to avoid overfitting and ease the training process. All other layers beyond ResNet-50 are trained from scratch. The whole structure of the network can be seen in Figure 5. We used the Adam optimizer and trained our
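The quantization of θ into R = 19 intervals can be sketched as follows. This is a minimal sketch; the angle range [0°, 180°) is an assumption, since the text only gives R:

```python
R = 19               # number of quantization intervals (from the text)
THETA_RANGE = 180.0  # assumed span of theta in degrees

def theta_to_class(theta_deg):
    """Quantize a rotation angle into one of R classes, turning the
    orientation prediction into a classification problem."""
    theta = theta_deg % THETA_RANGE          # wrap into [0, THETA_RANGE)
    return int(theta // (THETA_RANGE / R))   # index of the containing bin

def class_to_theta(c):
    """Map a predicted class index back to its bin-center angle."""
    bin_width = THETA_RANGE / R
    return c * bin_width + bin_width / 2.0
```

Framed this way, the network predicts a class score per bin for θ while regressing the remaining parameters {x, y, w, h} directly.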
Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Publisher: Graz University of Technology
Place: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Dimensions: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science, Technology