Joint Austrian Computer Vision and Robotics Workshop 2020
Page 132
in the case of deformable objects. Objects like folding headphones, scissors, chains and cables can vary in appearance depending on their current usage. This poses a problem for CNN-based object detectors. We propose a simple RGB-based method for the recognition of both rigid and deformable objects and synthesize images for training a neural network. We then test this method by training the YOLOv3 [13] network with the fully synthetic dataset and explore how the shape of an object, i.e. its symmetry and deformability, affects detection performance. The contributions of this work include:

• An automated pipeline for synthetic data generation, used for the detection and recognition of both rigid and deformable objects.
• A novel RGB-based method for quick and effortless acquisition of object masks.
• We explore the effect of an object's deformability on its detection performance.

2. Related work

Computer vision tasks depend on large amounts of annotated training data. For the task of detecting object classes such as cars or airplanes there are numerous hand-annotated datasets available: COCO [9], PASCAL VOC [3] and the Open Images Dataset [7]. These datasets are built by researchers or companies and consist of a large number of images. Each image has annotations of the objects of interest; these may be bounding boxes only or may contain the masks of the objects as well. The COCO (Common Objects in Context) dataset consists of over 330 thousand images containing objects that are split into 80 classes. However, sometimes, especially in robotics-related tasks, we are interested in detecting a specific object: for example, not any mug but the user's favourite coffee mug. The mentioned datasets are of little use in these cases, so a specialized dataset is necessary. Datasets are normally difficult to obtain, so there is a lot of research concerning synthesizing datasets.

Jungwoo Huh et al. [6] proposed a method for synthesizing training data that, similarly to ours, relies on obtaining masks of an object.
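The cut-and-paste synthesis described here can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the function name, array shapes and YOLO-style label format are our own assumptions, and real pipelines add augmentation and blending on top of plain pasting.

```python
import numpy as np

def paste_object(background, obj_rgb, obj_mask, top, left):
    """Paste a masked object crop into a background image (plain pasting,
    no Poisson blending) and return the composite plus a YOLO-style
    bounding-box label. obj_mask is a float array in [0, 1] (1 = object)."""
    h, w = obj_mask.shape
    out = background.astype(np.float32).copy()
    region = out[top:top + h, left:left + w]
    alpha = obj_mask[..., None]  # broadcast the mask over the RGB channels
    out[top:top + h, left:left + w] = alpha * obj_rgb + (1.0 - alpha) * region
    # YOLO label format: normalized centre x, centre y, width, height
    H, W = background.shape[:2]
    bbox = ((left + w / 2) / W, (top + h / 2) / H, w / W, h / H)
    return out.astype(np.uint8), bbox

# toy example: an 8x8 white "object" pasted into a 64x64 black background
bg = np.zeros((64, 64, 3), dtype=np.uint8)
obj = np.full((8, 8, 3), 255, dtype=np.uint8)
mask = np.ones((8, 8), dtype=np.float32)
img, bbox = paste_object(bg, obj, mask, top=16, left=24)
```

The returned bounding box doubles as the training annotation, which is what makes this kind of synthesis "free": the label is known exactly by construction.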
In order to produce the synthetic images they use pure pasting, whereas we use a combination of pasting and Poisson image editing. Additionally, they evaluate their method on rigid objects only, for example a baseball bat, a bottle, a toy rifle, etc. The only deformable object they use is an umbrella, but since they keep it closed during training and testing, we can consider it a rigid object in this case. Additionally, they use YOLOv2, which has a lower mAP (mean average precision) than YOLOv3, while YOLOv3 still preserves the ability to process images in real time. For obtaining the object masks they use a semi-automatic segmentation method, while ours is fully automated and does not involve any manual post-processing.

Debidatta Dwibedi et al. [2] assume that object images covering diverse viewpoints are available. They apply a CNN to obtain a mask of the object and then randomly place the object into a scene image using Poisson cloning. Next, they train the Faster R-CNN [14] network on the synthetic images and evaluate the method on the GMU-Kitchens dataset [5]. For the evaluation they also use exclusively rigid objects such as bottles, detergents, cups, cornflakes packages, etc. Although simple, the method achieves an mAP of 88%, which is similar to what we report on the detection of rigid objects.

Georgakis et al. [4] propose a method for synthesizing training data that takes the geometry and semantic information of the scene into consideration. They use publicly available RGB-D datasets, GMU-Kitchens [5] and Washington RGB-D Scenes v2 [8], as backgrounds for the object images. Using RANSAC they detect planes in the image and artificially place objects on top of them, while also scaling their size according to the distance from the camera.
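The distance-dependent scaling used in such geometry-aware placement follows directly from the pinhole camera model: on-image size is inversely proportional to depth. A minimal sketch (the function name and the example focal length and object size are our own assumptions, not values from [4]):

```python
def projected_size_px(real_size_m, depth_m, focal_px):
    """Pinhole-camera projection: an object real_size_m metres tall,
    placed depth_m metres from the camera, spans approximately
    focal_px * real_size_m / depth_m pixels in the image."""
    return focal_px * real_size_m / depth_m

# a 0.30 m tall cereal box seen through a camera with a 600 px focal length
near = projected_size_px(0.30, 1.0, 600.0)  # at 1 m -> 180 px
far = projected_size_px(0.30, 3.0, 600.0)   # at 3 m -> 60 px
```

Pasting an object onto a detected plane at depth d and resizing its crop to this projected size is what makes the composites look geometrically plausible rather than randomly scaled.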
This method produces natural-looking images because, instead of being placed randomly in an image, objects such as a cup or a bottle are placed on a flat desk surface or on the ground. They test their method using SSD and Faster R-CNN [14] and report an mAP between 70% and 85%, depending on how much real data they use. Considering that the scenes they use for evaluation are cluttered, this is a good result. The objects used for evaluation are a bowl, a cup, a cereal box, a coffee mug and a soda can. These are all non-deformable objects.

3. Synthetic Data Generation

Object detection is required in applications such as self-driving cars, unmanned aerial vehicles, robotics, etc. Besides detecting rigid objects like cars, chairs or cups, it is often necessary to detect deformable ob-
Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Publisher: Graz University of Technology
Place: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Dimensions: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science, Technology