Page - 131 - in Joint Austrian Computer Vision and Robotics Workshop 2020


The Difficulties of Detecting Deformable Objects Using Deep Neural Networks

Nikola Djukic, Markus Vincze
Automation and Control Institute, TU Wien, Vienna, Austria
{dukic,vincze}@acin.tuwien.ac.at

Walter G. Kropatsch
Pattern Recognition and Image Processing Group, TU Wien, Vienna, Austria
krw@prip.tuwien.ac.at

Abstract. Object detectors based on deep neural networks have revolutionized the way we look for objects in an image, outperforming traditional image processing techniques. These detectors are often trained on huge datasets of labelled images and are used to detect objects of different classes. We explore how they perform at detecting custom objects and show how the shape and deformability of an object affect detection performance. We propose an automated method for synthesizing the training images and target the real-time scenario, using YOLOv3 as the baseline for object detection. We show that rigid objects have a high chance of being detected, with an AP (average precision) of 87.38%. Slightly deformable objects like scissors and headphones show a drop in detection performance, with precision averaging 49.54%. Highly deformable objects like a chain or earphones show an even further drop in AP, to 26.58%.

1. Introduction

Object detection in RGB images has received a lot of attention in recent years due to advances in deep neural network (DNN) research. Classical techniques usually rely on searching an image for features that were hand-crafted by a human. Deep neural networks, on the other hand, use huge datasets of hand-labelled images to learn these features. These labels are either a bounding box of an object or its mask. This approach has proven highly effective. In general, there are two types of DNN-based object detectors. The first group performs detection in a single pass through a network. These methods are generally fast and can even run in real time on standard hardware.

Figure 1. Objects used for evaluation.
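The abstract reports detection quality as AP. As a rough illustration of how such a number is obtained, here is a minimal sketch that computes single-class AP from scored detections and ground-truth bounding boxes. This excerpt does not state the exact evaluation protocol, so the sketch assumes an IoU threshold of 0.5, greedy matching of detections (highest confidence first) to still-unmatched ground-truth boxes, and all-point interpolated precision in the style of PASCAL VOC. COCO's evaluation instead averages precision at 101 fixed recall points, so values can differ slightly. The helpers `iou` and `average_precision` and their tuple layouts are hypothetical, not the authors' code.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers; assumed protocol: IoU >= 0.5, all-point interpolation.
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) box corners


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: Sequence[Tuple[int, float, Box]],  # (image_id, confidence, box)
    ground_truth: Sequence[Tuple[int, Box]],       # (image_id, box)
    iou_threshold: float = 0.5,
) -> float:
    """Single-class AP with all-point interpolated precision."""
    n_gt = len(ground_truth)
    if n_gt == 0:
        return 0.0
    matched = [False] * n_gt
    hits: List[int] = []
    # Greedily match detections, most confident first, to the best
    # still-unmatched ground-truth box in the same image.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, (gt_img, gt_box) in enumerate(ground_truth):
            if gt_img != img or matched[j]:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold:
            matched[best_j] = True
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    # Cumulative precision and recall after each ranked detection.
    precisions, recalls, cum_tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / k)
        recalls.append(cum_tp / n_gt)
    # Sum recall increments weighted by the best precision at that recall or higher.
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        if r > prev_recall:
            ap += (r - prev_recall) * max(precisions[i:])
            prev_recall = r
    return ap


if __name__ == "__main__":
    gt = [(0, (0, 0, 10, 10)), (0, (20, 20, 30, 30))]
    dets = [(0, 0.9, (0, 0, 10, 10)),    # correct
            (0, 0.8, (50, 50, 60, 60)),  # false positive
            (0, 0.7, (20, 20, 30, 30))]  # correct
    print(round(average_precision(dets, gt), 4))  # → 0.8333
```

Because the false positive at confidence 0.8 lowers precision before the second true positive is found, the example scores 5/6 rather than 1.0; a detector that ranked both correct boxes first would reach an AP of 1.0.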
The second group has separate region proposal and detection stages, which usually makes these methods slower but more precise than those of the first group. Recently, a combination of CBNet and Cascade R-CNN achieved a new state-of-the-art result on the COCO dataset [9], with an impressive AP50 of 71.9% [10].

Detecting custom objects is a common problem in robotics. DNNs, or more precisely convolutional neural networks (CNNs), require large amounts of training data. Having that data hand-labelled by a human is extremely time-consuming, so there is a lot of ongoing research on synthesizing training data. This is typically done by first making a 3D reconstruction of the objects and then placing them in a virtual environment, which allows the simulation of artificial deformations and the creation of arbitrary synthetic views whose labels are taken from the 3D template. However, obtaining a full 3D reconstruction is not possible with all objects, especially



Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science; Engineering