Page - 131 - in Joint Austrian Computer Vision and Robotics Workshop 2020
The Difficulties of Detecting Deformable Objects Using Deep Neural Networks
Nikola Djukic, Markus Vincze
Automation and Control Institute, TU Wien, Vienna, Austria
{dukic,vincze}@acin.tuwien.ac.at
Walter G. Kropatsch
Pattern Recognition and Image Processing Group, TU Wien, Vienna, Austria
krw@prip.tuwien.ac.at
Abstract. Object detectors based on deep neural networks have revolutionized
the way we look for objects in an image, outperforming traditional image
processing techniques. These detectors are often trained on huge datasets of
labelled images and are used to detect objects of different classes. We explore
how they perform at detecting custom objects and show how shape and
deformability of an object affect the detection performance. We propose an
automated method for synthesizing the training images and target the real-time
scenario using YOLOv3 as the baseline for object detection. We show that rigid
objects have a high chance of being detected with an AP (average precision) of
87.38%. Slightly deformable objects like scissors and headphones show a drop in
detection performance with precision averaging at 49.54%. Highly deformable
objects like a chain or earphones show an even further drop in AP to 26.58%.
1. Introduction
Object detection in RGB images has received a lot of attention in recent years
due to advances in deep neural network (DNN) research. Classical techniques
usually rely on searching an image for features that were hand-crafted by a
human. Deep neural networks, on the other hand, use huge datasets of
hand-labelled images to learn these features. The labels are either a bounding
box of an object or its mask. This approach has proven very effective. In
general, there are two types of DNN-based object detectors. The first group
performs the detection in a single pass through a network. These methods are
generally fast and can even run in real time on standard hardware. The second
group has separate region proposal and detection stages, which usually makes
these methods slower but more precise than those of the first group. Recently,
a combination of CBNet and Cascade R-CNN has achieved a new state-of-the-art
result on the COCO dataset [9] with an impressive AP50 of 71.9% [10].
Figure 1. Objects used for evaluation
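To make the AP50 figure concrete: a detection counts as correct only when its bounding box overlaps a ground-truth box with an intersection over union (IoU) of at least 0.5, and AP summarizes the resulting precision-recall curve. The following is a minimal sketch of that idea for a single class, not the evaluation code used in this paper or by COCO. The function names, the `(x_min, y_min, x_max, y_max)` box layout, the greedy matching rule, and the all-point interpolation are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,
) -> float:
    """AP for one class: area under the precision-recall curve.

    `detections` holds (confidence, box) pairs. Each detection is greedily
    matched to the unmatched ground-truth box with the highest IoU and
    counts as a true positive if that IoU reaches `iou_threshold`.
    """
    if not ground_truth:
        return 0.0

    matched = [False] * len(ground_truth)
    hits: List[bool] = []
    # Process detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if matched[idx]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_hit = best_idx >= 0 and best_iou >= iou_threshold
        if is_hit:
            matched[best_idx] = True
        hits.append(is_hit)

    # Precision and recall after each ranked detection.
    precisions, recalls = [], []
    true_positives = 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += int(hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))

    # Make precision monotonically non-increasing (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum precision over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Calling `average_precision(detections, ground_truth, iou_threshold=0.5)` corresponds to the AP50 setting. Benchmark protocols differ in details such as interpolation and how duplicate detections are handled, so values from this sketch are not directly comparable to published numbers.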
Detecting custom objects is a common problem in robotics. DNNs, or more
precisely convolutional neural networks (CNNs), require large amounts of data
for training. Having that data hand-labelled by a human is extremely
time-consuming, so synthesizing training data is an active field of research.
This is typically done by first making a 3D reconstruction of the objects and
then placing them in a virtual environment, which allows the simulation of
artificial deformations and the creation of arbitrary synthetic views where
labels are taken from the 3D template. However, obtaining a full 3D
reconstruction is not possible with all objects, especially
- Title: Joint Austrian Computer Vision and Robotics Workshop 2020
- Editor: Graz University of Technology
- Location: Graz
- Date: 2020
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-752-6
- Size: 21.0 x 29.7 cm
- Pages: 188
- Categories: Computer Science, Technology