Page 100 in Joint Austrian Computer Vision and Robotics Workshop 2020

mented images. The selected backbone model is the Inception v2 [5] network. This network is selected for its faster computation.

4. Evaluation

To evaluate whether training with the augmented dataset is useful, the model trained on the augmented data must be compared with the model not trained on this data. However, the Intersection over Union (IoU) measure is not meaningful in this case. Standard evaluation metrics such as the mean average precision (mAP) define an IoU threshold (e.g. 0.5) and check whether a ground truth object and a detected object have an IoU value above this threshold. If this is the case, the detected object is defined as a True Positive (TP). If an object is detected but there is no respective ground truth with an IoU above this specific threshold, the detected object is defined as a False Positive (FP). If there is ground truth but no detected object with an IoU above the threshold, the object is defined as a False Negative (FN).

These evaluation methods cannot be easily applied to ground truth showing fragmented occlusion, because of the following two observations:

IoU too small: Since the data is based on fragmented detections, a detector can only detect parts of the person. An image where this problem occurs is shown in Figure 3. The bounding box is clearly a TP, based on the fact that fragmented objects should be detected, but due to the occlusion by the branches of the tree the whole body cannot be recognised. This leads to an IoU of only ≈ 0.2.

Figure 3: Ground truth (green) and the detection (blue) vary substantially due to the occlusion effects.

Multiple detections: Another major problem with the standard evaluation metrics is that exactly one detected bounding box and one ground truth bounding box match. However, when handling fragmented objects, human heads and/or other body parts should be detected separately if body parts are covered. This creates the problem that a part of the body (like a head) is detected as well as the whole body. Figure 4 shows some examples.

To tackle these two problems, this paper proposes a different evaluation metric. For each bounding box in the evaluation dataset, we calculate the maximum region in the image where there is no overlap with another ground truth bounding box. This region is then extracted and fed into the model. If the model detects an object, we define it as a TP, otherwise as an FN. To assess FPs, we create an additional dataset that represents the maximum region in an image without overlap with any ground truth bounding box. We extracted in total 45,340 such regions with different aspect ratios, from different parts of the image and at different time instants. In addition to FPs, we can also calculate the TNs using this evaluation metric.

Figure 5 shows these results as a recall vs. precision curve. There is no significant difference between Mask R-CNN trained on Microsoft COCO and on the augmented dataset for L0 occlusion. However, a clear improvement has been achieved for L1 and L2 occlusion, which proves the applicability of the idea to model fragmented occlusion by the masks. Nevertheless, all approaches basically do not reach the expected robustness and accuracy for moderate L2 and heavy L3 occlusion. One reason for this is that our current technique is not accurate enough to model fragmented occlusion. Furthermore, clear limits exist, as heavy fragmented occlusion removes local spatial and structural information necessary for current approaches in object detection.
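
The two ingredients of this section lend themselves to a compact summary in code. The sketch below is illustrative only, not the authors' implementation: it assumes boxes are (x1, y1, x2, y2) tuples in pixel coordinates, and the names `detector` (any callable that returns True when the model fires on a crop) and `crop` are hypothetical helpers. The `iou` function mirrors the standard overlap measure discussed above; `evaluate_regions` follows the proposed region-based metric, counting TPs/FNs on the maximal non-overlapping ground-truth regions and FPs/TNs on the background regions.

```python
# Minimal sketch (not the authors' code) of the evaluation discussed above.
# Assumption: boxes are (x1, y1, x2, y2) tuples in pixel coordinates.

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def evaluate_regions(image, object_regions, background_regions, detector, crop):
    """Count TP/FN on object regions and FP/TN on background regions.

    object_regions     -- per ground-truth box, the maximal crop that does
                          not overlap any other ground-truth box
    background_regions -- maximal crops overlapping no ground-truth box
    detector, crop     -- hypothetical helpers (see lead-in)
    """
    tp = fn = fp = tn = 0
    for box in object_regions:
        if detector(crop(image, box)):
            tp += 1   # object present and detected
        else:
            fn += 1   # object present but missed
    for box in background_regions:
        if detector(crop(image, box)):
            fp += 1   # detection fired on an empty region
        else:
            tn += 1   # empty region correctly left empty
    return tp, fn, fp, tn
```

Note that `evaluate_regions` never compares boxes by IoU at all: by construction, each crop contains at most one ground-truth object, so both problems above (too-small IoU and multiple detections per object) disappear.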
We further recognise that bounding box labelling is not the appropriate approach for labelling data showing fragmented occlusion. Especially for L3 and L4 occlusion, it is frequently impossible to manually define the bounding box. Such occlusion levels allow an approximate localisation of the object in the image but make the observation of the object's extent impossible. While the recall in Figure 5 is still meaningful, the precision is basically undefined. This observation has severe consequences for the labelling,
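
To make the last point concrete: recall follows from TP and FN alone, which remain well defined under heavy occlusion, whereas precision additionally needs an FP count that cannot be established once the object's extent cannot be labelled. A minimal sketch (the helper name is hypothetical):

```python
def precision_recall(tp, fn, fp=None):
    """Recall from TP/FN; precision only when an FP count exists.

    Recall stays meaningful under heavy occlusion because it needs only
    the object regions; precision is returned as None (undefined) when
    false positives cannot be assessed.
    """
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    precision = tp / (tp + fp) if fp is not None and (tp + fp) > 0 else None
    return precision, recall
```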
Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science, Technology