mented images. The selected backbone model is the Inception v2 [5] network, which is chosen for its comparatively fast computation.
4. Evaluation
To evaluate whether training with the augmented dataset is useful, the model trained on the augmented data must be compared with the model not trained on this data. However, the Intersection over Union (IoU) measure is not meaningful in this case.
Standard evaluation metrics such as the mean av-
erage precision (mAP) define an IoU threshold (e.g.
0.5) and check whether a ground truth object and a
detected object have an IoU value above this threshold.
If this is the case, the detected object is defined as a
True Positive (TP). If an object is detected but there
is no respective ground truth with an IoU above this
specific threshold, the detected object is defined as
a False Positive (FP). If there is ground truth but no
detected object with an IoU above the threshold, the
object is defined as a False Negative (FN).
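To make this standard procedure concrete, the following sketch computes the IoU of two axis-aligned boxes and classifies detections accordingly. The box format (x1, y1, x2, y2) and the greedy one-to-one matching rule are our assumptions; implementations such as the COCO evaluation differ in detail.

    def iou(a, b):
        # Intersection over Union of two boxes given as (x1, y1, x2, y2).
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    def match(detections, ground_truths, threshold=0.5):
        # Greedily match detections to ground truth; each ground truth
        # box may be matched at most once.
        tp, fp, unmatched = 0, 0, list(ground_truths)
        for det in detections:
            best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
            if best is not None and iou(det, best) >= threshold:
                tp += 1
                unmatched.remove(best)
            else:
                fp += 1
        return tp, fp, len(unmatched)  # the remaining ground truths are FNs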
These evaluation methods cannot be easily applied to ground truth showing fragmented occlusion, because of the following two observations:
IoU too small: Since the data gives rise to fragmented detections, a detector can only detect parts of the person. An image where this problem occurs is shown in Figure 3. The bounding box is clearly a TP, since fragmented objects should be detected, but due to the occlusion by the branches of the tree, the whole body cannot be recognized. This leads to an IoU of only ≈ 0.2.
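As a hypothetical illustration (the numbers are ours, not measured from Figure 3): if the ground truth box spans 100 × 200 pixels and the detector, seeing only the unoccluded fragment, returns a 50 × 80 pixel box lying entirely inside it, the intersection is 4,000 pixels and the union 20,000 pixels, giving an IoU of 4,000 / 20,000 = 0.2 although the detection itself is correct.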
Multiple detections: Another major problem with the standard evaluation metrics is that they assume exactly one detected bounding box matches one ground truth bounding box. However, when handling fragmented objects, human heads and/or other body parts should be detected separately if body parts are covered. This creates the problem that parts of the body (like a head) are detected as well as the whole body. Figure 4 shows some examples.
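Using the matching sketch above with hypothetical boxes, this effect can be reproduced: a correct separate head detection is counted as FP because the single ground truth box is already consumed by the full-body detection.

    gt = [(100, 50, 200, 350)]                  # one full-body ground truth box
    detections = [(105, 55, 195, 345),          # full-body detection, IoU ≈ 0.87 -> TP
                  (130, 55, 170, 100)]          # separate head detection -> FP
    print(match(detections, gt))                # prints (1, 1, 0)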
Figure 3: Ground truth (green) and the detection (blue) vary substantially due to the occlusion effects.

To tackle these two problems, this paper proposes a different evaluation metric. For each bounding box in the evaluation dataset, we calculate the maximum region in the image where there is no overlap with another ground truth bounding box. This region is then extracted and fed into the model. If the model detects an object, we define it as TP, otherwise as FN. To assess FPs, we create an additional dataset that represents the maximum region in an image without overlap with any ground truth bounding box. We extracted in total 45,340 such regions with different aspect ratios, from different parts of the image and at different time instants. In addition to FPs, we can also calculate the TNs using this evaluation metric.
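The text does not specify how the maximum overlap-free region is computed; the sketch below uses one simple greedy rule (widen the box horizontally first, then vertically, until another ground truth box or the image border is hit) and assumes ground truth boxes do not overlap each other. The model here is a hypothetical callable returning whether any person was detected in the crop.

    def max_clear_region(box, others, img_w, img_h):
        # Greedily grow `box` without overlapping any box in `others`.
        # The result depends on the expansion order (horizontal first).
        x1, y1, x2, y2 = box
        x1 = max([o[2] for o in others if o[1] < y2 and o[3] > y1 and o[2] <= x1],
                 default=0)
        x2 = min([o[0] for o in others if o[1] < y2 and o[3] > y1 and o[0] >= x2],
                 default=img_w)
        y1 = max([o[3] for o in others if o[0] < x2 and o[2] > x1 and o[3] <= y1],
                 default=0)
        y2 = min([o[1] for o in others if o[0] < x2 and o[2] > x1 and o[1] >= y2],
                 default=img_h)
        return x1, y1, x2, y2

    def count_tp_fn(model, image, gt_boxes):
        # Crop the overlap-free region around each ground truth box and
        # count a TP if the model finds a person in it, otherwise an FN.
        h, w = image.shape[:2]
        tp = fn = 0
        for i, box in enumerate(gt_boxes):
            others = gt_boxes[:i] + gt_boxes[i + 1:]
            x1, y1, x2, y2 = max_clear_region(box, others, w, h)
            if model(image[y1:y2, x1:x2]):
                tp += 1
            else:
                fn += 1
        return tp, fn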
Figure 5 shows these results as a recall vs. precision curve. There is no significant difference between Mask R-CNN trained on Microsoft COCO and on the augmented dataset for L0 occlusion. However, a clear improvement has been achieved for L1 and L2 occlusion, which demonstrates the applicability of the idea to model fragmented occlusion by the masks. Nevertheless, all approaches fall short of the expected robustness and accuracy for moderate L2 and heavy L3 occlusion. One reason for this is that our current technique is not accurate enough to model fragmented occlusion. Furthermore, clear limits exist, as heavy fragmented occlusion removes the local spatial and structural information necessary for current approaches in object detection.
We further recognise that bounding box labelling is not the appropriate approach for labelling data showing fragmented occlusion. Especially for L3 and L4 occlusion, it is frequently impossible to manually define the bounding box. Such occlusion levels allow an approximate localisation of the object in the image but make the observation of the object's extent impossible. While the recall in Figure 5 is still meaningful, the precision is basically undefined. This observation has severe consequences for the labelling,