object detection. Fragmented occlusion occurs when viewing objects behind tree and bush leaves. Contrary to partial occlusion, fragmented occlusion gives no clear view on minimal recognisable parts of the object [10], which are used to detect the object [7].
We show in this work that the state of the art in object detection fails on fragmented occlusion, even in the moderate case. For this, we created a new dataset (Figure 1) capturing people behind trees. We labelled nearly 40,000 images in three representative videos. This data raises new challenges for labelling and evaluation, which we only partially answer in this paper. For example, bounding boxes are the standard in current evaluation of detectors, but such labels are hard to find in data that contains fragmented occlusion. As the state-of-the-art detectors deliver bounding boxes, fragmented occlusion poses new questions on the evaluation methodology.
Furthermore, we augmented the Microsoft COCO (http://cocodataset.org) training data by occluding the ground-truth masks similarly to how leaves occlude people behind bushes and trees. We then show results of training Mask R-CNN [4] on this new data, which yields an improvement over Mask R-CNN trained on the original data under slight fragmented occlusion.
2. Related Work
State-of-the-art object detection is based on deep learning. Two-stage detectors work by finding, as an intermediate step, bounding box proposals [3, 2] on the feature maps of the backbone CNN. A region proposal network further improves efficiency [9, 4]. One-stage detectors regress the bounding boxes directly [8, 6], which is computationally efficient on GPUs, but this approach is inherently less accurate as it assumes a coarsely discretised search space. Although these methods usually show excellent performance for fully visible objects, they break down in the case of fragmented occlusion. Fragmented occlusion has not been considered for object detection so far; however, there is literature on this topic in the field of motion analysis [1].
3. Methodology
We created a dataset recorded in a forest, consisting of three videos with a total of 18,360 frames and 33,933 bounding boxes which were manually defined by human annotators. These bounding boxes are divided into four different occlusion levels, including the unoccluded case (Figure 1).

Figure 2: A training image from Microsoft COCO (http://images.cocodataset.org/train2017/000000001700.jpg). Top left: the image. Top right: segmentation mask of the image. Bottom left: the image overlaid with artificial trees. Bottom right: mask of the overlaid image.
Then, we extended the Microsoft COCO dataset by adding artificial trees as foreground to the images of objects (Figure 2). We chose this dataset because it contains pixel-wise segmentation masks in the ground truth as well as a large number of different categories, including the person category.
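For illustration, the following is a minimal sketch (not the authors' code; the annotation file path is an assumption) of how person images and their ground-truth masks can be selected with pycocotools:

# Minimal sketch: select COCO images containing people and build a
# combined binary person mask per image with pycocotools.
import numpy as np
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")  # hypothetical local path

person_cat = coco.getCatIds(catNms=["person"])
person_img_ids = coco.getImgIds(catIds=person_cat)

img_info = coco.loadImgs(person_img_ids[0])[0]
ann_ids = coco.getAnnIds(imgIds=img_info["id"], catIds=person_cat, iscrowd=False)

# Union of all person instance masks in the image.
mask = np.zeros((img_info["height"], img_info["width"]), dtype=np.uint8)
for ann in coco.loadAnns(ann_ids):
    mask = np.maximum(mask, coco.annToMask(ann))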
The underlying idea of our approach is to add artificial fragmented occlusion to Microsoft COCO and train Mask R-CNN on this new data. In this way, we adapt the original data distribution to the case of objects under fragmented occlusion. Since we are only interested in humans, we apply this augmentation only to images containing humans and use only these images for training. The trees used for the augmentation are generated from real images we have obtained from the test data. The method generates whole artificial trees by randomly adding branches to previously manually segmented tree trunks. In total, 14 such trunks were extracted from the test dataset. The branches attached to these trunks are also randomly generated by adding a few manually segmented leaves.
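As a rough illustration of this generation step (our reading, not the authors' released code; the asset format, the function name, and all parameters are assumptions), a tree could be composed as follows:

# Hypothetical sketch of composing an artificial tree: paste randomly
# rotated, manually segmented leaf clusters onto a segmented trunk.
# Assets are assumed to be HxWx4 float RGBA arrays, alpha binary in {0, 1}.
import random
import numpy as np

def generate_tree(trunk_rgba, leaf_rgba_list, n_branches=8, seed=0):
    rng = random.Random(seed)
    tree = trunk_rgba.copy()
    h, w = tree.shape[:2]
    for _ in range(n_branches):
        leaf = np.rot90(rng.choice(leaf_rgba_list), k=rng.randrange(4))
        lh, lw = leaf.shape[:2]
        if lh >= h or lw >= w:
            continue  # skip leaf clusters larger than the trunk image
        y, x = rng.randrange(h - lh), rng.randrange(w - lw)
        region = tree[y:y + lh, x:x + lw]
        alpha = leaf[..., 3:4]
        # Paste the leaf cluster over the current tree (alpha assumed binary).
        region[...] = alpha * leaf + (1.0 - alpha) * region
    return tree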
The trees are placed in front of objects by randomly selecting the x-coordinate at which they will be placed and an angle at which the tree will be rotated. The calculated foreground is applied to the image, and its negative mask is multiplied by the segmentation mask of the objects in the image.
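A minimal sketch of this compositing step (our notation; the array layouts and the helper name are assumptions, and the tree strip is assumed pre-rotated and image-height):

# Hypothetical sketch: paste a tree foreground at column x and multiply
# the negative of its alpha into the ground-truth object mask.
import numpy as np

def occlude_with_tree(image, obj_mask, tree_rgba, x):
    # image: HxWx3 float in [0,1]; obj_mask: HxW binary;
    # tree_rgba: Hx(Wt)x4 float, already rotated by the sampled angle.
    h, w = image.shape[:2]
    x1 = min(x + tree_rgba.shape[1], w)
    alpha = tree_rgba[:, : x1 - x, 3:4]          # tree opacity
    out = image.copy()
    out[:, x:x1] = alpha * tree_rgba[:, : x1 - x, :3] + (1.0 - alpha) * out[:, x:x1]
    new_mask = obj_mask.astype(np.float64)
    new_mask[:, x:x1] *= 1.0 - alpha[..., 0]     # negative mask of the foreground
    return out, new_mask > 0.5

With a binary tree alpha, this removes exactly the occluded object pixels from the training mask.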
The Mask R-CNN model is then trained with the augmented data.