For this purpose, we first evaluate our instance segmentation branch and build an instance canvas collection as described in Sec. 3.1. Next, we merge canvas layers of instances that belong to the same class using a weighted average and insert empty canvas layers for missing or undetected classes. In this way, we generate an initial segmentation image (ISI) which represents a coarse semantic segmentation for things classes.
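As a rough illustration, the following sketch assembles such an ISI from per-instance soft masks. The use of detection scores as averaging weights, as well as all names and shapes, are illustrative assumptions rather than the exact procedure of Sec. 3.1.

```python
import torch

def build_isi(instance_masks, instance_classes, instance_scores,
              num_thing_classes, height, width):
    """Illustrative sketch: build an initial segmentation image (ISI).

    Per-class canvases are a weighted average of the soft masks of all
    instances predicted for that class (here weighted by detection score,
    an assumption); classes with no detected instance keep an empty canvas.
    """
    isi = torch.zeros(num_thing_classes, height, width)
    for c in range(num_thing_classes):
        idx = [i for i, cls in enumerate(instance_classes) if cls == c]
        if not idx:
            continue  # undetected class -> empty canvas layer
        masks = torch.stack([instance_masks[i] for i in idx])      # (N_c, H, W)
        weights = torch.tensor([instance_scores[i] for i in idx])  # (N_c,)
        weights = weights / weights.sum().clamp(min=1e-6)
        isi[c] = (weights.view(-1, 1, 1) * masks).sum(dim=0)       # weighted average
    return isi  # (num_thing_classes, H, W): coarse things segmentation
```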
To exploit this segmentation prior in our semantic segmentation branch, we downsample our ISI to H/4 × W/4 × #things classes and concatenate it with the output of our semantic segmentation upsampling modules, as shown in Figure 3. Next, we apply four network blocks consisting of 3×3 convolution, batch normalization, and ReLU, followed by a single 1×1 convolution, batch normalization, and ReLU block to reduce the channel dimension to the number of classes. Finally, we use bilinear upsampling to obtain semantic segmentation logits at the original input image dimensions and apply a softmax non-linearity.
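A minimal PyTorch sketch of this head is given below. The block structure (four 3×3 conv-BN-ReLU blocks, one 1×1 conv-BN-ReLU block, bilinear upsampling, softmax) follows the text; the intermediate channel width of 128 and all identifiers are assumptions, since the internal widths are not stated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticHeadWithISI(nn.Module):
    """Sketch of a semantic head that consumes the ISI prior (illustrative)."""

    def __init__(self, feat_channels, num_thing_classes, num_classes, width=128):
        super().__init__()
        in_ch = feat_channels + num_thing_classes  # features + downsampled ISI
        blocks = []
        for i in range(4):  # four 3x3 conv-BN-ReLU blocks
            blocks += [nn.Conv2d(in_ch if i == 0 else width, width, 3, padding=1),
                       nn.BatchNorm2d(width), nn.ReLU(inplace=True)]
        # single 1x1 conv-BN-ReLU block reducing channels to the number of classes
        blocks += [nn.Conv2d(width, num_classes, 1),
                   nn.BatchNorm2d(num_classes), nn.ReLU(inplace=True)]
        self.head = nn.Sequential(*blocks)

    def forward(self, features, isi, out_size):
        # features: (B, C_feat, H/4, W/4); isi: (B, #things classes, H/4, W/4)
        x = torch.cat([features, isi], dim=1)
        x = self.head(x)
        # bilinear upsampling to the original input resolution, then softmax
        x = F.interpolate(x, size=out_size, mode='bilinear', align_corners=False)
        return x.softmax(dim=1)
```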
By exploiting the segmentation prior given by ISI, the upsampling modules of our semantic segmentation branch focus more on the prediction of stuff classes and boundaries between individual classes rather than on things classes. This is a significant advantage compared to disjoint semantic and instance segmentation branches, where redundant predictions are performed in the semantic segmentation branch. As a consequence, this link between the individual tasks increases the panoptic performance of our system.
4. Experimental Results
To demonstrate the benefits of our end-to-end
panoptic architecture with interrelations, we evalu-
ate it on the challenging Cityscapes dataset [4] for
semantic understanding of urban street scenes. We
follow the protocol of [4] and train and evaluate on
19 classes (11 stuff and 8 things). We use the recently introduced panoptic quality [11] metric to assess the segmentation performance.
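For reference, panoptic quality is defined in [11] per class over predicted and ground-truth segments matched at IoU > 0.5, with the final score averaged over all classes:

$$ \mathrm{PQ} = \frac{\sum_{(p,g)\in TP} \mathrm{IoU}(p,g)}{|TP| + \tfrac{1}{2}|FP| + \tfrac{1}{2}|FN|} $$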
4.1. Experimental Setup
Due to our limited computational resources, we
limited the maximum number of instances per im-
age to 30 and excluded samples with more instances
from the evaluation. In this way, we use 2649 of
2975 training images (≈ 89%) and 415 of 500 pub-
licly available validation images (≈ 83%). Addi-
tionally, we reduce the spatial image resolution from 2048×1024 to 1024×512. For this reason, we cannot benchmark against other state-of-the-art approaches.

Figure 3: Illustration of our proposed semantic and instance segmentation branches with inter-task relations. We first run the instance segmentation branch and then provide instance segmentation predictions as additional feature input to the semantic segmentation branch via an initial segmentation image (ISI). Finally, we evaluate the semantic segmentation branch and exploit the segmentation prior given by ISI to improve the overall panoptic performance.

To provide an unbiased evaluation, we
compare four different approaches with an increasing level of entanglement between semantic and instance segmentation. All methods use the same backbone, training protocol, and hyper-parameters:
Semantic + Instance. This approach uses two different networks based on a ResNet-101 [9] backbone
which independently perform semantic and instance
segmentation. A heuristic is used to combine the in-
dividual results.
Panoptic FPN. This method is a reimplementa-
tion of Panoptic Feature Pyramid Networks [11] with a ResNet-101 [9] backbone. In contrast to Semantic + Instance, the semantic and instance segmentation branches use a single shared feature representation. The results, however, are still merged heuristically.
HPS. Our holistic panoptic segmentation net-
work (HPS) extends Panoptic FPN as described in
Sec. 3.1. Our network internally builds the panoptic
segmentation output using differentiable operations
which enables us to optimize for the final objective.
HPS + ISI. This method augments our HPS with
inter-task relations between the semantic and in-