For this purpose, we first evaluate our instance segmentation branch and build an instance canvas collection as described in Sec. 3.1. Next, we merge canvas layers of instances that belong to the same class using a weighted average and insert empty canvas layers for missing or undetected classes. In this way, we generate an initial segmentation image (ISI) which represents a coarse semantic segmentation for things classes.
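As a rough illustration, the following sketch assembles such an ISI from per-instance soft masks. The use of detection scores as averaging weights, as well as all names and shapes, are illustrative assumptions rather than the exact procedure of Sec. 3.1.

```python
import torch

def build_isi(instance_masks, instance_classes, instance_scores,
              num_thing_classes, height, width):
    """Illustrative sketch: build an initial segmentation image (ISI).

    Per-class canvases are a weighted average of the soft masks of all
    instances predicted for that class (here weighted by detection score,
    an assumption); classes with no detected instance keep an empty canvas.
    """
    isi = torch.zeros(num_thing_classes, height, width)
    for c in range(num_thing_classes):
        idx = [i for i, cls in enumerate(instance_classes) if cls == c]
        if not idx:
            continue  # undetected class -> empty canvas layer
        masks = torch.stack([instance_masks[i] for i in idx])      # (N_c, H, W)
        weights = torch.tensor([instance_scores[i] for i in idx])  # (N_c,)
        weights = weights / weights.sum().clamp(min=1e-6)
        isi[c] = (weights.view(-1, 1, 1) * masks).sum(dim=0)       # weighted average
    return isi  # (num_thing_classes, H, W): coarse things segmentation
```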
To exploit this segmentation prior in our semantic segmentation branch, we downsample our ISI to H/4 × W/4 × #things classes and concatenate it with the output of our semantic segmentation upsampling modules, as shown in Figure 3. Next, we apply four network blocks consisting of 3×3 convolution, batch normalization, and ReLU, followed by a single 1×1 convolution, batch normalization, and ReLU block to reduce the channel dimension to the number of classes. Finally, we use bilinear upsampling to obtain semantic segmentation logits at the original input image dimensions and apply a softmax non-linearity.
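A minimal PyTorch sketch of this head is given below. The block structure (four 3×3 conv-BN-ReLU blocks, one 1×1 conv-BN-ReLU block, bilinear upsampling, softmax) follows the text; the intermediate channel width of 128 and all identifiers are assumptions, since the internal widths are not stated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticHeadWithISI(nn.Module):
    """Sketch of a semantic head that consumes the ISI prior (illustrative)."""

    def __init__(self, feat_channels, num_thing_classes, num_classes, width=128):
        super().__init__()
        in_ch = feat_channels + num_thing_classes  # features + downsampled ISI
        blocks = []
        for i in range(4):  # four 3x3 conv-BN-ReLU blocks
            blocks += [nn.Conv2d(in_ch if i == 0 else width, width, 3, padding=1),
                       nn.BatchNorm2d(width), nn.ReLU(inplace=True)]
        # single 1x1 conv-BN-ReLU block reducing channels to the number of classes
        blocks += [nn.Conv2d(width, num_classes, 1),
                   nn.BatchNorm2d(num_classes), nn.ReLU(inplace=True)]
        self.head = nn.Sequential(*blocks)

    def forward(self, features, isi, out_size):
        # features: (B, C_feat, H/4, W/4); isi: (B, #things classes, H/4, W/4)
        x = torch.cat([features, isi], dim=1)
        x = self.head(x)
        # bilinear upsampling to the original input resolution, then softmax
        x = F.interpolate(x, size=out_size, mode='bilinear', align_corners=False)
        return x.softmax(dim=1)
```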
By exploiting the segmentation prior given by ISI, the upsampling modules of our semantic segmentation branch focus more on the prediction of stuff classes and boundaries between individual classes rather than on things classes. This is a significant advantage compared to disjoint semantic and instance segmentation branches, where redundant predictions are performed in the semantic segmentation branch. As a consequence, this link between the individual tasks increases the panoptic performance of our system.
4. Experimental Results
To demonstrate the benefits of our end-to-end
panoptic architecture with interrelations, we evalu-
ate it on the challenging Cityscapes dataset [4] for
semantic understanding of urban street scenes. We
follow the protocol of [4] and train and evaluate on
19 classes (11 stuff and 8 things). We use the recently introduced panoptic quality [11] metric to assess the segmentation performance.
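For reference, panoptic quality is defined in [11] per class over predicted and ground-truth segments matched at IoU > 0.5, with the final score averaged over all classes:

$$ \mathrm{PQ} = \frac{\sum_{(p,g)\in TP} \mathrm{IoU}(p,g)}{|TP| + \tfrac{1}{2}|FP| + \tfrac{1}{2}|FN|} $$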
4.1. Experimental Setup
Due to our limited computational resources, we
limited the maximum number of instances per im-
age to 30 and excluded samples with more instances
from the evaluation. In this way, we use 2649 of
2975 training images (≈ 89%) and 415 of 500 pub-
licly available validation images (≈ 83%). Addi-
tionally, we reduce the spatial image resolution from 2048×1024 to 1024×512. For this reason, we cannot benchmark against other state-of-the-art approaches.

Figure 3: Illustration of our proposed semantic and instance segmentation branches with inter-task relations. We first run the instance segmentation branch and then provide instance segmentation predictions as additional feature input to the semantic segmentation branch via an initial segmentation image (ISI). Finally, we evaluate the semantic segmentation branch and exploit the segmentation prior given by ISI to improve the overall panoptic performance.

To provide an unbiased evaluation, we
compare four different approaches with an increasing level of entanglement between semantic and instance segmentation. All methods use the same backbone, training protocol, and hyper-parameters:
Semantic + Instance. This approach uses two different networks based on a ResNet-101 [9] backbone
which independently perform semantic and instance
segmentation. A heuristic is used to combine the in-
dividual results.
Panoptic FPN. This method is a reimplementa-
tion of Panoptic Feature Pyramid Networks [11] with a ResNet-101 [9] backbone. In contrast to Semantic + Instance, the semantic and instance segmentation branches use a single shared feature representation. The results, however, are still merged heuristically.
HPS. Our holistic panoptic segmentation net-
work (HPS) extends Panoptic FPN as described in
Sec. 3.1. Our network internally builds the panoptic
segmentation output using differentiable operations
which enables us to optimize for the final objective.
HPS + ISI. This method augments our HPS with
inter-task relations between the semantic and in-