Figure 2: Detailed illustration of our end-to-end panoptic segmentation network with task interrelations. We internally merge predictions from our semantic and instance segmentation branches in a differentiable way. In particular, we concatenate stuff class predictions from our semantic segmentation branch with things class predictions in the form of canvas collections from our instance segmentation branch. Our instance canvas collections can also be transformed into an initial segmentation image (ISI) which serves as additional feature input for our semantic segmentation branch.

…sists of 3×3 convolutions, batch normalization [10], ReLU [7], and 2× bilinear upsampling. Because the individual stages have different spatial dimensions, we process each stage by a different number of upsampling modules to generate H/4 × W/4 × 128 feature maps, where H and W are the input image dimensions. The resulting outputs of all stages are concatenated and processed using a final 1×1 convolution to reduce the channel dimension to the desired number of classes.

For the instance segmentation branch, we implemented a Mask R-CNN [8]. We use a region proposal network to detect regions of interest, perform non-maximum suppression, execute ROI alignment, and predict 28×28 binary masks as well as class probabilities for each detected instance.

In order to combine the semantic and instance segmentation outputs, we use an internal differentiable fusion instead of external heuristics. For this purpose, we first select the most likely class label for each detected instance using a differentiable

\text{soft argmax} = \sum_{i}^{N} \left\lfloor \frac{e^{z_i \cdot \beta}}{\sum_{k}^{N} e^{z_k \cdot \beta}} \right\rceil \cdot i \qquad (1)

operation [2], where N is the number of things classes, β is a large constant, and z is the predicted class logit. Using β in the exponent in combination with the round function allows us to squash all non-maximum values to zero. In this way, we approximate the non-differentiable argmax function, allowing us to backpropagate gradients.

We then resize the predicted 28×28 mask logits for each detected instance according to its predicted 2D bounding box size and place them in empty canvas layers at the predicted 2D location, as shown in Figure 2 (top right). Additionally, we merge the canvas layers for regions of interest with the same class ID and high mask IoU. The resulting canvas collection from the instance segmentation branch is then concatenated with the stuff class logits of the semantic segmentation branch to generate our panoptic output, as illustrated in Figure 2 (bottom). The pixel-wise panoptic segmentation output is attained by applying a softmax layer on top of the stacked semantic and instance segmentation information. The shape of the final output is H × W × (#stuff classes + #detected instances). For stuff classes, the output is a class ID. For things classes, the output is an instance ID. The corresponding class ID for each instance can be gathered from our semantic or instance segmentation output.

During training, it is important to reorder the detected instances to match the order of the ground truth instances. For this purpose, we use a ground truth instance ID lookup table. All parameters of our network are optimized jointly. Sketches of the upsampling modules, the soft argmax, and the canvas fusion follow at the end of this section.

3.2. Inter-task Relations

Our differentiable fusion of semantic and instance segmentation predictions allows us to join the outputs of our two branches internally for end-to-end training. However, it also allows us to provide instance predictions as additional feature input to our semantic segmentation branch, as shown in Figure 3.
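As a concrete illustration of the upsampling modules in the semantic segmentation branch, here is a minimal PyTorch sketch (our framework choice for illustration; the text does not name one). Only the 3×3 convolution, batch normalization, ReLU, 2× bilinear upsampling, and the 128-channel H/4 × W/4 target come from the text; the module names, padding, and bias settings are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsamplingModule(nn.Module):
    """3x3 convolution -> batch norm -> ReLU -> 2x bilinear upsampling."""
    def __init__(self, in_channels: int, out_channels: int = 128):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.bn(self.conv(x)))
        return F.interpolate(x, scale_factor=2, mode="bilinear",
                             align_corners=False)

# A stage at H/32 x W/32 needs three modules to reach H/4 x W/4, a stage at
# H/8 x W/8 only one; every stage ends up as an H/4 x W/4 x 128 feature map,
# so the stage outputs can be concatenated and reduced by a 1x1 convolution.
def make_stage_head(in_channels: int, num_modules: int) -> nn.Sequential:
    modules = [UpsamplingModule(in_channels)]
    modules += [UpsamplingModule(128) for _ in range(num_modules - 1)]
    return nn.Sequential(*modules)
```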
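Equation (1) can be sketched as follows. The sharpened softmax and the rounding follow the text; the straight-through estimator used to keep gradients flowing through the round operation is our assumption, since round itself has zero gradient almost everywhere and the paper only states that β together with the round function squashes non-maximum values.

```python
import torch

def soft_argmax(z: torch.Tensor, beta: float = 100.0) -> torch.Tensor:
    """Differentiable approximation of argmax over class logits z (Eq. 1)."""
    # With a large beta the scaled softmax is already close to one-hot.
    p = torch.softmax(z * beta, dim=-1)
    # Rounding squashes all non-maximum entries to exactly zero. We use a
    # straight-through trick (an assumption, not stated in the paper):
    # forward pass uses round(p), backward pass uses the gradient of p.
    p_hard = p + (p.round() - p).detach()
    idx = torch.arange(z.shape[-1], dtype=z.dtype, device=z.device)
    # Weighted sum of class indices ~ index of the maximum logit.
    return (p_hard * idx).sum(dim=-1)

# Example: logits for three things classes -> class label 1.
print(soft_argmax(torch.tensor([1.2, 3.4, 0.5])))  # tensor(1.)
```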
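Finally, a minimal sketch of the canvas placement and fusion step. The helper names are hypothetical; we assume boxes are integer (x1, y1, x2, y2) coordinates already clipped to the image, and we fill empty canvas regions with a large negative logit so they receive near-zero probability after the softmax. The merging of canvas layers with the same class ID and high mask IoU is omitted here.

```python
import torch
import torch.nn.functional as F

def build_canvas_collection(mask_logits: torch.Tensor,
                            boxes: torch.Tensor,
                            height: int, width: int) -> torch.Tensor:
    """Resize per-instance 28x28 mask logits to their predicted boxes and
    place them on otherwise empty full-size canvas layers.

    mask_logits: (num_instances, 28, 28); boxes: (num_instances, 4).
    Returns a (num_instances, height, width) canvas collection.
    """
    fill = -1e4  # large negative logit ~ zero probability after the softmax
    canvases = mask_logits.new_full((boxes.shape[0], height, width), fill)
    for i, (x1, y1, x2, y2) in enumerate(boxes.int().tolist()):
        bh, bw = max(y2 - y1, 1), max(x2 - x1, 1)
        # Resize the mask logits to the predicted bounding box size and
        # write them at the predicted 2D location.
        resized = F.interpolate(mask_logits[i][None, None], size=(bh, bw),
                                mode="bilinear", align_corners=False)[0, 0]
        canvases[i, y1:y1 + bh, x1:x1 + bw] = resized
    return canvases

def panoptic_output(stuff_logits: torch.Tensor,
                    canvases: torch.Tensor) -> torch.Tensor:
    """Concatenate stuff logits (num_stuff, H, W) with the instance canvas
    collection (num_instances, H, W) and apply a pixel-wise softmax, giving
    the (num_stuff + num_instances, H, W) panoptic output."""
    return torch.softmax(torch.cat([stuff_logits, canvases], dim=0), dim=0)
```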