Joint Austrian Computer Vision and Robotics Workshop 2020
Page - 84 -
Figure 4: Qualitative results. A comparison between input data, DeepLab Xception ground truth, single-frame training, and LSTM training on the ESPNet (top to bottom). The horizontal axis represents the time steps (Frame 1 to Frame 5). Areas with inconsistent predictions are shown in detail and highlighted with green dashed boxes. Other inconsistencies are highlighted with orange boxes. The ESPNet with single-frame training (Sgl Train) produces inconsistencies in the right, the left, and the road segmentation. The ESPNet L1b predicts significantly more accurate and consistent results.

similar results. We observe that the hyper-parameter λ_incons = 10 provides a good trade-off between accuracy and consistency when using the squared difference loss function. The increase in consistency by 0.4 percentage points is noticeable when comparing the qualitative results. We set the other hyper-parameter λ_ce = 1 for all of our experiments.

Combining the findings. In order to achieve the best results with ESPNet L1b, we train the model in multiple phases. We use the squared difference inconsistency loss on correctly predicted classes with λ_incons = 10 and a 5×5 convolution inside the ConvLSTM. The quantitative results are shown at the bottom of Table 2. When training with the weighted cross-entropy loss and data augmentations as proposed in [19], the official Cityscapes server reports 60.9% mIoU on the single-frame test set. Our method reaches slightly higher accuracy and significantly better temporal consistency while using a similar number of parameters as Mehta et al. [19].

5. Conclusion

We have shown that we can improve temporal consistency and accuracy of semantic segmentation for two different single-frame architectures by adding feature propagation and a novel inconsistency loss. On the ESPNet, consistency and mIoU improve from 95.5 to 98.7% and from 45.2 to 57.9%, respectively.
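The weighting described above can be sketched as a combined objective. The following NumPy sketch is illustrative only, not the authors' implementation: it assumes per-pixel class probabilities for two consecutive frames and a binary mask of pixels that were predicted correctly; all function and variable names are hypothetical.

```python
import numpy as np

def inconsistency_loss(p_t, p_prev, correct_mask):
    """Squared-difference inconsistency term (illustrative sketch).

    p_t, p_prev  : arrays of shape (H, W, C) with per-pixel class scores
                   for the current and previous frame.
    correct_mask : array of shape (H, W); 1 where the prediction was
                   correct, 0 elsewhere (loss applied only there).
    """
    diff = (p_t - p_prev) ** 2          # squared difference per pixel and class
    per_pixel = diff.sum(axis=-1)       # aggregate over the class channels
    masked = per_pixel * correct_mask   # penalize only correctly predicted pixels
    return masked.sum() / max(correct_mask.sum(), 1)

def total_loss(ce_loss, incons_loss, lam_ce=1.0, lam_incons=10.0):
    # λ_ce = 1 and λ_incons = 10, the weights used in the experiments above
    return lam_ce * ce_loss + lam_incons * incons_loss
```

With identical predictions in both frames the inconsistency term vanishes, so only the cross-entropy term drives the update; λ_incons then controls how strongly temporal disagreement is penalized relative to per-frame accuracy.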
This is equal to a reduction of inconsistencies by 71.1%, which can be observed immediately when watching a video sequence.

Moreover, we found that it is best to forward features at a high level with a standard convolution within the ConvLSTM cell. The hyper-parameter in our novel inconsistency loss function can be used to prioritize between consistency and accuracy. We also improve consistency slightly by adding synthetic data generated by the Carla simulator.

In future experiments we are interested in comparing other methods of adding the information from past frames to the current prediction. We also need to generate synthetic data such that it contains semantics of all validation classes to increase overall consistency and accuracy.
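The 71.1% figure follows directly from the consistency numbers reported above: the share of inconsistent predictions drops from 100 − 95.5 = 4.5% to 100 − 98.7 = 1.3%, so the relative reduction is

```latex
\frac{(100 - 95.5) - (100 - 98.7)}{100 - 95.5}
  = \frac{4.5 - 1.3}{4.5}
  \approx 0.711 = 71.1\%.
```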
Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science, Engineering