Joint Austrian Computer Vision and Robotics Workshop 2020
Page - 79 -

Frame-To-Frame Consistent Semantic Segmentation

Manuel Rebol, Patrick Knöbelreiter
Graz University of Technology
rebol@student.tugraz.at, knoebelreiter@icg.tugraz.at

Abstract. In this work, we aim for temporally consistent semantic segmentation throughout the frames of a video. Many semantic segmentation algorithms process images individually, which leads to an inconsistent scene interpretation due to illumination changes, occlusions and other variations over time. To achieve a temporally consistent prediction, we train a convolutional neural network (CNN) that propagates features through consecutive frames of a video using a convolutional long short-term memory (ConvLSTM) cell. Besides the temporal feature propagation, we penalize inconsistencies in our loss function. Our experiments show that performance improves when video information is used instead of single-frame prediction. After adding a ConvLSTM to ESPNet to propagate features through time, the mean intersection over union (mIoU) on the Cityscapes validation set increases from 45.2% for single frames to 57.9% for video data. Most importantly, inconsistency decreases from 4.5% to 1.3%, a relative reduction of 71.1%. Our results indicate that the added temporal information produces a frame-to-frame consistent and more accurate image understanding than single-frame processing.

1. Introduction

We address the task of semantic segmentation, which assigns a semantic class to each pixel in an image. Our focus is on computing semantic segmentation for multiple consecutive images, referred to as frames, in a video sequence. Consecutive video frames contain similar information because they capture a scene that changes only slightly. Therefore, the semantic segmentations of consecutive frames are similar as long as the motion between frames remains small. For example, consider a street scene, recorded by a camera mounted on a vehicle, in which we observe a street sign. If the frame rate is high enough, we observe the street sign in multiple images as the vehicle passes by. In this example, the goal of this work is to consistently detect the street sign as such in all frames in which it appears. Single-frame algorithms often fail at this task. In general, we aim for temporally consistent segmentation of all semantic classes throughout a video sequence.

Many state-of-the-art computer vision algorithms process images individually [26, 17, 3] and hence are not designed for video sequences. They do not consider the temporal dependencies that occur when segmenting a video semantically. If single-frame convolutional neural networks (CNNs) predict semantic segmentation on video sequences, the results can become temporally inconsistent because of illumina-

Frame 1 | Frame 2
Figure 1: Consistent Semantic Segmentation. The trained ESPNet [19] model predicts temporally inconsistent semantic segmentation on two consecutive frames of the Cityscapes [6] video data set (second row). The semantic segmentation is color-encoded, and large inconsistencies are highlighted with orange boxes. The third row shows consistent results predicted by our model. We reduce temporal inconsistencies by 71%.
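The abstract reports results with two numbers: mIoU and a frame-to-frame inconsistency rate, whose relative reduction is (4.5 - 1.3) / 4.5 ≈ 71.1%. The following Python sketch, using only the standard library, illustrates both measures on tiny label maps and reproduces that arithmetic. It is not the authors' evaluation code: the function names are hypothetical, and the inconsistency measure is an assumption that simply counts pixels whose predicted class changes between consecutive frames without any motion compensation, so it may differ from the definition used in the paper.

```python
from typing import Iterable, List, Sequence, Tuple

# A label map is an H x W grid of integer class indices (one per pixel).
LabelMap = List[List[int]]


def mean_iou(pairs: Iterable[Tuple[LabelMap, LabelMap]], num_classes: int) -> float:
    """Mean intersection over union, accumulated over all (prediction, ground truth) pairs.

    Classes that appear in neither prediction nor ground truth are skipped.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, gt in pairs:
        for row_p, row_g in zip(pred, gt):
            for p, g in zip(row_p, row_g):
                if p == g:
                    inter[p] += 1
                    union[p] += 1
                else:
                    # A mismatched pixel belongs to the union of both classes.
                    union[p] += 1
                    union[g] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def inconsistency(frames: Sequence[LabelMap]) -> float:
    """Hypothetical metric: fraction of pixels whose predicted class changes
    between consecutive frames, pooled over all frame pairs.
    Assumes a static scene, i.e. no motion compensation."""
    changed = 0
    total = 0
    for prev, curr in zip(frames, frames[1:]):
        for row_a, row_b in zip(prev, curr):
            for a, b in zip(row_a, row_b):
                changed += a != b
                total += 1
    return changed / total if total else 0.0


def relative_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, e.g. 0.711 for 71.1%."""
    return (before - after) / before


if __name__ == "__main__":
    pred = [[0, 0], [1, 1]]
    gt = [[0, 1], [1, 1]]
    print(round(mean_iou([(pred, gt)], num_classes=2), 4))  # 0.5833
    video = [[[0, 0], [1, 1]], [[0, 1], [1, 1]], [[0, 1], [1, 1]]]
    print(inconsistency(video))  # 0.125
    print(round(100 * relative_reduction(4.5, 1.3), 1))  # 71.1
```

The sketch sums intersections and unions over every image pair before averaging over classes, rather than averaging per-image scores, because benchmark mIoU is usually computed over the whole evaluation set.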

Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Informatik, Technik