malize by the sum of pixels which are valid and consistent in the ground truth. This is achieved by
\[
\omega_{\mathrm{norm}}(S) = \sum_{t,m,n=1}^{T-1,\,M,\,N} \delta(S_{t,m,n} \in \mathcal{S}) \cdot \delta(S_{t,m,n} = S_{t+1,m,n}), \tag{4}
\]
where the first factor checks for validity and the second one for consistency in the ground truth. The boolean function $\omega_{\mathrm{vcc}}(\cdot)$ ensures that only valid, consistent and correctly (vcc) predicted pixels are affected by the following loss term.
\[
\omega_{\mathrm{vcc}}(S, P, t, m, n) = \delta(S_{t,m,n} \in \mathcal{S}) \cdot \delta(S_{t,m,n} = S_{t+1,m,n}) \cdot \psi(S_{t,m,n}, P_{t,m,n}, S_{t+1,m,n}, P_{t+1,m,n}), \tag{5}
\]
where the first factor ensures validity, the second consistency and the third a correct prediction in one of two consecutive images. The third factor is given by the boolean function $\psi(\cdot)$, which we define as
\[
\psi(s_1, p_1, s_2, p_2) = \min\big(\delta(s_1 = \operatorname{argmax}(p_1)) + \delta(s_2 = \operatorname{argmax}(p_2)),\ 1\big). \tag{6}
\]
This function determines for a pixel at a certain position whether at least one prediction in the consecutive image pair is correct. The input parameters are given by the two prediction vectors $p_1, p_2 \in \mathbb{R}^{|\mathcal{S}|}$ and the two ground truth labels $s_1, s_2 \in \mathcal{S}$ for any pixel position. All four parameters are retrieved from $P$ and $S$.
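To make Eqs. (4)-(6) concrete, the following is a minimal NumPy sketch, assuming the ground truth $S$ is an integer label map of shape $(T, M, N)$ in which invalid pixels carry a label outside the valid set (marked here by IGNORE_LABEL) and the predictions $P$ have shape $(T, M, N, |\mathcal{S}|)$; the names IGNORE_LABEL, vcc_mask_and_norm and omega_norm are ours for illustration, not from the paper.

```python
import numpy as np

IGNORE_LABEL = 255  # hypothetical marker for invalid (unlabeled) pixels

def vcc_mask_and_norm(S, P):
    """Return the boolean omega_vcc mask over all consecutive pixel pairs,
    shape (T-1, M, N), and the scalar normalizer omega_norm of Eq. (4)."""
    valid = S[:-1] != IGNORE_LABEL                # delta(S_tmn in valid label set)
    consistent = S[:-1] == S[1:]                  # delta(S_tmn = S_{t+1,mn})
    pred = np.argmax(P, axis=-1)                  # per-pixel predicted labels
    # psi, Eq. (6): at least one of the two consecutive predictions is correct
    psi = (pred[:-1] == S[:-1]) | (pred[1:] == S[1:])
    omega_norm = int(np.sum(valid & consistent))  # Eq. (4)
    vcc_mask = valid & consistent & psi           # Eq. (5)
    return vcc_mask, omega_norm
```

The returned mask selects the pixel pairs that the inconsistency loss may penalize, and omega_norm is the normalization weight defined in Eq. (4).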
In Figure 3 we point out pixels which are affected by the inconsistency loss. In the bottom right of the prediction, the road (purple) is labeled inconsistently. For these pixels the function $\omega_{\mathrm{vcc}}(\cdot)$ returns true and they are penalized by the inconsistency loss.
4. Experiments
First, we explain the generation of semantic video data with ground truth and show the impact of synthetic data. Second, we evaluate our proposed methods, i.e. the feature propagation and the inconsistency loss.
Architectures and data preparation We use two models in our experiments, the ESPNet [19] and the SSNet. We train the models on images at half and quarter Cityscapes resolution to reduce computational complexity. Comparisons between different configurations are always trained for the same number of epochs, which is chosen high enough to allow all configurations to converge. We generate the pseudo ground truth for the sequence validation set with the Deeplab Xception model [3].
Figure 3: Visualization of inconsistencies (columns: prediction, ground truth; panels: (a) prediction inconsistency, (b) dilated GT scene change). We compare prediction and ground truth at two different time steps. The white pixels in image (a) are inconsistently predicted. Image (b) shows pixels which change their label because of motion. Only black pixels in image (b) are affected by our inconsistency loss.
Metrics and abbreviations The metrics which we use to compare our experiments are the mean intersection over union (mIoU ↑), the percentage of correctly classified valid pixels (Acc ↑), the percentage of temporally consistently classified pixels (Cons ↑) and the percentage of pixels which are temporally consistent but wrongly classified (ConsW ↓). The arrow pointing upwards (↑) indicates that a higher value is better, whereas the arrow pointing downwards (↓) indicates the opposite. Our Cons and ConsW metrics check all pixels which need to have the same label according to the ground truth, i.e. the black pixels in Figure 3b.
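For clarity, here is a minimal NumPy sketch of how Cons and ConsW can be computed under this definition; the exact normalization is our reading of the text, and IGNORE_LABEL and consistency_metrics are illustrative names, not from the paper.

```python
import numpy as np

IGNORE_LABEL = 255  # hypothetical marker for invalid pixels

def consistency_metrics(S, pred):
    """S and pred are integer label maps of shape (T, M, N).
    Evaluates consecutive frame pairs, i.e. the black pixels in Fig. 3b."""
    # pixels that must keep their label according to the ground truth
    gt_pairs = (S[:-1] == S[1:]) & (S[:-1] != IGNORE_LABEL)
    pred_consistent = pred[:-1] == pred[1:]
    n = int(np.sum(gt_pairs))  # assumes at least one such pixel exists
    cons = np.sum(pred_consistent & gt_pairs) / n
    consw = np.sum(pred_consistent & gt_pairs & (pred[:-1] != S[:-1])) / n
    return cons, consw  # fractions in [0, 1]
```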
4.1. Data Generation
An important part of our work is the generation of ground truth for a video data set. We generate street scene video data with a pre-trained Deeplab Xception model [3] and the Carla simulator [7].
Real world data The semantic segmentation data sets CamVid [1], Kitti [9], Cityscapes [6] and Mapillary [20] do not provide ground truth for video data because of the large labeling effort required. Therefore, we use the Deeplab Xception model pre-trained on the Cityscapes data set to generate pseudo ground truth labels for the Cityscapes sequence data set. The reason why we prefer the Cityscapes data set for video processing is that every 20th image of each