malize by the sum of pixels which are valid and consistent in the ground truth. This is achieved by
\[
\omega_{\mathrm{norm}}(S) = \sum_{t,m,n=1}^{T-1,\,M,\,N} \delta(S_{t,m,n} \in \mathcal{S}) \cdot \delta(S_{t,m,n} = S_{t+1,m,n}), \tag{4}
\]
where the first factor checks for validity and the second one for consistency in the ground truth. The boolean function $\omega_{\mathrm{vcc}}(\cdot)$ ensures that only valid, consistent and correctly (vcc) predicted pixels are affected by the following loss term.
\[
\omega_{\mathrm{vcc}}(S, P, t, m, n) = \delta(S_{t,m,n} \in \mathcal{S}) \cdot \delta(S_{t,m,n} = S_{t+1,m,n}) \cdot \psi(S_{t,m,n}, P_{t,m,n}, S_{t+1,m,n}, P_{t+1,m,n}), \tag{5}
\]
where the first factor ensures validity, the second consistency and the third a correct prediction in one of two consecutive images. The third factor is given by the boolean function $\psi(\cdot)$, which we define as
\[
\psi(s_1, p_1, s_2, p_2) = \min\big(\delta(s_1 = \operatorname{argmax}(p_1)) + \delta(s_2 = \operatorname{argmax}(p_2)),\ 1\big). \tag{6}
\]
This function determines for a pixel at a certain position whether at least one prediction in the consecutive image pair is correct. The input parameters are given by the two prediction vectors $p_1, p_2 \in \mathbb{R}^{|\mathcal{S}|}$ and the two ground truth labels $s_1, s_2 \in \mathcal{S}$ for any pixel position. All four parameters are retrieved from $P$ and $S$.
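To make Eqs. (4)-(6) concrete, the following is a minimal NumPy sketch, assuming the ground truth $S$ is an integer label map of shape $(T, M, N)$ in which invalid pixels carry a label outside the valid set (marked here by IGNORE_LABEL) and the predictions $P$ have shape $(T, M, N, |\mathcal{S}|)$; the names IGNORE_LABEL, vcc_mask_and_norm and omega_norm are ours for illustration, not from the paper.

```python
import numpy as np

IGNORE_LABEL = 255  # hypothetical marker for invalid (unlabeled) pixels

def vcc_mask_and_norm(S, P):
    """Return the boolean omega_vcc mask over all consecutive pixel pairs,
    shape (T-1, M, N), and the scalar normalizer omega_norm of Eq. (4)."""
    valid = S[:-1] != IGNORE_LABEL                # delta(S_tmn in valid label set)
    consistent = S[:-1] == S[1:]                  # delta(S_tmn = S_{t+1,mn})
    pred = np.argmax(P, axis=-1)                  # per-pixel predicted labels
    # psi, Eq. (6): at least one of the two consecutive predictions is correct
    psi = (pred[:-1] == S[:-1]) | (pred[1:] == S[1:])
    omega_norm = int(np.sum(valid & consistent))  # Eq. (4)
    vcc_mask = valid & consistent & psi           # Eq. (5)
    return vcc_mask, omega_norm
```

The returned mask selects the pixel pairs that the inconsistency loss may penalize, and omega_norm is the normalization weight defined in Eq. (4).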
In Figure 3 we point out pixels which are affected by the inconsistency loss. In the bottom right of the prediction, the road (purple) is labeled inconsistently. For these pixels the function $\omega_{\mathrm{vcc}}(\cdot)$ returns true and they are penalized by the inconsistency loss.
4. Experiments
First, we explain the generation of semantic video data with ground truth and show the impact of synthetic data. Second, we evaluate our proposed methods, i.e. the feature propagation and the inconsistency loss.
Architectures and data preparation We use two models in our experiments, the ESPNet [19] and the SSNet. We train the models on images at half and quarter Cityscapes resolution to reduce computational complexity. Comparisons between different configurations are always trained for the same number of epochs, which is chosen high enough to allow all configurations to converge. We generate the pseudo ground truth for the sequence validation set with the Deeplab Xception model [3].
Figure 3: Visualization of inconsistencies (columns: prediction, ground truth; panels: (a) prediction inconsistency, (b) dilated GT scene change). We compare prediction and ground truth at two different time steps. The white pixels in image (a) are inconsistently predicted. Image (b) shows pixels which change their label because of motion. Only black pixels in image (b) are affected by our inconsistency loss.
Metrics and abbreviations The metrics which we use to compare our experiments are the mean intersection over union (mIoU ↑), the percentage of correctly classified valid pixels (Acc ↑), the percentage of temporally consistently classified pixels (Cons ↑) and the percentage of pixels which are temporally consistent but wrongly classified (ConsW ↓). The arrow pointing upwards (↑) indicates that a higher value is better, whereas the arrow pointing downwards (↓) indicates the opposite. Our Cons and ConsW metrics check all pixels which need to have the same label according to the ground truth, i.e. the black pixels in Figure 3b.
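For clarity, here is a minimal NumPy sketch of how Cons and ConsW can be computed under this definition; the exact normalization is our reading of the text, and IGNORE_LABEL and consistency_metrics are illustrative names, not from the paper.

```python
import numpy as np

IGNORE_LABEL = 255  # hypothetical marker for invalid pixels

def consistency_metrics(S, pred):
    """S and pred are integer label maps of shape (T, M, N).
    Evaluates consecutive frame pairs, i.e. the black pixels in Fig. 3b."""
    # pixels that must keep their label according to the ground truth
    gt_pairs = (S[:-1] == S[1:]) & (S[:-1] != IGNORE_LABEL)
    pred_consistent = pred[:-1] == pred[1:]
    n = int(np.sum(gt_pairs))  # assumes at least one such pixel exists
    cons = np.sum(pred_consistent & gt_pairs) / n
    consw = np.sum(pred_consistent & gt_pairs & (pred[:-1] != S[:-1])) / n
    return cons, consw  # fractions in [0, 1]
```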
4.1. Data Generation
An important part of our work is the generation of ground truth for a video data set. We generate street scene video data with a pre-trained Deeplab Xception model [3] and the Carla simulator [7].
Real world data The semantic segmentation data sets CamVid [1], Kitti [9], Cityscapes [6] and Mapillary [20] do not provide ground truth for video data because of the large labeling effort required. Therefore, we use the Deeplab Xception model pre-trained on the Cityscapes data set to generate pseudo ground truth labels for the Cityscapes sequence data set. The reason why we prefer the Cityscapes data set for video processing is that every 20th image of each