and classification, while remaining independent of the underlying machine learning algorithm. The
method and its integration throughout the entire processing pipeline are described in this chapter and
demonstrated on the Daimler Urban Segmentation 2014 dataset [14].
The dataset consists of image sequences captured by a camera mounted on a moving car. The images
are provided without color information at a resolution of 1024x440 px, with every 10th frame of the
sequences being annotated with pixel-wise segmentations. For a reasonable comparison, only the
test sequences, as specified by the evaluation protocol, are considered. The dataset is supplemented
with precomputed disparity maps and additional information, like time-stamps, vehicle speed and
yaw rate. The ground truth distinguishes between two foreground (Vehicle and Pedestrian) and three
background classes (Ground, Sky and Building). Within the test data 36.3% of all pixels are defined
as Void. The frequency of occurrence of the labeled pixels is 54.1% for Ground, 14.8% for Vehicle,
4.6% for Pedestrian, 2.4% for Sky and 24.0% for Building, resulting in a background ratio of 80.6%.
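Such frequency statistics follow directly from the pixel-wise annotations. The following sketch, assuming NumPy and hypothetical integer label IDs (the dataset's actual encoding is not reproduced here), shows how the per-class frequencies and the background ratio can be computed:

```python
import numpy as np

# Hypothetical label IDs; the dataset's actual encoding may differ.
GROUND, VEHICLE, PEDESTRIAN, SKY, BUILDING, VOID = 0, 1, 2, 3, 4, 255
BACKGROUND = {GROUND, SKY, BUILDING}

def label_statistics(annotation_maps):
    """Per-class pixel frequencies over a dataset, ignoring Void pixels."""
    counts = {}
    for labels in annotation_maps:               # each: 2D array of label IDs
        ids, n = np.unique(labels, return_counts=True)
        for i, c in zip(ids.tolist(), n.tolist()):
            if i != VOID:
                counts[i] = counts.get(i, 0) + c
    total = sum(counts.values())
    freq = {i: c / total for i, c in counts.items()}
    background_ratio = sum(f for i, f in freq.items() if i in BACKGROUND)
    return freq, background_ratio
```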
3.1. Training
Dataset Analysis As a preliminary step for the training and classification process, an appropriate
choice of input data with regard to the intended application scenario is a decisive aspect. For this
purpose, a statistical analysis of multiple datasets was conducted according to the concept of Explicit
Priors. The resulting data ranges from basic statistics, such as label frequency and the ratio of back-
ground to foreground classes, to more sophisticated aspects concerning occurrence distribution and
spatial context. For each application scenario, this dataset analysis can be used to select a subset of
additional cues for identifying appropriate datasets. For the demonstrated task, for instance, the most
useful information was provided by the concept of Location Bins. By dividing the image dimensions
into a coarse grid and capturing the spatial distribution of each class across the resulting cells over
the entire dataset, probabilities for the occurrence of certain labels with regard to their location can
be derived. The resulting representation provides clearly arranged patterns closely related to certain
characteristics of the dataset, such as the method of image acquisition. In the case of Vehicles, for
instance, the analysis clearly showed that images taken with a hand-held camera are mostly centered on
these objects, while for the datasets using a camera mounted on a car they are most often found in
the lower half of the image. Comparing these statistics for candidate training datasets to the intended
application scenario facilitates the evaluation of their compatibility.
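A minimal sketch of such a Location Bins statistic, assuming integer-coded label maps and a hypothetical 4x4 grid (the grid resolution is a placeholder, not the one used in the paper):

```python
import numpy as np

def location_bins(annotation_maps, num_classes, grid=(4, 4), void_id=255):
    """Estimate P(class | grid cell) over a dataset of 2D label maps."""
    rows, cols = grid
    hist = np.zeros((rows, cols, num_classes))
    for labels in annotation_maps:
        h, w = labels.shape
        for r in range(rows):
            for c in range(cols):
                # Crop the cell and count the non-void labels inside it.
                cell = labels[r * h // rows:(r + 1) * h // rows,
                              c * w // cols:(c + 1) * w // cols]
                ids, n = np.unique(cell[cell != void_id], return_counts=True)
                hist[r, c, ids] += n
    # Normalize each cell to a probability distribution over the labels.
    return hist / np.maximum(hist.sum(axis=2, keepdims=True), 1)
```

Inspecting the resulting per-cell distributions reproduces the patterns described above, e.g. the Vehicle probability mass concentrating in the lower image half for car-mounted cameras.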
Other available statistical measures, such as the analysis of co-occurrence, which provides a
probability measure for each combination of labels appearing in the same image, proved to add less
distinct cues for the given task. Since the application scenario only includes five labels arranged
within a consecutive image sequence, the resulting correlation matrix did not show significant peaks.
However, an adapted version in the form of the Local Label Neighborhood (LLN), which limits the
co-occurrence measure to label transitions, was successfully applied, as described in detail in Section 3.2.
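For illustration, such a transition-limited co-occurrence statistic could be computed as in the following sketch, which assumes 4-connected pixel neighborhoods; the exact neighborhood definition used for the LLN is the one given in Section 3.2:

```python
import numpy as np

def local_label_neighborhood(annotation_maps, num_classes, void_id=255):
    """Accumulate label transitions between adjacent pixels (a sketch of
    a co-occurrence measure limited to label borders)."""
    lln = np.zeros((num_classes, num_classes), dtype=np.int64)
    for labels in annotation_maps:
        for a, b in ((labels[:, :-1], labels[:, 1:]),   # horizontal pairs
                     (labels[:-1, :], labels[1:, :])):  # vertical pairs
            mask = (a != b) & (a != void_id) & (b != void_id)
            np.add.at(lln, (a[mask], b[mask]), 1)
    lln = lln + lln.T            # make the transition counts symmetric
    return lln / max(lln.sum(), 1)
```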
Based on the aggregated information of class frequency and Location Bins, the CamVid dataset [1]
could be identified as an appropriate choice for training background classes, since it offers a back-
ground ratio of 80.9%, as well as a fitting spatial arrangement of class probabilities. The foreground
classes, on the other hand, are trained on the PascalContext dataset [11], in particular the version
including 33 categories, which contains 46% foreground pixels.
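The comparison itself can be operationalized in different ways; since no concrete measure is fixed here, the following is only one plausible, hypothetical scoring of dataset compatibility, combining the background-ratio difference with the per-cell overlap (histogram intersection) of the Location Bin distributions:

```python
import numpy as np

def compatibility(bins_cand, bins_target, ratio_cand, ratio_target):
    """Hypothetical compatibility score between a candidate training
    dataset and the target scenario; higher is more compatible.

    bins_*  : (rows, cols, num_classes) arrays from location_bins()
    ratio_* : scalar background ratios
    """
    # Per-cell histogram intersection of the spatial label distributions,
    # averaged over all grid cells (lies in [0, 1]).
    overlap = np.minimum(bins_cand, bins_target).sum(axis=2).mean()
    return overlap - abs(ratio_cand - ratio_target)
```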
Classifier Setup Based on the selected datasets, two classifiers are applied to cover the background
and foreground classes separately. The former classifier uses the pre-trained model pascal-fcn8s-tvg-