Page 28, Proceedings of the OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
As representations and learning schemes have grown capable of accommodating the sheer variability in the data, this progress is also imposing new requirements on the employed datasets. Current learned models are often optimized for the specific datasets they have been trained on, and their capture modalities are restricted by their implicit design. Real-world scenarios are highly diverse; a single dataset therefore represents only a small fraction of all possible visual appearances. Although datasets have become more elaborate and diverse lately [17], class coverage, balancing, and variability are still relevant issues to be tackled. Motivated by the diversity in the characteristics of prevailing datasets, in terms of the number and granularity of annotated classes and scene-specific view attributes, we propose to capture the spatial relationship between various semantically labeled regions across several datasets. We demonstrate that the modeled spatial prior can enhance recognition accuracies, leading to state-of-the-art results, as illustrated in Figure 1.
2. Related Work
Spatial context is an important type of information in the human cognitive process [12] when recognizing objects, especially in the presence of a cluttered background. Certain objects predominantly co-occur in the real world. Thus, analyzing vast amounts of visual data can yield meaningful contextual statistics which can be used to robustify visual object recognition [5].
Pixel-wise semantic labeling is a relatively novel domain, since large-scale object recognition with shared informative representations is a prerequisite for this task. Starting with manually selected low-level features, discriminatively trained Random Forests or Boosting have been used to perform classification patch-wise [16] or to additionally incorporate local structural information within the analysis patch [7]. Based on recent advances in deep learning, several frameworks [13, 18] have demonstrated significant improvements in the accuracy of per-pixel class estimates. Recently, multi-scale deep architectures have been proposed in order to represent local and global context by employing multiple input images at different resolutions [2], or by combining feature maps from different layers of the convolutional architecture [6]. Both techniques aim to combine fine detail representations with relational information established at a coarse resolution level in order to generate accurate segment boundaries between labeled regions. The immense representational power of deep convolutional architectures captures rich details of the object classes to be represented and yields segmentation frameworks which surpass learned hand-crafted representations. Capturing spatial context within convolutional architectures, however, is linked with complexities in terms of training (augmented parameter space) and increased computational expense due to the computation of multiple scale-specific features.
Our proposed approach employs a previously learned spatial prior model as an additional step to switch class labels at locations where per-pixel estimates are ambiguous. We term our model the Explicit Priors model. Per-pixel ambiguity is quantified from the class posterior probabilities at the given pixel by examining the distance between the first- and second-ranked probabilities. Our method, while limited in representing spatial context at a wide range of spatial scales and orientations, yields a remarkable improvement at a negligible increase of computational complexity.
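The ambiguity criterion described above can be sketched as follows: a pixel is considered ambiguous when the margin between its two highest class posteriors falls below a threshold. This is a minimal illustration, not the paper's implementation; the function name and the threshold value are our own assumptions.

```python
import numpy as np

def ambiguity_mask(posteriors, threshold=0.1):
    """Flag pixels whose two highest class posteriors are close.

    posteriors: (H, W, C) array of per-pixel class probabilities.
    Returns a boolean (H, W) mask: True where the margin between the
    first- and second-ranked probabilities is below `threshold`.
    (Illustrative sketch; the threshold is a hypothetical choice.)
    """
    # Sort class probabilities per pixel (ascending) and keep the two largest.
    top2 = np.sort(posteriors, axis=-1)[..., -2:]
    margin = top2[..., 1] - top2[..., 0]
    return margin < threshold

# Toy example: a 1x2 image with 3 classes.
p = np.array([[[0.50, 0.45, 0.05],    # ambiguous (margin 0.05)
               [0.90, 0.05, 0.05]]])  # confident (margin 0.85)
print(ambiguity_mask(p, threshold=0.1))  # [[ True False]]
```

Only the pixels flagged by such a mask would be candidates for label switching via the spatial prior, which is what keeps the added computational cost negligible.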
3. Methodology and Experimental Setup
The proposed approach for combining learned information from multiple datasets and thereby enhancing existing classifiers is based on the concept of Explicit Priors. By aggregating statistical data on the level of individual pixels and capturing spatial context, we generate additional cues for training
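One plausible form of such pixel-level aggregation is counting how often class pairs co-occur at fixed spatial displacements within annotated label maps. This is a hedged sketch of the general idea, assuming a co-occurrence-count formulation; the function and offset scheme are our own illustration, not the paper's exact definition of the prior.

```python
import numpy as np

def cooccurrence_counts(label_map, offsets, num_classes):
    """Accumulate how often class pairs co-occur at given spatial offsets.

    label_map: (H, W) integer class labels of one annotated image.
    offsets: list of (dy, dx) displacements to examine.
    Returns a (len(offsets), C, C) count tensor; entry [k, a, b] counts
    pixels of class a whose neighbor at offset k carries class b.
    (Illustrative sketch of a spatial co-occurrence statistic.)
    """
    H, W = label_map.shape
    counts = np.zeros((len(offsets), num_classes, num_classes), dtype=np.int64)
    for k, (dy, dx) in enumerate(offsets):
        # Restrict to the overlapping region valid for this displacement.
        src = label_map[max(0, -dy):H - max(0, dy), max(0, -dx):W - max(0, dx)]
        dst = label_map[max(0, dy):H + min(0, dy), max(0, dx):W + min(0, dx)]
        np.add.at(counts[k], (src.ravel(), dst.ravel()), 1)
    return counts
```

Normalizing such counts over many label maps, possibly drawn from several datasets, would yield the kind of spatial statistics that the Explicit Priors concept aggregates.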
Publication information
- Title: Proceedings, OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
- Authors: Peter M. Roth, Kurt Niel
- Publisher: Verlag der Technischen Universität Graz
- Location: Wels
- Date: 2017
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-527-0
- Size: 21.0 x 29.7 cm
- Pages: 248
- Keywords: Tagungsband
- Categories: International, Tagungsbände