Proceedings - OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
Page - 28 -
Text of the Page - 28 -

As representations and learning schemes have grown capable of accommodating the sheer variability in the data, this progress is also imposing new requirements on the employed datasets. Current learned models are often optimized for the specific datasets they have been trained on, and their capture modalities are restricted by their implicit design. Real-world scenarios are highly diverse; a single dataset therefore represents only a small fraction of all possible visual appearances. Although datasets have become more elaborate and diverse lately [17], class coverage, balancing, and variability are still relevant issues to be tackled. Motivated by the diversity in the characteristics of prevailing datasets, in terms of the number and granularity of annotated classes and scene-specific view attributes, we propose to capture the spatial relationship between various semantically labeled regions across several datasets. We demonstrate that the modeled spatial prior can enhance recognition accuracies, leading to state-of-the-art results, as illustrated in Figure 1.

2. Related Work

Spatial context is an important type of information in the human cognitive process [12] when recognizing objects, especially in the presence of a cluttered background. Certain objects predominantly co-occur in the real world. Thus, analyzing vast amounts of visual data can result in meaningful contextual statistics which can be used to robustify visual object recognition [5].

Pixel-wise semantic labeling is a relatively novel domain, since large-scale object recognition with shared informative representations is a prerequisite for this task. Starting with manually selected low-level features, discriminatively trained Random Forests or Boosting have been used to perform classification patch-wise [16] or to additionally incorporate local structural information within the analysis patch [7]. Based on recent advances in deep learning, several frameworks [13, 18] have demonstrated significant improvements in the accuracy of per-pixel class estimates.

Recently, multi-scale deep architectures have been proposed in order to represent local and global context, either by employing multiple input images at different resolutions [2] or by combining feature maps from different layers of the convolutional architecture [6]. Both techniques aim to combine fine-detail representations with relational information established at a coarse resolution level in order to generate accurate segment boundaries between labeled regions. The immense representational power of deep convolutional architectures captures rich details of the object classes to be represented and yields segmentation frameworks which surpass learned hand-crafted representations. Capturing spatial context within convolutional architectures, however, is linked with complexities in terms of training (augmented parameter space) and increased computational expense due to the computation of multiple scale-specific features.

Our proposed approach employs a previously learned spatial prior model as an additional step to switch class labels at locations where per-pixel estimates are ambiguous. We term our model the Explicit Priors model. Per-pixel ambiguity is quantified from the class posterior probabilities at the given pixel by examining the distance between the first- and second-rank probabilities. Our method, while limited in representing spatial context at a wide range of spatial scales and orientations, yields a remarkable improvement at a negligible increase of computational complexity.

3. Methodology and Experimental Setup

The proposed approach for combining learned information from multiple datasets and thereby enhancing existing classifiers is based on the concept of Explicit Priors. By aggregating statistical data on the level of individual pixels and capturing spatial context, we generate additional cues for training
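The ambiguity test described above (comparing the first- and second-rank posterior probabilities at each pixel and switching labels where the gap is small) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the margin threshold and the prior-based relabeling rule (letting a separately computed spatial-prior score arbitrate between the two top-ranked classes) are assumptions made for the sketch.

```python
import numpy as np

def switch_ambiguous_labels(posteriors, prior_scores, margin=0.1):
    """For each pixel, if the distance between the first- and second-rank
    class posteriors falls below `margin`, the pixel is treated as
    ambiguous and the candidate (among those two classes) favored by the
    spatial-prior scores is chosen instead.

    posteriors:   (H, W, C) per-pixel class posterior probabilities
    prior_scores: (H, W, C) per-pixel spatial-prior scores (assumed given)
    margin:       ambiguity threshold (illustrative value)
    """
    # Rank classes by posterior probability at every pixel.
    order = np.argsort(posteriors, axis=-1)
    top1, top2 = order[..., -1], order[..., -2]

    # Distance between first- and second-rank probabilities.
    p1 = np.take_along_axis(posteriors, top1[..., None], axis=-1)[..., 0]
    p2 = np.take_along_axis(posteriors, top2[..., None], axis=-1)[..., 0]
    ambiguous = (p1 - p2) < margin

    # Where ambiguous, let the spatial prior arbitrate between the two.
    prior1 = np.take_along_axis(prior_scores, top1[..., None], axis=-1)[..., 0]
    prior2 = np.take_along_axis(prior_scores, top2[..., None], axis=-1)[..., 0]
    return np.where(ambiguous & (prior2 > prior1), top2, top1)
```

Because the correction only touches pixels whose top-two posteriors nearly tie, the extra cost over the base classifier is a few element-wise array operations, which matches the paper's claim of a negligible increase in computational complexity.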
Proceedings OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
Title
Proceedings
Subtitle
OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
Authors
Peter M. Roth
Kurt Niel
Publisher
Verlag der Technischen Universität Graz
Location
Wels
Date
2017
Language
English
License
CC BY 4.0
ISBN
978-3-85125-527-0
Size
21.0 x 29.7 cm
Pages
248
Keywords
Tagungsband (conference proceedings)
Categories
International
Tagungsbände (conference proceedings)

Table of contents

  1. Learning / Recognition 24
  2. Signal & Image Processing / Filters 43
  3. Geometry / Sensor Fusion 45
  4. Tracking / Detection 85
  5. Vision for Robotics I 95
  6. Vision for Robotics II 127
  7. Poster OAGM & ARW 167
  8. Task Planning 191
  9. Robotic Arm 207