             Ground        Vehicle       Pedestrian    Sky           Building      Avg
BL           97.8 | 95.0   –             –             72.9 | 67.9   82.0 | 97.1   89.6 | 84.8
BL+LLN       97.2 | 95.5   98.5 | 86.7   97.1 | 77.2   72.9 | 67.9   83.5 | 96.9   89.8 | 84.8
BL+LLN+CRF   97.7 | 96.9   –             –             89.1 | 81.5   86.3 | 98.4   93.7 | 88.1

Table 3. Precision (left) and recall (right) of each label class, in %.
regions. Concerning the influence of LLN, the Building class gains 1.5% in precision, combined with an insignificant decrease in recall. At the same time, the optimization decreases precision for the Ground class while increasing its recall. It can be concluded that the method successfully recovers misclassified Ground pixels originally labeled as Building. CRF further increases the average precision and recall by an additional 3.9% and 3.3%, respectively.
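For reference, the values in Table 3 are the standard per-class precision and recall over pixel counts. The following Python sketch shows how such a table can be computed from a confusion matrix; the three-class layout and the pixel counts are hypothetical, not data from the paper.

```python
import numpy as np

def per_class_precision_recall(conf):
    """Per-class precision and recall from a confusion matrix.

    conf[i, j] = number of pixels with ground-truth class i
    that were predicted as class j.
    """
    tp = np.diag(conf).astype(float)           # true positives per class
    predicted = conf.sum(axis=0)               # column sums: pixels predicted as each class
    actual = conf.sum(axis=1)                  # row sums: ground-truth pixels of each class
    precision = tp / np.maximum(predicted, 1)  # guard against empty classes
    recall = tp / np.maximum(actual, 1)
    return precision, recall

# Hypothetical three-class example (Ground, Sky, Building):
conf = np.array([
    [950,  10,  40],   # Ground pixels predicted as Ground / Sky / Building
    [ 20, 679, 301],   # Sky pixels
    [ 15,  14, 971],   # Building pixels
])
p, r = per_class_precision_recall(conf)
print(np.round(100 * p, 1))  # precision per class, in %
print(np.round(100 * r, 1))  # recall per class, in %
```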
5. Conclusions
This paper introduces a concept, referred to as Explicit Priors, that captures spatial context between labeled regions for diverse datasets annotated at different semantic granularity. The concept was successfully applied to enhance the entire training and classification process of semantic segmentation, demonstrated on the Daimler Urban Segmentation 2014 dataset. The approach provides a generalized way to select an appropriate subset of multiple training datasets and to efficiently combine their labels to fit a given application scenario. The segmentation quality of foreground classes is comparable to, and in terms of certain measures even surpasses, state-of-the-art methods. The results for the background classes proved to be competitive as well: their relatively high precision combined with lower recall corresponds to a classification accuracy for certain labels that is slightly inferior to currently leading methods. Further improvements in background labeling were achieved by applying priors based on the Local Label Neighborhood as well as by inference using a CRF. To exploit additional potential, the next step would be to integrate complementary modalities, such as depth and motion cues.
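As a concrete illustration of the label-combination step described above, the following Python sketch maps dataset-specific class names onto a common target label set via a lookup table. The dataset names, class names, and groupings are hypothetical examples, not the configuration used in the paper.

```python
import numpy as np

# Hypothetical mapping from dataset-specific label names to a common
# target set; neither the names nor the groupings are from the paper.
TARGET = {"ground": 0, "vehicle": 1, "pedestrian": 2, "sky": 3, "building": 4}

DATASET_TO_TARGET = {
    "dataset_a": {"road": "ground", "sidewalk": "ground", "car": "vehicle",
                  "person": "pedestrian", "sky": "sky", "facade": "building"},
    "dataset_b": {"street": "ground", "truck": "vehicle", "rider": "pedestrian",
                  "sky": "sky", "house": "building"},
}

def remap(label_image, source_names, dataset):
    """Relabel an integer label image into the common target id space.

    label_image:  2-D array of source label ids
    source_names: list mapping source id -> source class name
    """
    lut = np.full(len(source_names), -1, dtype=np.int64)  # -1 = void / unmapped
    for src_id, name in enumerate(source_names):
        target_name = DATASET_TO_TARGET[dataset].get(name)
        if target_name is not None:
            lut[src_id] = TARGET[target_name]
    return lut[label_image]

# Usage: a tiny 2x3 label image from "dataset_a"
names_a = ["road", "sidewalk", "car", "person", "sky", "facade"]
img = np.array([[0, 1, 2], [4, 4, 5]])
print(remap(img, names_a, "dataset_a"))   # -> [[0 0 1] [3 3 4]]
```

A shared integer id space of this kind is what allows annotations from heterogeneous datasets to be pooled into a single training set.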
Acknowledgments
This work is supported by the research initiative 'Mobile Vision' with funding from the Austrian Federal Ministry of Science, Research and Economy and the Austrian Institute of Technology.