Proceedings of the OAGM&ARW Joint Workshop - Vision, Automation and Robotics
Page 159
Fig. 2. Iterative pose refinement from an initial sensor estimate: (a) test image with overlaid ground-truth pose, (b) initial noisy sensor pose, (c) segmented image, (d) final pose obtained with our method.

IV. 3D LOCALIZATION

Building on the same segmentation approach, trained using the training data described in Secs. II and III, we proposed two different approaches for pose estimation.

A. Direct Pose Selection [1]

Given a coarse initial estimate $\tilde{p}$ of the pose provided by the sensors and a 2.5D map of its surroundings, the goal is to estimate the correct pose $\hat{p}$. Therefore, we sample poses on a regular grid around $\tilde{p}$ and estimate

$\hat{p} = \arg\max_p L(p)$,   (1)

where $L(p)$ is the log-likelihood

$L(p) = \sum_x \log P_{c(p,x)}(x)$.   (2)

The sum runs over all image locations $x$, where $c(p,x)$ is the class at location $x$ when rendering the model under pose $p$, and $P_c(x)$ is the probability for class $c$ at location $x$, with $P_c$ being one of the probability maps predicted by the semantic segmentation. (A minimal code sketch of this grid search is given below, after Sec. V.)

B. CNN-based Refinement [2]

As this brute-force strategy is not very efficient, we additionally proposed a CNN-based approach for iterative pose refinement. To refine the location, we discretize the directions along the ground plane into 8 possible directions and train a network to predict the best direction in which to shift the currently estimated location. We also add a class indicating that the estimated location is already correct and should not be changed. Thus, given the semantic segmentation of the current input image and a rendering of the 2.5D map from the current pose estimate, the network, denoted by $\mathrm{CNN}_t$, yields a 9-dimensional output vector

$d_t = \mathrm{CNN}_t(R_F, R_{HE}, R_{VE}, R_{BG}, S_F, S_{HE}, S_{VE}, S_{BG})$.   (3)

Here, $S_F$, $S_{HE}$, $S_{VE}$, and $S_{BG}$ denote the probability maps computed by the semantic segmentation for the classes façade, horizontal edge, vertical edge, and background, respectively; $R_F$, $R_{HE}$, $R_{VE}$, and $R_{BG}$ are binary maps for the same classes, created by rendering the 2.5D map for the current pose estimate. In addition, we train a second network to refine the orientation,

$d_o = \mathrm{CNN}_o(R_F, R_{HE}, R_{VE}, R_{BG}, S_F, S_{HE}, S_{VE}, S_{BG})$,   (4)

where $d_o$ is a 3-dimensional vector covering the probabilities to rotate the camera to the right, to the left, or not to rotate it at all. Starting from the initial estimate $\tilde{p}$, we iteratively apply $\mathrm{CNN}_t$ and $\mathrm{CNN}_o$ and update the current pose. These steps are iterated until both networks converge and predict not to move (a sketch of this loop is also given after Sec. V). In particular, there are two main advantages of having two networks: (a) as the networks for translation and orientation are treated separately, we do not need to balance between them; (b) the two detached problems are much easier to solve, reducing both the training and the inference effort.

V. RESULTS AND SUMMARY

Two illustrative results obtained by the approach described in Sec. IV-B are shown in Fig. 2. It can clearly be seen that the initial sensor poses (Fig. 2(b)) do not cover the ground truth (Fig. 2(a)) very well, whereas the finally estimated poses (Fig. 2(d)), obtained using the segmentation results (Fig. 2(c)), fit the buildings perfectly. Overall, this demonstrates that adopting ideas from semantic segmentation in combination with convolutional neural networks and the information provided by 2.5D maps can successfully be used for estimating the poses of buildings and thus their exact location. For more details, we refer the reader to [1] and [2].
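The grid search of Sec. IV-A can be summarized in a few lines. The following is a minimal sketch under assumed interfaces, not the implementation from [1]: render_class_map stands for rendering the 2.5D map under a candidate pose, seg_probs for the per-pixel class probabilities produced by the semantic segmentation, and candidate_poses for the regular grid around the initial sensor estimate.

```python
import numpy as np

def log_likelihood(pose, seg_probs, render_class_map):
    """Eq. (2): L(p) = sum_x log P_{c(p,x)}(x).

    seg_probs        -- (H, W, C) per-pixel class probabilities from the segmentation
    render_class_map -- callable mapping a pose to an (H, W) array of class
                        indices c(p, x), obtained by rendering the 2.5D map
    """
    c = render_class_map(pose)                      # class label c(p, x) for every pixel x
    ys, xs = np.indices(c.shape)
    probs = seg_probs[ys, xs, c]                    # P_{c(p,x)}(x) per pixel
    return np.log(np.clip(probs, 1e-12, None)).sum()  # clip to avoid log(0)

def select_pose(candidate_poses, seg_probs, render_class_map):
    """Eq. (1): pick the grid pose with the highest log-likelihood."""
    return max(candidate_poses,
               key=lambda p: log_likelihood(p, seg_probs, render_class_map))
```

select_pose has to evaluate the likelihood once per pose on the regular grid around $\tilde{p}$, which is what makes this brute-force variant costly compared to the refinement sketched next.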
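The refinement loop of Sec. IV-B alternates the two networks until both predict "no change". The sketch below is likewise only illustrative and not the implementation from [2]: cnn_t and cnn_o stand for the trained networks of Eqs. (3) and (4), render_binary_maps for the 2.5D-map renderer, and the Pose container, the step sizes, and the ordering of the direction and rotation classes are assumptions made for the example.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Pose:
    x: float      # position on the ground plane
    y: float
    yaw: float    # camera orientation (radians)

# 8 discretized ground-plane directions; index 8 means "location already correct".
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
STAY = 8
ROT_RIGHT, ROT_LEFT, ROT_NONE = 0, 1, 2   # assumed ordering of the 3 orientation classes

def refine_pose(pose, seg_maps, cnn_t, cnn_o, render_binary_maps,
                step=1.0, angle_step=np.deg2rad(5.0), max_iters=100):
    """Alternate CNN_t (Eq. 3) and CNN_o (Eq. 4) until both predict 'no change'."""
    for _ in range(max_iters):
        r_maps = render_binary_maps(pose)            # R_F, R_HE, R_VE, R_BG, shape (4, H, W)
        net_in = np.concatenate([r_maps, seg_maps])  # stacked with S_F, S_HE, S_VE, S_BG
        move = int(np.argmax(cnn_t(net_in)))         # 9-dim translation decision d_t
        turn = int(np.argmax(cnn_o(net_in)))         # 3-dim orientation decision d_o
        if move == STAY and turn == ROT_NONE:
            break                                    # both networks predict "correct"
        if move != STAY:
            dx, dy = DIRECTIONS[move]
            pose.x += step * dx
            pose.y += step * dy
        if turn == ROT_RIGHT:
            pose.yaw -= angle_step
        elif turn == ROT_LEFT:
            pose.yaw += angle_step
    return pose
```

Keeping translation and orientation in separate networks, as noted in Sec. IV-B, avoids having to balance the two objectives against each other during training.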
REFERENCES

[1] A. Armagan, M. Hirzer, and V. Lepetit, “Semantic Segmentation for 3D Localization in Urban Environments,” in JURSE, 2017. Best Paper Award.
[2] A. Armagan, M. Hirzer, P. M. Roth, and V. Lepetit, “Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization,” in CVPR, 2017.
[3] C. Arth, C. Pirchheim, J. Ventura, D. Schmalstieg, and V. Lepetit, “Instant Outdoor Localization and SLAM Initialization from 2.5D Maps,” in ISMAR, 2015. Best Paper Award.
[4] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” CoRR, 2015.
[5] J. Long, E. Shelhamer, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” in CVPR, 2015.
[6] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” CoRR, 2014.
Title
Proceedings of the OAGM&ARW Joint Workshop
Subtitle
Vision, Automation and Robotics
Authors
Peter M. Roth
Markus Vincze
Wilfried Kubinger
Andreas Müller
Bernhard Blaschitz
Svorad Stolc
Publisher
Verlag der Technischen Universität Graz
Location
Wien
Date
2017
Language
English
License
CC BY 4.0
ISBN
978-3-85125-524-9
Size
21.0 x 29.7 cm
Pages
188
Keywords
Conference proceedings (Tagungsband)
Categories
International
Conference proceedings (Tagungsbände)
