Text of page 159 in Proceedings of the OAGM&ARW Joint Workshop - Vision, Automation and Robotics

[Fig. 2. Iterative pose refinement from an initial sensor estimate: (a) test image with overlaid ground-truth pose, (b) initial noisy sensor pose, (c) segmented image, (d) final pose obtained with our method.]

IV. 3D LOCALIZATION

Building on the same segmentation approach, trained on the data described in Secs. II and III, we proposed two different approaches for pose estimation.

A. Direct Pose Selection [1]

Given a coarse initial estimate $\tilde{p}$ of the pose provided by the sensors and a 2.5D map of its surroundings, the goal is to estimate the correct pose $\hat{p}$. To this end, we sample poses on a regular grid around $\tilde{p}$ and estimate

$$\hat{p} = \arg\max_p L(p), \qquad (1)$$

where $L(p)$ is the log-likelihood

$$L(p) = \sum_x \log P_{c(p,x)}(x). \qquad (2)$$

The sum runs over all image locations $x$, where $c(p,x)$ is the class at location $x$ when rendering the model under pose $p$, and $P_c(x)$ is the probability for class $c$ at location $x$, with $P_c$ being one of the probability maps predicted by the semantic segmentation. (A minimal code sketch of this grid search is given below, after Sec. V.)

B. CNN-based Refinement [2]

As this brute-force strategy is not very efficient, we additionally proposed a CNN-based approach for iterative pose refinement. To refine the location, we discretize the directions along the ground plane into 8 possible directions and train a network to predict the best direction in which to shift the currently estimated location. We also add a class indicating that the estimated location is already correct and should not be changed. Thus, given the semantic segmentation of the current input image and a rendering of the 2.5D map from the current pose estimate, the network, denoted by $\mathrm{CNN}_t$, yields a 9-dimensional output vector:

$$d_t = \mathrm{CNN}_t(R_F, R_{HE}, R_{VE}, R_{BG}, S_F, S_{HE}, S_{VE}, S_{BG}). \qquad (3)$$

Here, $S_F$, $S_{HE}$, $S_{VE}$, and $S_{BG}$ denote the probability maps computed by the semantic segmentation for the classes façade, horizontal edge, vertical edge, and background, respectively; $R_F$, $R_{HE}$, $R_{VE}$, and $R_{BG}$ are binary maps for the same classes, created by rendering the 2.5D map for the current pose estimate. In addition, we train a second network to refine the orientation:

$$d_o = \mathrm{CNN}_o(R_F, R_{HE}, R_{VE}, R_{BG}, S_F, S_{HE}, S_{VE}, S_{BG}), \qquad (4)$$

where $d_o$ is a 3-dimensional vector covering the probabilities to rotate the camera to the right, to the left, or not at all. Starting from the initial estimate $\tilde{p}$, we iteratively apply $\mathrm{CNN}_t$ and $\mathrm{CNN}_o$ and update the current pose. These steps are iterated until both networks have converged and predict not to move. (A sketch of this refinement loop is also given below.) Having two networks brings two main advantages: (a) since translation and orientation are handled by separate networks, we do not need to balance between them; (b) the two decoupled problems are much easier to solve, reducing both the training and the inference effort.

V. RESULTS AND SUMMARY

Two illustrative results obtained by the approach described in Sec. IV-B are shown in Fig. 2. It can clearly be seen that the initial sensor pose (Fig. 2(b)) does not match the ground truth (Fig. 2(a)) very well, whereas the finally estimated pose (Fig. 2(d)), obtained using the segmentation results (Fig. 2(c)), fits the buildings perfectly. Overall, this demonstrates that adopting ideas from semantic segmentation, in combination with convolutional neural networks and the information provided by 2.5D maps, can successfully be used for estimating the poses of buildings and thus their exact location. For more details, we refer to [1] and [2].
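As a minimal sketch (not the authors' implementation) of the grid search in Eqs. (1) and (2): the segmentation probabilities are assumed to be stacked in an array of shape (num_classes, H, W), and render_fn and grid_offsets are hypothetical placeholders for rasterizing the 2.5D map to per-pixel class labels under a pose and for the regular sampling grid around the sensor estimate.

```python
import numpy as np

def log_likelihood(prob_maps, rendered_classes):
    """Eq. (2): sum over all pixels x of log P_{c(p,x)}(x).

    prob_maps        -- (num_classes, H, W) segmentation probability maps
    rendered_classes -- (H, W) integer map c(p, x), obtained by rendering
                        the 2.5D model under pose p
    """
    h, w = rendered_classes.shape
    rows = np.arange(h)[:, None]               # broadcast row indices
    cols = np.arange(w)[None, :]               # broadcast column indices
    picked = prob_maps[rendered_classes, rows, cols]
    return np.log(np.maximum(picked, 1e-12)).sum()  # clamp to avoid log(0)

def direct_pose_selection(p_init, prob_maps, render_fn, grid_offsets):
    """Eq. (1): score every pose on a regular grid around the coarse
    sensor estimate p_init and return the maximizer."""
    best_pose, best_ll = p_init, -np.inf
    for offset in grid_offsets:                # e.g. (dx, dy, dtheta) arrays
        p = p_init + offset                    # assumes poses add componentwise
        ll = log_likelihood(prob_maps, render_fn(p))
        if ll > best_ll:
            best_pose, best_ll = p, ll
    return best_pose
```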
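Likewise, a sketch of the CNN-based refinement loop of Sec. IV-B, under assumptions the text does not fix: the "do not move" classes are taken to be index 8 of CNN_t's output and index 2 of CNN_o's, the step size and rotation increment are illustrative, and translate_pose / rotate_pose are hypothetical pose helpers.

```python
import numpy as np

# 8 discretized ground-plane directions (Sec. IV-B); the ordering is an assumption.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1),
              (0, -1), (-1, -1), (-1, 0), (-1, 1)]
STAY_T, STAY_O = 8, 2   # assumed indices of the "do not move" classes

def refine_pose(p, seg_maps, render_fn, cnn_t, cnn_o,
                translate_pose, rotate_pose,
                step=1.0, angle=1.0, max_iters=100):
    """Iterate CNN_t (Eq. 3) and CNN_o (Eq. 4) until both predict
    'do not move'. Both networks see the 4 rendered binary maps R_*
    stacked with the 4 segmentation probability maps S_*."""
    for _ in range(max_iters):
        x = np.concatenate([render_fn(p), seg_maps], axis=0)  # 8 channels
        d_t = int(np.argmax(cnn_t(x)))   # 9-way: 8 directions + stay
        d_o = int(np.argmax(cnn_o(x)))   # 3-way: right, left, stay
        if d_t == STAY_T and d_o == STAY_O:
            break                        # both networks predict convergence
        if d_t != STAY_T:
            dx, dy = DIRECTIONS[d_t]
            p = translate_pose(p, step * dx, step * dy)
        if d_o != STAY_O:
            p = rotate_pose(p, +angle if d_o == 0 else -angle)
    return p
```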
REFERENCES

[1] A. Armagan, M. Hirzer, and V. Lepetit, "Semantic Segmentation for 3D Localization in Urban Environments," in JURSE, 2017. Best Paper Award.
[2] A. Armagan, M. Hirzer, P. M. Roth, and V. Lepetit, "Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization," in CVPR, 2017.
[3] C. Arth, C. Pirchheim, J. Ventura, D. Schmalstieg, and V. Lepetit, "Instant Outdoor Localization and SLAM Initialization from 2.5D Maps," in ISMAR, 2015. Best Paper Award.
[4] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation," CoRR, 2015.
[5] J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," in CVPR, 2015.
[6] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," CoRR, 2014.

Title
Proceedings of the OAGM&ARW Joint Workshop
Subtitle
Vision, Automation and Robotics
Authors
Peter M. Roth
Markus Vincze
Wilfried Kubinger
Andreas Müller
Bernhard Blaschitz
Svorad Stolc
Publisher
Verlag der Technischen Universität Graz
Place
Wien
Date
2017
Language
English
License
CC BY 4.0
ISBN
978-3-85125-524-9
Dimensions
21.0 x 29.7 cm
Pages
188
Keywords
Conference proceedings
Categories
International
Conference proceedings

Table of Contents

  1. Preface v
  2. Workshop Organization vi
  3. Program Committee OAGM vii
  4. Program Committee ARW viii
  5. Awards 2016 ix
  6. Index of Authors x
  7. Keynote Talks
  8. Austrian Robotics Workshop 4
  9. OAGM Workshop 86