Fig. 1: Illustrated VO problem of a stereo system (relative transformations $T_{C_2}^{C_{m-1}}$, $T_{C_{m-1}}^{C_m}$ / absolute transformations $T_{C_1}^{C_2}$, $T_{C_1}^{C_{m-1}}$, $T_{C_1}^{C_m}$)
Furthermore, the usage of additional sensors like GPS, laser scanners or IMUs can improve the pose estimation. For example, in [1], [23], [17] or in [27] the integration of an IMU reduces the error in orientation. In [17], Konolige et al. achieve a maximum relative position error of just 0.1% over a 9 km long track with their real-time VO implementation. Another good result is shown by Tardif et al. [27] over a 5.6 km long track. This dataset was acquired by a tractor driving next to an orange grove and on a street for the return to the garage.
III. VISUAL ODOMETRY
As discussed in Section II, Visual Odometry incrementally estimates the pose. Figure 1 shows this for a typical case using a stereo-camera system. The relative homogeneous transformation $T_{C_{m-1}}^{C_m} \in SE(3)$ of an image pair $\{m-1, m\}$ with camera centers / camera coordinate systems $C_{m-1}$ and $C_m$ is calculated via features in the images. As shown in the figure, the coordinate system of the left camera is the reference point of a transformation $T_{C_{m-1}}^{C_m}$, which transforms from $C_{m-1}$ to $C_m$. The rigid body transformation is given by
$$T_{C_{m-1}}^{C_m} = \begin{bmatrix} R_{C_{m-1}}^{C_m} & {}^{C_m}t_{C_{m-1}}^{C_m} \\ 0 & 1 \end{bmatrix} \qquad (1)$$
where $R_{C_{m-1}}^{C_m} \in SO(3)$ is the orthogonal rotation matrix and ${}^{C_m}t_{C_{m-1}}^{C_m} \in \mathbb{R}^3$ the translation vector, represented in the coordinate system $C_m$. The concatenation of all relative transformations results in the absolute transformation $T_{C_1}^{C_m} = T_{C_{m-1}}^{C_m} T_{C_1}^{C_{m-1}}$ from $C_1$ to $C_m$.
Therefore, the main task of a VO is to calculate the relative transformations $T_{C_{m-1}}^{C_m}$ and finally to concatenate them to obtain the full camera trajectory $T_{C_1}^{C_{2:m}} = \{T_{C_1}^{C_2}, \ldots, T_{C_1}^{C_m}\}$ between the camera centers $C_1$ and $C_m$.
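To make this composition concrete, the following numpy sketch assembles the rigid body transformation of Eq. (1) from $R$ and $t$ and chains the relative transformations into absolute poses; the function names are illustrative and not part of our implementation.

```python
import numpy as np

def make_transform(R, t):
    """Assemble the homogeneous transformation of Eq. (1) from a rotation
    matrix R in SO(3) and a translation vector t in R^3."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def accumulate_poses(relative_transforms):
    """Chain relative transformations [T^{C_2}_{C_1}, T^{C_3}_{C_2}, ...]
    into absolute transformations T^{C_m}_{C_1} w.r.t. the first camera frame."""
    absolute = []
    T_abs = np.eye(4)  # T^{C_1}_{C_1}
    for T_rel in relative_transforms:
        T_abs = T_rel @ T_abs  # T^{C_m}_{C_1} = T^{C_m}_{C_{m-1}} T^{C_{m-1}}_{C_1}
        absolute.append(T_abs.copy())
    return absolute
```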
The structure of our VO approach is similar to the one of Nistér et al. [22]. It starts with feature detection and description, but uses the more distinctive A-KAZE features [2] instead of Harris corners [15]. The next step is to match features between a stereo pair and one consecutive image, either left or right. Then, the triangulated stereo correspondences and the matched 2D features are used for the pose estimation. At the end, key frames are selected and windowed bundle adjustment [28] is applied to further optimize the previously calculated poses [27].
A. Feature Detection and Description
Feature detection is one of the most important steps in a feature-based Visual Odometry system. According to Fraundorfer [13], important properties of features are detection repeatability, localization accuracy, robustness against noise as well as computational efficiency. In [8], Cordes et al. compare many different detection algorithms, and the A-KAZE detector [2] proves to be the best candidate in terms of localization accuracy and a suitable number of detected features. This detector is implemented in OpenCV [5] and is an extension of the KAZE algorithm [3] to detect blobs. In general, these features are image patterns that differ in intensity, color and texture from their adjacent pixels, and they are more distinctive than corners [13]. This is especially important in natural environments with ambiguous structures like branches or leaves. In our case, A-KAZE detects blobs in a nonlinear scale space with four octaves and the same number of sub-levels.
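As a rough illustration of this configuration, the detector can be instantiated through the OpenCV interface as follows; the image file name is a placeholder, and the stated parameters simply reflect the four octaves and four sub-levels mentioned above (which, to our knowledge, also correspond to the OpenCV defaults).

```python
import cv2

# A-KAZE blob detector from OpenCV, configured with four octaves and
# four sub-levels per octave as described above.
detector = cv2.AKAZE_create(nOctaves=4, nOctaveLayers=4)

img_left = cv2.imread("left_0001.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
keypoints = detector.detect(img_left)
```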
In addition to the detection algorithm, A-KAZE also provides one for the description of a feature, which is implemented in OpenCV as well. It converts the area around a feature into a binary descriptor with a length of 486 bit; every comparison between two areas yields three bits. The description algorithm of A-KAZE is called M-LDB (Modified-Local Difference Binary) and is rotation and scale invariant. According to Alcantarilla et al., A-KAZE allows efficient and successful feature matching, which are mandatory properties of a good descriptor.
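A short sketch of how the descriptors are obtained through the same OpenCV class; note that OpenCV packs the binary descriptor into bytes, so the 486 bits are stored as 61-byte rows (our reading of the implementation, not a claim from [2]).

```python
import cv2

detector = cv2.AKAZE_create()  # default descriptor type: full M-LDB
img = cv2.imread("left_0001.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name

# detectAndCompute returns the keypoints together with their M-LDB
# binary descriptors, one packed byte array per feature.
keypoints, descriptors = detector.detectAndCompute(img, None)
print(descriptors.shape, descriptors.dtype)  # (N, 61) uint8 -> 486 bits rounded up to bytes
```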
B. Feature Matching
The task of this step is to find feature correspondences
among images. The easiest way to achieve matching between two images is to compare all feature descriptors of the first image with every descriptor of the second one. This
search is quadratic in the number of features. Fortunately,
the usage of epipolar or motion constraints simplifies this
task and reduces the computation time drastically. This is
necessary to facilitate an online VO system, which could be
used on a vehicle like a tractor during its operation in a field
or forest.
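For reference, such an exhaustive search over binary descriptors boils down to a brute-force Hamming matcher; this is only the baseline that the constraints below avoid, and the file names are placeholders.

```python
import cv2

detector = cv2.AKAZE_create()
img_a = cv2.imread("left_0001.png", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img_b = cv2.imread("left_0002.png", cv2.IMREAD_GRAYSCALE)
_, desc_a = detector.detectAndCompute(img_a, None)
_, desc_b = detector.detectAndCompute(img_b, None)

# Every descriptor of the first image is compared with every descriptor of
# the second one (quadratic search); Hamming distance suits binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(desc_a, desc_b)
```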
Our stereo VO relies on rectified images, which are remapped image pairs whose epipolar lines are horizontal and aligned with each other (see [13]). Thus, epipolar matching only allows a match between features which lie on the same horizontal epipolar line, i.e. the same image row.
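A minimal sketch of this row constraint, assuming OpenCV keypoints from rectified images; the row tolerance is an illustrative value, not one from our system.

```python
import numpy as np

def epipolar_candidates(kp_left, kp_right, max_row_diff=1.0):
    """For each left keypoint, return the indices of right keypoints lying on
    (almost) the same image row, i.e. the same horizontal epipolar line."""
    rows_left = np.array([kp.pt[1] for kp in kp_left])
    rows_right = np.array([kp.pt[1] for kp in kp_right])
    return [np.flatnonzero(np.abs(rows_right - r) <= max_row_diff)
            for r in rows_left]
```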
Descriptors of two consecutive left or right images can be matched via a motion constraint. As proposed in [10], we assume a constant velocity model between two frames. Using the known motion, we can project the 3D point of an already matched stereo correspondence into the other image. A constant window of 2·35 × 2·35 pixels around the projected position defines the allowed area of possible features and therefore reduces the computing time.
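The motion constraint can be sketched as follows, assuming triangulated points expressed in the previous camera frame, a 3×3 camera matrix K and a constant-velocity pose prediction T_pred; all names are illustrative, and the half window of 35 pixels mirrors the 2·35 × 2·35 pixel window above.

```python
import numpy as np

def motion_candidates(points_3d, kp_next, K, T_pred, half_window=35):
    """Project already matched 3D points into the next image using the predicted
    motion T_pred (4x4) and keep only the features inside a 2*35 x 2*35 pixel
    window around each projection."""
    pts_next = np.array([kp.pt for kp in kp_next])      # (N, 2) feature positions
    candidates = []
    for X in points_3d:
        X_pred = (T_pred @ np.append(X, 1.0))[:3]        # point predicted in the next frame
        u = K @ (X_pred / X_pred[2])                     # pinhole projection, u = (u, v, 1)
        in_window = np.all(np.abs(pts_next - u[:2]) <= half_window, axis=1)
        candidates.append(np.flatnonzero(in_window))
    return candidates
```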