Page 18 in Proceedings of the OAGM&ARW Joint Workshop - Vision, Automation and Robotics
The comparison between two binary descriptors is done by computing the Hamming distance [14], i.e. the number of differing bits, which is a very efficient operation. Normally, the descriptor with the minimum Hamming distance is chosen as the best match. To improve the robustness of the matching, we additionally apply the distance-ratio test proposed in [20]. It only accepts a match if the ratio between the distances to the two closest neighbors is below a threshold $r_{\max} \in \mathbb{R}$ with $0 < r_{\max} < 1$. For binary descriptors, this ratio $r_H \in \mathbb{R}$ is defined as
$$r_H = \frac{d_{H,1}}{d_{H,2}} < r_{\max}, \tag{2}$$
where $d_{H,1} \in \mathbb{N}$ and $d_{H,2} \in \mathbb{N}$ are the Hamming distances to the closest and the second-closest neighbor, respectively. In our case, we use an empirical threshold of $r_{\max} = 0.71$, which helps to remove ambiguous matches that can occur at repetitive structures like branches.
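The matching rule above can be sketched as follows. This is a minimal illustration, assuming that each binary descriptor is stored as a Python integer (one bit per descriptor bit) and that the nearest neighbors are found by brute force; the function names are hypothetical and not part of the described system.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance: number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def ratio_test_match(query: int, candidates: list, r_max: float = 0.71):
    """Return the index of the best candidate if d_H1 / d_H2 < r_max, else None.

    Brute-force nearest-neighbour search (hypothetical helper for illustration).
    """
    if len(candidates) < 2:
        return None  # the ratio test needs two neighbours
    dists = sorted((hamming(query, c), idx) for idx, c in enumerate(candidates))
    (d1, best), (d2, _) = dists[0], dists[1]
    if d2 == 0:
        return None  # two identical nearest neighbours: ambiguous
    return best if d1 / d2 < r_max else None


query = 0b10110010
print(ratio_test_match(query, [0b10110011, 0b01001101]))  # distances 1 and 8 -> 0
print(ratio_test_match(query, [0b10110011, 0b10110000]))  # distances 1 and 1 -> None
```

With $r_{\max} = 0.71$, a match is accepted only if its distance is clearly smaller than that of the runner-up, so the second call, which mimics two equally similar features on a repetitive structure, is rejected.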
C. Motion Estimation and Key Frame Selection
In this step, the relative camera motion, i.e. the relative transformation $\mathbf{T}^{C,m}_{C,m-1}$ between an image pair $\{m-1, m\}$, is calculated. For this purpose, we use calibrated stereo cameras and two sets of corresponding features $F_{m-1}$ and $F_m$ of the images $m-1$ and $m$, respectively.
For the 3D-to-2D algorithm, the features of $F_{m-1}$ are defined by 3D points in $C_{m-1}$ and those of $F_m$ by 2D image points [24]. Normally, we use 2D features of the left image with coordinate system $v_m$. Alternatively, if the motion estimation fails due to too few feature matches, features of the right image with coordinate system $v'_m$ can be used instead to prevent a failure of the VO. The 3D points are estimated via the linear triangulation method of Hartley and Zisserman [16], which is implemented in OpenCV [5]. Using a function $d_E$ to calculate the Euclidean distance [11], the transformation $\mathbf{T}^{C,m}_{C,m-1}$ can be found by minimizing the image reprojection error of all features
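$$\min_{\mathbf{T}^{C,m}_{C,m-1}} \sum_{i=1}^{n} d_E\left({}^{v,m}\mathbf{t}_{v,m}^{x,i},\; {}^{v,m}\hat{\mathbf{t}}_{v,m}^{x,i}\left(\mathbf{T}^{C,m}_{C,m-1}\right)\right)^2. \tag{3}$$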
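To make the ingredients of (3) concrete, the sketch below triangulates a 3D point from a rectified stereo pair and evaluates the summed squared reprojection error of a candidate motion. It is an illustration under stated assumptions, not the system's implementation: the intrinsics `f`, `cx`, `cy`, the baseline `b`, and the representation of $\mathbf{T}^{C,m}_{C,m-1}$ as a rotation matrix `R` plus translation `t` (mapping points from $C_{m-1}$ into $C_m$) are hypothetical, and the triangulation is a plain-Python inhomogeneous (least-squares) variant of the linear method described by Hartley and Zisserman rather than the OpenCV routine.

```python
def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            fac = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= fac * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def triangulate(projections, pixels):
    """Linear triangulation: each view (3x4 matrix P, pixel (u, v)) adds the rows
    u*P[2] - P[0] and v*P[2] - P[1] of A X = 0 with X = (x, y, z, 1); the
    resulting 4x3 inhomogeneous system is solved via its normal equations."""
    rows, rhs = [], []
    for P, (u, v) in zip(projections, pixels):
        for coord, k in ((u, 0), (v, 1)):
            r = [coord * P[2][c] - P[k][c] for c in range(4)]
            rows.append(r[:3])
            rhs.append(-r[3])
    ata = [[sum(q[i] * q[j] for q in rows) for j in range(3)] for i in range(3)]
    atb = [sum(q[i] * bi for q, bi in zip(rows, rhs)) for i in range(3)]
    return solve3(ata, atb)


def project(R, t, X, f, cx, cy):
    """Pinhole projection of X after the rigid transformation x_c = R X + t."""
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def reprojection_cost(R, t, points_prev, pixels_m, f, cx, cy):
    """Objective of Eq. (3): sum of squared Euclidean pixel distances d_E^2."""
    cost = 0.0
    for X, (u, v) in zip(points_prev, pixels_m):
        pu, pv = project(R, t, X, f, cx, cy)
        cost += (u - pu) ** 2 + (v - pv) ** 2
    return cost


f, cx, cy, b = 500.0, 320.0, 240.0, 0.12  # assumed intrinsics [px] and baseline [m]
P_left = [[f, 0, cx, 0], [0, f, cy, 0], [0, 0, 1, 0]]
P_right = [[f, 0, cx, -f * b], [0, f, cy, 0], [0, 0, 1, 0]]
X = triangulate([P_left, P_right], [(382.5, 215.0), (367.5, 215.0)])

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_true = (0.0, 0.0, -1.0)  # camera moved 1 m forward between m-1 and m
pixels_m = [project(I3, t_true, (0.5, -0.2, 4.0), f, cx, cy)]
cost_true = reprojection_cost(I3, t_true, [X], pixels_m, f, cx, cy)
cost_static = reprojection_cost(I3, (0.0, 0.0, 0.0), [X], pixels_m, f, cx, cy)
```

The cost vanishes for the motion that generated the observation and grows for a wrong hypothesis such as zero motion; a PnP solver searches for the transformation that minimizes it.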
Here, ${}^{v,m}\mathbf{t}_{v,m}^{x,i}$ is the 2D coordinate vector of the image point $x_i$, and ${}^{v,m}\hat{\mathbf{t}}_{v,m}^{x,i}$ is the image coordinate vector of the 3D point $X_i$, which is observed in $C_{m-1}$ and projected into image $m$ through $\mathbf{T}^{C,m}_{C,m-1}$ and the corresponding camera projection matrix [16]. Equation (3) can be solved using at least three 3D-to-2D correspondences; this minimal case is known as P3P (Perspective from three Points) and returns up to four solutions. Therefore, at least one additional point is necessary to obtain a single, unique solution. PnP algorithms (Perspective from n Points) like EPnP (Efficient PnP) [18] use $n \geq 3$ correspondences to solve the problem. However, these methods only yield accurate results if the correspondences used are correct. If this is not guaranteed, the well-known RANSAC procedure (Random Sample Consensus) [12] should be used to remove wrong correspondences, so-called outliers. Such a robust motion estimation using RANSAC is explained in more detail in [13]. Our VO uses EPnP for the pose estimation and a preliminary non-minimal RANSAC with five points to obtain a reliable outlier removal, as suggested by Fraundorfer et al. [13].
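The outlier-removal idea can be illustrated with a generic RANSAC loop. The sketch below is hypothetical: to stay short and dependency-free, EPnP is replaced by a toy model (a 2D translation between matched points, fitted by averaging), and drawing five correspondences per hypothesis mirrors the non-minimal five-point sampling mentioned above, since one correspondence would already suffice for this toy model. The helper `num_iterations` uses the standard formula $N = \lceil \log(1-p) / \log(1-w^s) \rceil$ for confidence $p$, inlier ratio $w$ and sample size $s$.

```python
import math
import random


def ransac(data, fit, residual, sample_size, threshold, iterations, seed=0):
    """Generic RANSAC: hypothesise from random samples, keep the largest
    consensus set, then refit the model on all of its inliers."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        model = fit(rng.sample(data, sample_size))
        inliers = [d for d in data if residual(model, d) < threshold]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < sample_size:
        return None, []
    return fit(best), best


def num_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Samples needed to draw at least one outlier-free sample with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - inlier_ratio ** sample_size))


# Toy stand-in for EPnP (assumption): a pure 2D translation between point pairs.
def fit_translation(pairs):
    n = len(pairs)
    return (sum(q[0] - p[0] for p, q in pairs) / n,
            sum(q[1] - p[1] for p, q in pairs) / n)


def translation_residual(t, pair):
    p, q = pair
    return math.hypot(q[0] - p[0] - t[0], q[1] - p[1] - t[1])


inliers = [((i, 2 * i), (i + 3, 2 * i - 1)) for i in range(16)]  # true t = (3, -1)
outliers = [((k, k), (k + 30 + 10 * k, k - 15 - 15 * k)) for k in range(4)]
model, consensus = ransac(inliers + outliers, fit_translation, translation_residual,
                          sample_size=5, threshold=0.5, iterations=100)
```

With an assumed inlier ratio of 80 %, `num_iterations(0.8, 5)` asks for 12 hypotheses, whereas a one-point sample would need only 3; larger, non-minimal samples cost more iterations in exchange for more stable hypotheses.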
If the first motion estimation with the left image fails due to too few feature matches, or if the motion is implausible (position or orientation is unrealistic), the estimation is retried with 2D features of another image as a backup. These images are tried in the following order. First, the right image of the current stereo frame is used. If the motion estimation with the features of this image is also unsuccessful, a consecutive, still unused left or right image is used until the motion estimation step succeeds. This procedure avoids a failure of the VO with high probability.
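The backup order above can be expressed as a simple loop over candidate images. This is a hypothetical sketch: `estimate` stands for any motion estimator that returns `None` on failure (e.g. too few matches), `is_plausible` stands for the plausibility check, whose exact criteria are not specified here, and the summary of a motion as translation norm and rotation angle, the image labels, and the limits in the example are assumptions for illustration only.

```python
def estimate_with_fallback(candidates, estimate, is_plausible):
    """Try the candidate images in order (left image first, then right image,
    then further unused images) and return the first (image, motion) pair
    whose estimate exists and is plausible; None if all candidates fail."""
    for image in candidates:
        motion = estimate(image)
        if motion is not None and is_plausible(motion):
            return image, motion
    return None


# Hypothetical estimator results: motion summarised as (translation [m], rotation [deg]).
fake_results = {"left_m": None,            # too few feature matches
                "right_m": (15.0, 3.0),    # implausible jump in position
                "next_unused": (0.4, 1.5)}


def plausible(motion, max_translation=5.0, max_rotation=45.0):  # assumed limits
    return motion[0] <= max_translation and motion[1] <= max_rotation


result = estimate_with_fallback(["left_m", "right_m", "next_unused"],
                                fake_results.get, plausible)
print(result)  # ('next_unused', (0.4, 1.5))
```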
The selection of key frames is another important component of our VO. In general, the drift of a VO increases with every frame, i.e. every relative motion, that is used for the update of the absolute motion. Therefore, the concatenation of small motions should be avoided to keep the drift as low as possible. This means that the transformation $\mathbf{T}^{C,m}_{C,m-1}$ should not be used to update the absolute transformation $\mathbf{T}^{C,m}_{C,1}$ if the motion between the image pair $\{m-1, m\}$ is small or even zero. Instead, we keep $\mathbf{T}^{C,m-1}_{C,1}$.
We define a stereo frame $m$ as a key frame if its relative transformation is used for the absolute motion update. Our requirement is that the relative change in position is larger than 2 m or the relative angle of rotation [9] is larger than 20°.
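A minimal sketch of this key-frame rule follows. It assumes 4×4 homogeneous matrices stored as nested lists, that $\mathbf{T}^{C,m}_{C,m-1}$ maps coordinates from frame $C,m-1$ into frame $C,m$ (so the absolute update is $\mathbf{T}^{C,m}_{C,1} = \mathbf{T}^{C,m}_{C,m-1}\,\mathbf{T}^{C,m-1}_{C,1}$), and that the relative motion is measured with respect to the last key frame. The rotation angle is taken as the angle of the axis-angle representation, $\theta = \arccos((\operatorname{tr}\mathbf{R} - 1)/2)$; all function names are hypothetical.

```python
import math


def rotation_angle_deg(R):
    """Angle of the axis-angle representation of a 3x3 rotation matrix."""
    c = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise


def is_key_frame(T_rel, min_dist=2.0, min_angle_deg=20.0):
    """Key frame if the position changes by more than 2 m or the rotation exceeds 20 deg."""
    dist = math.sqrt(sum(T_rel[i][3] ** 2 for i in range(3)))
    R = [row[:3] for row in T_rel[:3]]
    return dist > min_dist or rotation_angle_deg(R) > min_angle_deg


def matmul4(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def update_absolute(T_abs, T_rel):
    """Concatenate T_rel only for key frames; otherwise keep the previous absolute pose."""
    if is_key_frame(T_rel):
        return matmul4(T_rel, T_abs), True
    return T_abs, False


def rot_z(deg, tz=0.0):
    """Example helper: rotation about z plus a translation along z."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0, 0.0],
            [math.sin(a), math.cos(a), 0.0, 0.0],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


T_abs = rot_z(0.0)
T_abs, used = update_absolute(T_abs, rot_z(5.0, tz=0.5))   # small motion: skipped
T_abs, used2 = update_absolute(T_abs, rot_z(0.0, tz=2.5))  # more than 2 m: key frame
```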
D. Bundle Adjustment
Windowed bundle adjustment [28] is the last important step in our feature-based VO system. It is used to optimize the relative transformations of the most recent $M$ key frames. For simplicity, we assume $n$ 3D points $i \in \{1, \ldots, n\}$, which are seen in a window of $M \leq m$ key frames $j \in \{\bar{m}, \ldots, m\}$. Here, the index of the oldest stereo frame in the window is defined as $\bar{m} = m - M + 1$. To reduce the computational demand, our VO only uses a window with the most recent $M = 2$ key frames, i.e. in total the features of four images are used for the optimization.
As in (3), bundle adjustment is again a minimization of the image reprojection error and is given by
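$$\min_{\mathbf{T}^{C,j}_{C,1},\; {}^{C,1}\mathbf{t}_{C,1}^{X,i}} \sum_{i=1}^{n} \sum_{j=\bar{m}}^{m} d_E\left({}^{v,j}\mathbf{t}_{v,j}^{x,i},\; {}^{v,j}\hat{\mathbf{t}}_{v,j}^{x,i}\left(\mathbf{T}^{C,j}_{C,1},\, {}^{C,1}\mathbf{t}_{C,1}^{X,i}\right)\right)^2. \tag{4}$$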
Here, ${}^{v,j}\mathbf{t}_{v,j}^{x,i}$ and ${}^{v,j}\hat{\mathbf{t}}_{v,j}^{x,i}$ are, respectively, the vectors of the observed and estimated 2D coordinates of point $i$ in key frame $j$. Due to the projection of the point $X_i$ into the image plane, the estimated coordinates depend on the absolute transformations $\mathbf{T}^{C,j}_{C,1}$, the 3D coordinate vector ${}^{C,1}\mathbf{t}_{C,1}^{X,i}$ and the corresponding camera projection matrices. The camera parameters are assumed to be constant and known from a prior calibration. The minimization of (4) is done using the sparse bundle adjustment library of Lourakis et al. [19].
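The objective (4) for a window of the most recent $M$ key frames can be sketched as below. The sketch only evaluates the cost; minimizing it is the job of a sparse solver such as the library cited above. The following are hypothetical assumptions for illustration: a rectified stereo rig whose right camera sits at `+baseline` along the x axis, absolute poses given as rotation `R` and translation `t` mapping frame $C,1$ into frame $C,j$, observations keyed by (point, key frame, side), and the example intrinsics.

```python
def project(R, t, X, f, cx, cy, baseline=0.0):
    """Project a point given in C,1 into the left (baseline=0) or right camera of key frame j."""
    xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    xc[0] -= baseline  # right camera is shifted by +baseline along x
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def window_cost(poses, points, observations, M, f, cx, cy, baseline):
    """Sum of squared reprojection errors over the most recent M key frames, cf. Eq. (4)."""
    window = sorted(poses)[-M:]  # key-frame indices m_bar, ..., m
    cost = 0.0
    for (i, j, side), (u, v) in observations.items():
        if j not in window:
            continue
        R, t = poses[j]
        pu, pv = project(R, t, points[i], f, cx, cy, baseline if side == "right" else 0.0)
        cost += (u - pu) ** 2 + (v - pv) ** 2
    return cost


f, cx, cy, b = 500.0, 320.0, 240.0, 0.12  # assumed intrinsics [px] and baseline [m]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = {1: (I3, (0.0, 0.0, 0.0)), 2: (I3, (0.0, 0.0, -1.0)), 3: (I3, (0.0, 0.0, -2.0))}
points = {0: (0.5, -0.2, 5.0), 1: (-0.3, 0.1, 6.0)}
obs = {(i, j, s): project(*poses[j], X, f, cx, cy, b if s == "right" else 0.0)
       for i, X in points.items() for j in poses for s in ("left", "right")}

images_in_window = {(j, s) for (_, j, s) in obs if j in sorted(poses)[-2:]}
cost_exact = window_cost(poses, points, obs, 2, f, cx, cy, b)
cost_perturbed = window_cost(poses, {**points, 0: (0.6, -0.2, 5.0)}, obs, 2, f, cx, cy, b)
```

With $M = 2$, the window contains four images (the left and right image of two key frames), which the set `images_in_window` reflects; moving a 3D point away from its true position immediately raises the cost.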
- Title: Proceedings of the OAGM&ARW Joint Workshop
- Subtitle: Vision, Automation and Robotics
- Authors: Peter M. Roth, Markus Vincze, Wilfried Kubinger, Andreas Müller, Bernhard Blaschitz, Svorad Stolc
- Publisher: Verlag der Technischen Universität Graz
- Place: Vienna
- Date: 2017
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-524-9
- Dimensions: 21.0 x 29.7 cm
- Pages: 188
- Keywords: Conference proceedings
- Categories: International, Proceedings