Page 100 in Proceedings OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
voxels. A kd-tree would allow for exact nearest neighbor queries, but would have O(m · log(n)) complexity. The speedup achieved by the O(m) verification allows us to evaluate more candidate transformations in the same time, boosting accuracy.
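The O(m) verification can be sketched as follows: scene points are hashed into a sparse voxel grid, so checking each of the m transformed model points is a constant-time lookup. This is an illustrative sketch under our own assumptions (function names, dict-of-cells grid), not the paper's implementation:

```python
import numpy as np

def build_voxel_grid(scene_points, voxel_size):
    """Hash scene points into a sparse voxel grid (dict keyed by cell index)."""
    grid = {}
    for p in scene_points:
        key = tuple(np.floor(p / voxel_size).astype(int))
        grid.setdefault(key, []).append(p)
    return grid

def count_inliers(model_points, pose_R, pose_t, grid, voxel_size):
    """O(m) verification: transform each of the m model points and check
    whether an occupied voxel exists at its location."""
    inliers = 0
    for p in model_points:
        q = pose_R @ p + pose_t
        key = tuple(np.floor(q / voxel_size).astype(int))
        if key in grid:  # constant-time hash lookup per model point
            inliers += 1
    return inliers
```

A kd-tree query would replace the dict lookup with an O(log n) search per point, which is exactly the overhead the voxel grid avoids.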
Filtering Candidate Solutions. After the random sampling and matching process is over, the candidate solutions are filtered, since it is likely that multiple similar solutions have been found. In [3], a pose clustering approach is used: multiple similar poses are combined to find an average pose from the candidate solutions. This approach falls short for symmetric objects. For example, a sphere whose reference frame is off center will result in many different potential poses, each with a completely different translation and rotation. In RANGO, we have therefore replaced the clustering approach with a filtering approach. All candidate solutions are sorted by their number of inliers, highest first. Each solution is then checked in turn whether it still meets a given inlier threshold; if it does, all scene voxels that were used in this inlier check are removed from the 3D voxel grid. This way, only the best-fitting candidate solution for a potential pose is kept, while it is still possible to find multiple instances of the same object in the scene data. This approach works well for both complex and symmetric objects.
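The greedy filtering step can be sketched as follows. The candidate representation and names are our own simplified assumptions (each candidate stores its sampled inlier count and its transformed model points); the key idea from the text is that accepted solutions consume their scene voxels, so weaker duplicates of the same pose fail the re-check:

```python
import numpy as np

def filter_candidates(candidates, grid, voxel_size, inlier_threshold):
    """Greedy filtering of candidate poses.

    candidates: list of (inlier_count, transformed_model_points) tuples.
    grid: dict keyed by voxel cell index (the scene's 3D voxel grid).
    Returns the accepted solutions, best-fitting first.
    """
    accepted = []
    # Sort by inlier count, highest number first.
    for _, points in sorted(candidates, key=lambda c: c[0], reverse=True):
        keys = {tuple(np.floor(p / voxel_size).astype(int)) for p in points}
        # Re-check against the current (possibly shrunken) grid.
        inliers = sum(1 for k in keys if k in grid)
        if inliers >= inlier_threshold:
            accepted.append(points)
            for k in keys:  # remove the scene voxels this solution explains
                grid.pop(k, None)
    return accepted
```

Because the voxels of an accepted solution are removed, a second, slightly different pose of the same object instance no longer reaches the threshold, while a genuinely separate instance elsewhere in the scene still does.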
3.2. Multi-Forest Tracking
Our multi-forest tracking approach is a variation of the multi-forest tracking algorithm described in [18]. Our modification to this algorithm retains the performance characteristics of [18] while having a significantly lower memory and training overhead, which allows the algorithm to be used on devices with limited computational power such as a tablet PC. It is noteworthy that we only use depth data for both tracking and object localization. The reason is that our main goal is to robustly track industrial parts, and these usually do not carry much color information.
Single-View Tracking. The multi-forest tracker described in [18] uses 6 · n_c · n_t random forests, where 6 is the number of dimensions needed to represent a pose, n_c is the number of camera positions, and n_t is the number of trees in each forest. For each camera position, sample points of the object are extracted and used to train 6 random regression forests for tracking of this camera view. An algorithm switches between the camera views that are currently best visible. They parameterize this as n_c = 42 and n_t = 100, resulting in 25,200 random trees. Each tree is generated from a test set of 50,000 samples, resulting in a significant training effort. In our approach we have reduced this effort significantly to only 6 · n_t trees. After the samples have been generated, each random forest is trained with all samples for a single dimension of the pose vector. That is, random forests 1 to 3 are trained on the changes in translation (x, y, and z), and random forests 4 to 6 on the changes in rotation (roll, pitch, and yaw), respectively. During tracking, the observed depth changes are used to predict the change in the pose vector by simply combining the predictions of the six random forests.
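The per-dimension training scheme can be sketched with scikit-learn regression forests standing in for the paper's implementation. The synthetic data, the number of sample points, and all names are placeholder assumptions; only the structure (six forests, each trained on all samples but regressing one pose dimension) follows the text:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in training data: each row holds the depth changes observed
# at the object's sample points, paired with the 6-D pose delta that caused them.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))   # 40 hypothetical sample-point depth changes
Y = rng.normal(size=(500, 6))    # pose deltas: x, y, z, roll, pitch, yaw

# One forest per pose dimension; every forest sees ALL samples but is
# trained to regress only a single component of the pose vector.
forests = []
for dim in range(6):
    f = RandomForestRegressor(n_estimators=70, random_state=0)  # n_t = 70
    f.fit(X, Y[:, dim])
    forests.append(f)

def predict_pose_delta(depth_changes):
    """Combine the six scalar predictions into one 6-D pose update."""
    x = np.asarray(depth_changes).reshape(1, -1)
    return np.array([f.predict(x)[0] for f in forests])
```

During tracking, each frame's depth differences at the sample points would be fed to `predict_pose_delta` and the result applied as an incremental pose update.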
In practice, we set n_t = 70, resulting in only 420 random trees, which in turn leads to a 60 times faster training time and 60 times lower memory requirements. It typically takes about 3 minutes on an Intel(R) Core(TM) i5-3570 CPU (the workstation on which the proposed approach is implemented and tested) to train a new object for tracking. The low memory requirement also allows the tracking to run in real time.
We initially sample a set of approximately 400 points from the surface of the object. Sampling is done by raytracing points onto the object's surface, creating a 3D grid around the object, and then sampling a single point per grid cell. This sampling approach leads to evenly spaced sample points all over the visible surface of the object. The depth distance of these sample points to the visible depth map is then used in the training data. Since we have sampled points from all around the object, many points will not lie on the visible surface but will be behind it. We rely on the random