Page 7 of Joint Austrian Computer Vision and Robotics Workshop 2020

mation [30]. Open source implementations for different learning tasks are plentiful and can be used to provide perception for a robotics system. Due to the strong capabilities of CNNs as general feature extractors, it is possible to learn multiple visual targets, which can differ depending on the environment or application. This relaxes the constraint of using specifically designed visual markers that classical CV methods pose. The learning task of the object detector in this work is comparatively simple (only one class of logos exists and the logos are easily distinguishable from the rest of the target, see Fig. 1).

In previous work, the LiDAR of the mobile robot, a Robotino, was used to create a map of the environment, and localization was implemented with the amcl package. While this pipeline, in combination with obstacle avoidance methods, has been useful for path-planning across the room, the robot arrives at the target position with considerable inaccuracy (10 cm to 20 cm) when relying on AMCL alone. Therefore, for this project an entirely vision-based solution for docking was developed, which is intended to take over the task of generating pose estimates from AMCL once the robot comes close to the docking target.

The aim of this work is therefore to approach and dock onto desired targets in a semi-industrial environment with sufficiently high accuracy. To contribute to the transition of state-of-the-art CNNs from public datasets to real-world problems, an appropriate combination of old and new algorithms is presented in this work. A CNN-based object detector is used for image processing and object detection, followed by a camera pose estimation algorithm using point correspondences from the detections. The presented method can easily be adapted to learn new target positions outfitted with a visual marker, with minimal setup requirements.

2. STATE OF THE ART

The problem of estimating the pose of a calibrated camera, assuming a known 3D scene, is known as the PnP problem [29]. The idea is to use a feature detector such as SIFT [16] or SURF [2] to extract features from multiple sequential images. Since an image of a known 3D point gives two nonlinear constraints on camera pose and calibration, using three points (or, more precisely, three image-object point pairs) would give all 6 pose parameters. As [33] point out, such minimal cases lead to polynomial systems with multiple solutions, hence one additional point is used. This leads to four necessary points for estimating the pose (and one intrinsic parameter), and six points for estimating the 3D pose and five additional calibration parameters. The problem is formulated differently for the planar two-dimensional and the general, aforementioned three-dimensional case. Direct Linear Transformation (DLT, [9]) allows the estimation of the homography matrix H for the planar problem, requiring at least four 2D-3D point correspondences. For the general case, DLT estimates the projection matrix P and requires at least six such correspondences. In either case, H or P can be expressed through a system Ax = 0 built from multiple pairs of independent equations. Since individual pixels are generally noisy, no exact solution can be obtained using DLT, only an approximate one by computing the SVD of A. It should be noted that, for the noisy and overconstrained case, only the eigenvector of AᵀA corresponding to the smallest eigenvalue needs to be computed.
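As an illustration of this DLT-plus-SVD step, the following Python sketch estimates a planar homography from four or more correspondences; the function name and the point data are placeholders invented for the example, not taken from the paper.

import numpy as np

def estimate_homography_dlt(scene_pts, image_pts):
    """Estimate the 3x3 homography H mapping planar scene points to image
    points with the Direct Linear Transformation (DLT).

    scene_pts, image_pts: (N, 2) arrays with N >= 4 correspondences.
    """
    assert scene_pts.shape == image_pts.shape and scene_pts.shape[0] >= 4

    # Each correspondence (x, y) -> (u, v) contributes two rows to A,
    # so that A h = 0, where h holds the 9 entries of H.
    rows = []
    for (x, y), (u, v) in zip(scene_pts, image_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows)

    # With noisy pixels, A h = 0 has no exact solution; the least-squares
    # solution is the right singular vector of A for the smallest singular
    # value, i.e. the eigenvector of A^T A with the smallest eigenvalue.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so that H[2, 2] = 1

# Hypothetical example: four corners of a 10 cm square marker and the
# pixel locations where an (assumed) detector found them.
scene = np.array([[0.0, 0.0], [0.1, 0.0], [0.1, 0.1], [0.0, 0.1]])
image = np.array([[320.0, 240.0], [410.0, 238.0], [412.0, 330.0], [318.0, 332.0]])
print(estimate_homography_dlt(scene, image))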
A continuation of DLT is the family of PnP algorithms. Efficient PnP, or EPnP [14], uses the notion that each of the n 3D-2D point pairs is expressed as a weighted sum of four virtual control points, and solves the pose problem from these control points (a usage sketch follows at the end of this page's text). Perspective-Three-Point, or P3P, is a method applicable if only three correspondences are available, and in turn returns four real, possible solutions, the newest implementation being Lambda Twist P3P [25]. A fourth point pair can be used to remove this four-solution ambiguity.

Kartoun et al. [12] were able to achieve docking times averaging 85 seconds, but attributed the success of their method to the unique hardware on the robot and a generously large docking station. Burschka et al. [3] take the aforementioned approach to the outdoors, using a Kanade-Lucas tracker [18] to track points in image sequences, followed by RANSAC and DLT. They achieve good results for rotation, but struggle with estimating translation. In the work of Mehralian et al. [21], an Extended Kalman Filter (EKF, [11]) is combined with PnP algorithms to create EKFPnP. They achieve better robustness against noisy features, although no details are given regarding the feature tracker.

In the field of deep learning, pose estimation is a well researched problem [23], camera pose estimation is less so [13], and no architectures or datasets exist specifically designed for docking a mobile robot. The dataset would need to include the complete pose of the robot for every captured image to allow end-
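As a minimal sketch of the EPnP step referenced above (not the paper's own implementation), OpenCV's solvePnP can estimate the camera pose from four or more 3D-2D correspondences; the marker geometry, detected pixel coordinates, and camera intrinsics below are placeholder values.

import cv2
import numpy as np

# Hypothetical 3D coordinates (in meters) of four marker corners on the
# docking target, expressed in the target's own coordinate frame.
object_points = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.1, 0.1, 0.0],
    [0.0, 0.1, 0.0],
], dtype=np.float64)

# Pixel locations where an (assumed) CNN detector located those corners.
image_points = np.array([
    [320.0, 240.0],
    [410.0, 238.0],
    [412.0, 330.0],
    [318.0, 332.0],
], dtype=np.float64)

# Placeholder intrinsics of a calibrated camera and zero distortion;
# in practice these come from a camera calibration step.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# EPnP solves for the pose by re-expressing the 3D points as weighted sums
# of four virtual control points. With only four coplanar points, other
# flags (e.g. the default iterative solver) may be more accurate; EPnP is
# shown here to match the text.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_EPNP)
if ok:
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    print("R =", R)
    print("t =", tvec.ravel())   # position of the target in the camera frame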
Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Publisher: Graz University of Technology
Place: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Dimensions: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science, Technology