eration hardware while still directly controlling the robot platform to avoid the domain shift. We directly track the human hand using a webcam and use the estimated hand pose to control the end-effector of the robot. The demonstration data are used to train a neural network, based on the architecture of [27], to enable imitation by the robot system. We extend this work to include different regularization techniques during training and data augmentation to manage changes in brightness and imperfect demonstrations. Our method is implemented for the KUKA LWR IV+ [3] robotic arm for the task of pushing objects. Experiments show that the robot is able to replicate the demonstrated task with as few as 100 recorded examples. In comparison to the baseline [27], our inclusion of regularization and data augmentation achieves a higher success rate.
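To make the training additions above concrete, the following is a minimal sketch of brightness augmentation and standard regularization in PyTorch. The jitter range, dropout rate, weight decay, and layer sizes are illustrative assumptions, not the exact settings used in our experiments.

import torch
import torch.nn as nn
import torchvision.transforms as T

# Hypothetical augmentation: random brightness jitter so the policy
# tolerates lighting changes; the range 0.3 is an assumption.
augment = T.Compose([
    T.ColorJitter(brightness=0.3),
    T.ToTensor(),
])

# Hypothetical regularization: dropout in the policy head plus L2
# weight decay on the optimizer; the exact choices may differ.
policy_head = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 7),  # e.g., an end-effector command
)
optimizer = torch.optim.Adam(policy_head.parameters(),
                             lr=1e-4, weight_decay=1e-5)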
In summary, we make the following contributions:
• A vision-based hand tracking system to teleoperate a robot arm to perform manipulation tasks.
• Training of a neural network with our generated teleoperated data that enables task imitation.
• Evaluation of the generalization of the imitation learning to unseen configurations.
• Improvements over the baseline by including regularization methods during the training.
The remainder of this paper is organized as follows. Section 2 reviews related work and Section 3 presents our approach. In Section 4 we present our experiments and results. Section 5 concludes the paper.
2. Related Work
A popular approach to program a robot to perform manipulation tasks is learning from demonstration [21, 2]. This involves recording example manipulation sequences and then transferring the trajectories to the robot platform, which performs the task itself. Trajectories are typically recorded using kinesthetic teaching [22, 19, 1], teleoperation [27, 10, 20] or generated in simulation [6, 18]. Given a set of demonstrations, these methods find an appropriate mapping in order to replicate the closest matching trajectory, often making adaptations due to the variation between the current and demonstrated scenarios. Some approaches represent the demonstrations as a set of primitives by encoding the trajectories and then generating robot motions through probabilistic methods, e.g., Gaussian mixtures [5], Gaussian processes [22] or dynamic movement primitives [19, 12]. This allows for a more efficient search for the most appropriate trajectory to replicate.
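As a brief illustration of such primitive-based encodings, a one-dimensional discrete dynamic movement primitive [19, 12] can be rolled out as below. This is a generic textbook-style sketch in Python; the gain values and basis-function parametrization are standard defaults, not taken from the cited works.

import numpy as np

def dmp_rollout(x0, g, weights, centers, widths, tau=1.0,
                alpha_z=25.0, beta_z=6.25, alpha_s=4.0,
                dt=0.01, steps=300):
    """Roll out a 1-D discrete dynamic movement primitive."""
    x, v, s = x0, 0.0, 1.0
    traj = []
    for _ in range(steps):
        psi = np.exp(-widths * (s - centers) ** 2)   # RBF activations
        # Learned forcing term shapes the motion toward the demonstration.
        f = (psi @ weights) / (psi.sum() + 1e-10) * s * (g - x0)
        a = alpha_z * (beta_z * (g - x) - v) + f     # transformation system
        v += a * dt / tau
        x += v * dt / tau
        s += -alpha_s * s * dt / tau                 # canonical system decay
        traj.append(x)
    return np.array(traj)

With zero weights the rollout reduces to a critically damped approach to the goal g; fitting the weights to a demonstrated trajectory reproduces its shape.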
More recent works apply deep neural networks to learn visuomotor policies that map input images to robot trajectories through behavioral cloning [27, 20]. A network is trained on demonstrations to learn the image-to-action mapping such that a closed-loop controller commands the manipulator through sequences of states to complete the task. In this line of work, teleoperation is preferred over kinesthetic teaching because the human does not contaminate the training images.
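In essence, behavioral cloning of this kind is supervised regression from images to actions. The following minimal sketch, assuming a dataset of (image, action) pairs, shows one training step; the network architecture and action dimension are illustrative and do not reproduce the model of [27].

import torch
import torch.nn as nn

# Illustrative visuomotor policy: a small CNN regressing an action
# from an RGB image; all sizes are assumptions for the sketch.
policy = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 7),  # e.g., an end-effector command
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(images, actions):
    """One supervised update on a batch of (image, action) pairs."""
    optimizer.zero_grad()
    loss = loss_fn(policy(images), actions)
    loss.backward()
    optimizer.step()
    return loss.item()

At test time the policy is queried in closed loop: each new camera image yields the next command for the manipulator.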
Extensions have been made that generalize the models to multiple tasks, which allows few- or even one-shot learning of new tasks [6, 26]. These methods apply meta-learning to efficiently adapt a learned model, trained on many prior tasks, to a new task that is to be imitated. James et al. [10] take a different approach and use metric learning to create a task embedding. Imitating a new demonstration is achieved by training a control network to translate learned task embeddings into desired actions. Huang et al. [9] propose neural task graphs to learn the common structure of tasks and the conjugate relationship between observed states and actions.
Another direction of work is to learn using only videos of humans performing tasks, e.g., [13, 16, 24]. However, human demonstrations alone do not provide sufficient supervision for learning. Therefore, other approaches explicitly learn the relationship between human and robot demonstrations in order to directly imitate human tasks in the online setting [26].
In this work, we build on the approaches for learning visuomotor policies through behavioral cloning. In particular, we adapt the methodology presented by Zhang et al. [27] by replacing the teleoperation hardware with a vision-based system. Our work is complementary both to [27] and to its extension that incorporates human demonstrations [26], since our teleoperation system can serve as an alternative.
3. Approach
This section describes our approach for learning from demonstration; an overview is given in Figure 2. For teleoperation, a webcam is used to track the hand (Section 3.1) to generate positions that control the robot's end-effector (Section 3.2). During the trajectory, the RGB-D images from a ceiling