Page - 46 - in Joint Austrian Computer Vision and Robotics Workshop 2020


Figure 3. Examples of the learned policy. First row shows successful trials. Bottom row shows failures.

Table 1. Ablation study

Policy               Started Push   Success
Vanilla Policy       50.0%          27.8%
No Dropout           66.7%          27.8%
No Brightness Aug.   75.0%          41.7%
Our Policy           86.1%          58.3%

implemented the original model and adapted it to our robotic platform and task. To test the effect of individual changes, we applied our policy once without dropout and once without data augmentation. The vanilla policy and our policy without dropout each achieve a success rate of only 27.8%; the successes were almost exclusively trials in which the box was located in a middle position and required only a straight push.

The purpose of applying dropout to the end-effector pose input of the network is to put more emphasis on the input images. With dropout added but no brightness augmentation, the success rate rises to 41.7%. Brightness augmentation alone did not improve the overall success rate over the vanilla policy. However, the combination of dropout and brightness augmentation achieved a success rate of 58.3%. We introduced the data augmentation because the lighting conditions in the test environment changed during the demonstrations. For the evaluation, we kept the lighting conditions the same.

Qualitative results are presented in Figure 3. The first row shows sequences of successful trials in which the box is pushed to the goal. The second row shows examples of failures. In one case, the robot end-effector slides past the box and the policy loses the target. In the second case, the box is pushed to a location that is not the goal.

5. Conclusion

This paper presented an approach for learning from demonstration using a vision-based solution for robot teleoperation. A hand tracking method was employed to generate commands that control the robot's end-effector as the human operator completes a manipulation task. The set of demonstrations was used to train a deep imitation learning network that learns a policy, enabling the robot to imitate the task. Experiments showed that the introduction of regularization and data augmentation increased the success rate over the baseline method.

For future work, we plan to combine the LfD approach with reinforcement learning in simulation. By starting from the learned policy in simulation, the training time of reinforcement learning approaches can be greatly reduced. Additionally, combining real data with synthetic data collected in simulation mitigates the problem of domain adaptation of pure reinforcement learning methods. Another avenue is to use more high-level knowledge of the scene (e.g. object pose) to make the approach less susceptible to environment changes.

Acknowledgments

This research is partially supported by the Vienna Science and Technology Fund (WWTF), project RALLI (ICT15-045), the Austrian Science Foundation (FWF), project InDex (I3969-N30), and Festo AG & Co. KG.
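
The two changes evaluated in the ablation study, dropout on the end-effector pose input and brightness augmentation of the input images, can be illustrated with a minimal sketch. This page does not state the dropout rate, the brightness range, or the framework used, so `rate`, `max_delta`, and both function names below are assumptions chosen for illustration, and plain Python lists stand in for image tensors.

```python
import random


def augment_brightness(image, max_delta=0.2, rng=random):
    """Add one random brightness offset to all pixels and clamp to [0, 1].

    `image` is a nested list of floats in [0, 1]; `max_delta` is an
    assumed range, not a value taken from the paper.
    """
    delta = rng.uniform(-max_delta, max_delta)
    return [[min(1.0, max(0.0, px + delta)) for px in row] for row in image]


def dropout_pose(pose, rate=0.5, training=True, rng=random):
    """Inverted dropout on the pose vector (assumed rate).

    Each entry is zeroed with probability `rate`; surviving entries are
    divided by (1 - rate) so the expected value stays unchanged.
    """
    if not training or rate == 0.0:
        return list(pose)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in pose]


# Hypothetical training sample: a tiny 2x2 "image" and a 3-D pose.
rng = random.Random(0)
image = [[0.1, 0.5], [0.9, 1.0]]
pose = [0.3, -0.2, 0.5]

aug_image = augment_brightness(image, rng=rng)   # training-time input image
noisy_pose = dropout_pose(pose, rng=rng)         # training-time pose input
eval_pose = dropout_pose(pose, training=False)   # pose passes through unchanged
```

Dropout is applied only during training; with `training=False` the pose is passed through unchanged. Randomly zeroing pose entries makes the pose a less reliable signal during training, which is one way to push a network toward relying on the image input.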
