Joint Austrian Computer Vision and Robotics Workshop 2020, page 46
Figure 3. Examples of the learned policy. The top row shows successful trials; the bottom row shows failures.
Table 1. Ablation study
Policy               Started Push   Success
Vanilla Policy       50.0%          27.8%
No Dropout           66.7%          27.8%
No Brightness Aug.   75.0%          41.7%
Our Policy           86.1%          58.3%
implemented the original model and adapted it to our robotic platform and task. To test the effect of individual changes, we applied our policy once without dropout and once without data augmentation. The vanilla policy and our policy without dropout both achieve a success rate of only 27.8%. These successes were almost exclusively trials in which the box was located in a middle position and only required a straight push.
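Table 1 can also be read as percentage-point gains over the vanilla policy. The short Python sketch below encodes the table and computes those differences. The row labels are taken from Table 1. The helper `gain_over_vanilla` is hypothetical and is included only for illustration.

```python
# Table 1 entries in percent; keys follow the row labels of the table.
TABLE1 = {
    "Vanilla Policy": {"started_push": 50.0, "success": 27.8},
    "No Dropout": {"started_push": 66.7, "success": 27.8},
    "No Brightness Aug.": {"started_push": 75.0, "success": 41.7},
    "Our Policy": {"started_push": 86.1, "success": 58.3},
}


def gain_over_vanilla(table, metric="success"):
    """Percentage-point difference of every row relative to the vanilla policy."""
    base = table["Vanilla Policy"][metric]
    return {name: round(row[metric] - base, 1) for name, row in table.items()}


gains = gain_over_vanilla(TABLE1)
dropout_only = gains["No Brightness Aug."]   # dropout, no brightness augmentation
augment_only = gains["No Dropout"]           # brightness augmentation, no dropout
combined = gains["Our Policy"]               # both regularizers
print(dropout_only, augment_only, combined)  # 13.9 0.0 30.5
```

The combined gain (30.5 points) is larger than the sum of the two individual gains (13.9 + 0.0). This is consistent with the observation that the combination of dropout and brightness augmentation helps more than either change alone.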
The purpose of applying dropout to the end-effector pose input of the network is to force the network to put more emphasis on the input images, rather than relying on the pose alone. With dropout alone (i.e., without brightness augmentation), the success rate rises to 41.7%. Brightness augmentation alone did not improve the overall success rate over the vanilla policy. However, the combination of dropout and brightness augmentation achieved a success rate of 58.3%. We introduced the data augmentation because the lighting conditions in the test environment changed during the demonstrations. For the evaluation, we kept the lighting conditions constant.
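The following is a minimal sketch of the two regularizers, written in plain Python. It rests on several assumptions. First, dropout is applied element-wise, in the standard "inverted" form, to the end-effector pose vector. Second, brightness augmentation multiplies every pixel by a single random factor. The rate `p`, the factor range `[0.7, 1.3]`, the grayscale image layout and the function names `dropout_pose` and `augment_brightness` are all hypothetical. They are not taken from the paper.

```python
import random


def dropout_pose(pose, p=0.5, training=True, rng=random):
    """Inverted dropout on the end-effector pose input (rate p is hypothetical).

    In training, each element is zeroed with probability p and the kept
    elements are scaled by 1 / (1 - p) so the expected input is unchanged.
    At test time the pose is passed through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(pose)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in pose]


def augment_brightness(image, low=0.7, high=1.3, rng=random):
    """Multiply all intensities of a grayscale image (rows of 0-255 ints)
    by one random factor in [low, high] and clip back to 0-255."""
    factor = rng.uniform(low, high)
    return [[min(255, max(0, round(px * factor))) for px in row] for row in image]


rng = random.Random(0)
pose = [0.42, -0.10, 0.25, 0.0, 0.0, 0.0, 1.0]  # hypothetical xyz + quaternion
train_pose = dropout_pose(pose, p=0.5, rng=rng)
test_pose = dropout_pose(pose, training=False)
image = [[10, 200, 90], [128, 255, 0]]
bright_image = augment_brightness(image, rng=rng)
```

The paper does not say whether individual pose elements or the whole pose vector are dropped. Dropping the whole vector would be a one-line change: draw a single random number per sample instead of one per element. In a framework such as PyTorch, applying `torch.nn.Dropout` to the pose branch gives the same inverted-dropout behavior during training and passes the input through in evaluation mode.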
Qualitative results are presented in Figure 3. The first row shows sequences of successful trials in which the box is pushed to the goal. The second row shows examples of failures. In one case, the robot end-effector slides past the box and the policy loses the target. In the other case, the box is pushed to a location other than the goal.
5. Conclusion
This paper presented an approach for learning from demonstration using a vision-based solution for robot teleoperation. A hand tracking method was employed to generate commands that control the robot's end-effector as the human operator completes a manipulation task. The set of demonstrations was used to train a deep imitation learning network that learns a policy, enabling the robot to imitate the task. Experiments showed that the introduction of regularization (dropout) and data augmentation increased the success rate over the baseline method, from 27.8% to 58.3%.
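The pipeline summarized above turns recorded demonstrations into a policy. As a minimal sketch, assume that the imitation network is trained as a regression from observations to the operator's demonstrated commands, which is the behavior cloning formulation. Under that assumption, the core idea reduces to supervised learning. The Python below fits a linear policy by stochastic gradient descent. It is not the authors' deep network, which takes camera images and the end-effector pose as input. The 2D box-to-goal observation, the proportional "operator", and all names and hyperparameters (`train_bc`, `policy`, `lr`, `epochs`) are hypothetical.

```python
import random


def train_bc(demos, lr=0.05, epochs=200, seed=0):
    """Behavior cloning with a linear policy a = W o + b.

    Minimizes the squared error between predicted and demonstrated actions
    with plain stochastic gradient descent over (observation, action) pairs.
    """
    rng = random.Random(seed)
    obs_dim, act_dim = len(demos[0][0]), len(demos[0][1])
    W = [[0.0] * obs_dim for _ in range(act_dim)]
    b = [0.0] * act_dim
    data = list(demos)
    for _ in range(epochs):
        rng.shuffle(data)
        for obs, act in data:
            pred = policy(W, b, obs)
            for i in range(act_dim):
                err = pred[i] - act[i]  # gradient of 0.5 * err**2 w.r.t. pred[i]
                for j in range(obs_dim):
                    W[i][j] -= lr * err * obs[j]
                b[i] -= lr * err
    return W, b


def policy(W, b, obs):
    """Map an observation to an action with the learned linear policy."""
    return [sum(w * o for w, o in zip(row, obs)) + bi for row, bi in zip(W, b)]


# Hypothetical demonstrations: the observation is the 2D box-to-goal offset and
# the operator's command is an end-effector velocity of 0.5 times that offset.
demo_rng = random.Random(1)
demos = []
for _ in range(50):
    offset = [demo_rng.uniform(-1.0, 1.0), demo_rng.uniform(-1.0, 1.0)]
    demos.append((offset, [0.5 * offset[0], 0.5 * offset[1]]))

W, b = train_bc(demos)
pred = policy(W, b, [0.4, -0.2])
print(pred)  # approximately [0.2, -0.1]
```

Because the hypothetical demonstrations are noise-free and linear, the fitted policy recovers the operator's mapping almost exactly. With real teleoperation data, the network must generalize from images. This is where the regularization discussed above matters.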
For future work, we plan to combine the LfD approach with reinforcement learning in simulation. Starting reinforcement learning in simulation from the learned policy, rather than from scratch, could greatly reduce its training time. Additionally, combining real data with synthetic data collected in simulation could mitigate the domain adaptation problem faced by reinforcement learning methods trained purely in simulation. Another avenue is to use more high-level knowledge of the scene (e.g., object pose) to make the approach less susceptible to changes in the environment.
Acknowledgments
This research is partially supported by the Vienna Science and Technology Fund (WWTF), project RALLI (ICT15-045), the Austrian Science Fund (FWF), project InDex (I3969-N30), and Festo AG & Co. KG.
- Title: Joint Austrian Computer Vision and Robotics Workshop 2020
- Editor: Graz University of Technology
- Location: Graz
- Date: 2020
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-752-6
- Size: 21.0 x 29.7 cm
- Pages: 188
- Categories: Computer Science, Engineering