eration hardware while still directly controlling the robot platform to avoid the domain shift. We directly track the human hand using a webcam and use the estimated hand pose to control the end-effector of the robot. The demonstration data are used to train a neural network, based on the architecture of [27], to enable imitation by the robot system. We extend this work to include different regularization techniques during training and data augmentation to manage changes in brightness and imperfect demonstrations. Our method is implemented for the KUKA LWR IV+ [3] robotic arm for the task of pushing objects. Experiments show that the robot is able to replicate the demonstrated task with as few as 100 recorded examples. In comparison to the baseline [27], our inclusion of regularization and data augmentation achieves a higher success rate.
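To make the training additions above concrete, the following is a minimal sketch of brightness augmentation and standard regularization in PyTorch. The jitter range, dropout rate, weight decay, and layer sizes are illustrative assumptions, not the exact settings used in our experiments.

import torch
import torch.nn as nn
import torchvision.transforms as T

# Hypothetical augmentation: random brightness jitter so the policy
# tolerates lighting changes; the range 0.3 is an assumption.
augment = T.Compose([
    T.ColorJitter(brightness=0.3),
    T.ToTensor(),
])

# Hypothetical regularization: dropout in the policy head plus L2
# weight decay on the optimizer; the exact choices may differ.
policy_head = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 7),  # e.g., an end-effector command
)
optimizer = torch.optim.Adam(policy_head.parameters(),
                             lr=1e-4, weight_decay=1e-5)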
In summary, we make the following contributions:
• A vision-based hand tracking system to teleoperate a robot arm to perform manipulation tasks.
• Training of a neural network with our generated teleoperated data that enables task imitation.
• Evaluation of the generalization of the imitation learning to unseen configurations.
• Improvements over the baseline by including regularization methods during the training.
The remainder of this paper is organized as follows. Section 2 reviews related work and Section 3 presents our approach. In Section 4 we present our experiments and results. Section 5 concludes the paper.
2. Related Work
A popular approach to program a robot to perform manipulation tasks is learning from demonstration [21, 2]. This involves recording example manipulation sequences and then transferring the trajectories to the robot platform, which performs the task itself. Trajectories are typically recorded using kinesthetic teaching [22, 19, 1], teleoperation [27, 10, 20] or generated in simulation [6, 18]. Given a set of demonstrations, these methods find an appropriate mapping in order to replicate the closest matching trajectory, often making adaptations due to the variation between the current and demonstrated scenarios. Some approaches represent the demonstrations as a set of primitives by encoding the trajectories and then generating robot motions through probabilistic methods, e.g., Gaussian mixtures [5], Gaussian processes [22] or dynamic movement primitives [19, 12]. This allows for a more efficient search for the most appropriate trajectory to replicate.
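As a brief illustration of such primitive-based encodings, a one-dimensional discrete dynamic movement primitive [19, 12] can be rolled out as below. This is a generic textbook-style sketch in Python; the gain values and basis-function parametrization are standard defaults, not taken from the cited works.

import numpy as np

def dmp_rollout(x0, g, weights, centers, widths, tau=1.0,
                alpha_z=25.0, beta_z=6.25, alpha_s=4.0,
                dt=0.01, steps=300):
    """Roll out a 1-D discrete dynamic movement primitive."""
    x, v, s = x0, 0.0, 1.0
    traj = []
    for _ in range(steps):
        psi = np.exp(-widths * (s - centers) ** 2)   # RBF activations
        # Learned forcing term shapes the motion toward the demonstration.
        f = (psi @ weights) / (psi.sum() + 1e-10) * s * (g - x0)
        a = alpha_z * (beta_z * (g - x) - v) + f     # transformation system
        v += a * dt / tau
        x += v * dt / tau
        s += -alpha_s * s * dt / tau                 # canonical system decay
        traj.append(x)
    return np.array(traj)

With zero weights the rollout reduces to a critically damped approach to the goal g; fitting the weights to a demonstrated trajectory reproduces its shape.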
More recent works apply deep neural networks to learn visuomotor policies that map input images to robot trajectories through behavioral cloning [27, 20]. A network is trained on demonstrations to learn the image-to-action mapping such that a closed-loop controller commands the manipulator through sequences of states to complete the task. In this line of work, teleoperation is preferred over kinesthetic teaching because the human does not contaminate the training images.
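In essence, behavioral cloning of this kind is supervised regression from images to actions. The following minimal sketch, assuming a dataset of (image, action) pairs, shows one training step; the network architecture and action dimension are illustrative and do not reproduce the model of [27].

import torch
import torch.nn as nn

# Illustrative visuomotor policy: a small CNN regressing an action
# from an RGB image; all sizes are assumptions for the sketch.
policy = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 7),  # e.g., an end-effector command
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(images, actions):
    """One supervised update on a batch of (image, action) pairs."""
    optimizer.zero_grad()
    loss = loss_fn(policy(images), actions)
    loss.backward()
    optimizer.step()
    return loss.item()

At test time the policy is queried in closed loop: each new camera image yields the next command for the manipulator.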
Extensions have been made that generalize the models to multiple tasks, which allows few- or even one-shot learning of new tasks [6, 26]. These methods apply meta-learning to efficiently adapt a learned model, trained on many prior tasks, to a new task that is to be imitated. James et al. [10] take a different approach and use metric learning to create a task embedding. Imitating a new demonstration is achieved by training a control network to translate learned task embeddings into desired actions. Huang et al. [9] propose neural task graphs to learn the common structure of tasks and the conjugate relationship between observed states and actions.
Another direction of work is to learn using only videos of humans performing tasks, e.g., [13, 16, 24]. However, human demonstrations alone do not provide sufficient supervision for learning. Therefore, other approaches explicitly learn the relationship between human and robot demonstrations in order to directly imitate human tasks in the online setting [26].
In this work, we build on the approaches for learning visuomotor policies through behavioral cloning. In particular, we adapt the methodology presented by Zhang et al. [27] by replacing the teleoperation hardware with a vision-based system. Our work is complementary both to [27] and to its extension that incorporates human demonstrations [26], since our teleoperation system can serve as an alternative.
3. Approach
This section describes our approach for learning from demonstration; an overview is given in Figure 2. For teleoperation, a webcam is used to track the hand (Section 3.1) to generate positions that control the robot's end-effector (Section 3.2). During the trajectory, the RGB-D images from a ceiling