Page 29 in Joint Austrian Computer Vision and Robotics Workshop 2020

How does explicit exploration influence Deep Reinforcement Learning?

Jakob J. Hollenstein, Erwan Renaudo, Matteo Saveriano, Justus Piater
University of Innsbruck
{jakob.hollenstein,erwan.renaudo,matteo.saveriano,justus.piater}@uibk.ac.at

Abstract. Most Deep Reinforcement Learning (D-RL) methods perform local search and are therefore prone to getting stuck in non-optimal solutions. To overcome this issue, we exploit simulation models and kinodynamic planners as an exploration mechanism in a model-based reinforcement learning method. We show that, even on a simple toy domain, D-RL methods are not immune to local optima and require additional exploration mechanisms. In contrast, our planning-based exploration exhibits better state-space coverage, which translates into better policies than those learned via standard D-RL methods.

1. Introduction

Deep Reinforcement Learning (D-RL) has shown promising results in challenging robotics domains (e.g. [4]), but can be resource-demanding and difficult to train. We assume that part of the difficulty of learning good policies is related to insufficient exploration. Other D-RL methods like [1, 3, 6] partially address the problem by increasing the number of training steps, or by relying on the environment implementation to provide exploring starts that cover a sufficiently diverse state-space region. However, these solutions are impractical and potentially dangerous in robotics applications.

In the robotic context, directed exploration via physically-based simulation appears more promising for finding good solutions more reliably and in less time. Therefore, this work proposes the Planning for Policy Search (PPS) method, which exploits a kinodynamic planner in the exploration phase to collect data that are then used to learn a policy, thereby eliminating the planning time during execution. PPS is tested on the point-mass system described in Table 1 and compared with D-RL approaches.

(This research has received funding from the European Union's Horizon 2020 research and innovation programme, grant agreement no. 731761, IMAGINE.)

Table 1. Description of the 1D double-integrator test environment: a point mass M can be moved in a one-dimensional space with position-velocity state $X = [x, \dot{x}]$ by applying a continuous-valued force. Reward is received based on the distance to two possible goal locations ($G_1$, $G_2$). The accompanying sketch shows the mass M at distances $d_1$ and $d_2$ from the goals $G_1$ and $G_2$ on a position axis that wraps around.

Dynamics: $\dot{X} = AX + Bu$, with $X = \begin{bmatrix} x \\ \dot{x} \end{bmatrix}$, $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$
Reward: $\max\big((1 - \tanh|X - G_1|),\; 2(1 - \tanh|X - G_2|)\big)$, with $G_1 = \begin{bmatrix} -2.5 \\ 0.0 \end{bmatrix}$, $G_2 = \begin{bmatrix} 6.0 \\ 0.0 \end{bmatrix}$
Limits: $u \in [-1, 1]$, $x \in [-10, 10]$, $\dot{x} \in [-2.5, 2.5]$

Figure 1. Illustration of the PPS method.

2. Planning for Policy Search

The presented PPS implementation (Figure 1) consists of a Linear Quadratic Regulator (LQR) Rapidly exploring Random Tree (RRT) [5] that creates a tree of data $D = \{(s, a, r, s'), \ldots\}$ from which Soft Actor-Critic (SAC) [1] learns a policy. In contrast to [5], quadratic-programming-based finite-horizon steering is used to extend the tree. In our setup, all of the environment interaction data created by the RRT are used as training data for the policy, rather than using only successful trajectories as expert demonstrations.

3. Evaluation

PPS is evaluated on the one-dimensional goal-reaching task presented in Table 1. The environment contains two distinct goal locations. The agent receives a reward based on the distance to the goal
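The double-integrator task in Table 1 is compact enough to sketch directly in code. The following Python snippet is a minimal, non-authoritative rendering of that table, not the authors' implementation: the Euler discretisation, the time step dt, the position wrapping at the workspace boundary, and the reading of $|X - G|$ as the Euclidean distance in position-velocity space are assumptions on top of what the page states.

```python
import numpy as np

class DoubleIntegratorEnv:
    """Minimal sketch of the 1D double-integrator task from Table 1 (not the authors' code)."""

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    B = np.array([0.0, 1.0])
    G1 = np.array([-2.5, 0.0])   # nearby goal, base reward
    G2 = np.array([6.0, 0.0])    # distant goal, worth twice the reward

    def __init__(self, dt=0.05):          # dt is an assumed discretisation step
        self.dt = dt
        self.state = np.zeros(2)          # X = [x, x_dot]

    def reset(self, state=(0.0, 0.0)):
        self.state = np.asarray(state, dtype=float)
        return self.state.copy()

    def step(self, u):
        u = float(np.clip(u, -1.0, 1.0))                        # u in [-1, 1]
        self.state = self.state + self.dt * (self.A @ self.state + self.B * u)
        self.state[0] = (self.state[0] + 10.0) % 20.0 - 10.0    # position wrapping (Table 1 sketch)
        self.state[1] = np.clip(self.state[1], -2.5, 2.5)       # velocity limit
        reward = max(1.0 - np.tanh(np.linalg.norm(self.state - self.G1)),
                     2.0 * (1.0 - np.tanh(np.linalg.norm(self.state - self.G2))))
        return self.state.copy(), reward
```

With the larger reward placed at the farther goal, a learner that only explores locally around the start state can settle for $G_1$, which is the kind of non-optimal solution the abstract refers to.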
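Section 2 describes the PPS pipeline only at a high level: a kinodynamic planner grows a tree of transitions and SAC is trained on all of the resulting data. The sketch below illustrates that data flow with a deliberately crude random-extension tree instead of the LQR-RRT with QP-based finite-horizon steering used in the paper; it assumes the DoubleIntegratorEnv sketch above, the function names are illustrative, and the SAC update itself is omitted.

```python
import numpy as np

def grow_exploration_tree(env, n_extensions=500, seed=0):
    """Collect transitions (s, a, r, s') by growing a crude exploration tree.

    Simplified stand-in for the LQR-RRT exploration in PPS: sample a random
    target state, extend from the nearest tree node with a random action, and
    keep *every* transition as training data (not only successful trajectories).
    """
    rng = np.random.default_rng(seed)
    nodes = [np.zeros(2)]                 # tree rooted at the initial state
    data = []
    for _ in range(n_extensions):
        target = rng.uniform([-10.0, -2.5], [10.0, 2.5])            # random state sample
        nearest = min(nodes, key=lambda n: np.linalg.norm(n - target))
        action = rng.uniform(-1.0, 1.0)   # random steering (the paper uses QP-based steering)
        env.reset(state=nearest)
        s_next, reward = env.step(action)
        nodes.append(s_next)
        data.append((nearest.copy(), action, reward, s_next))
    return data

# These transitions would then fill the replay buffer of an off-policy learner
# such as SAC, which is trained purely from the planner-generated data.
tree_data = grow_exploration_tree(DoubleIntegratorEnv())
print(len(tree_data), "transitions collected for policy learning")
```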
Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Publisher: Graz University of Technology
Place: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Dimensions: 21.0 x 29.7 cm
Pages: 188
Categories: Informatik, Technik