Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Page 120
4. Control System Design

The only difference between Q-learning and sarsa is the definition of the TD error, which in Q-learning is [Bar98]

    δ_td = R(k) + γ max_{u′} Q(S(k+1), u′) − Q(S(k), U(k)).    (4.49)

This difference reflects two different update principles of TD methods, which are called on-policy and off-policy. Sarsa is an on-policy algorithm because the update rule (equation 4.44) strictly follows the control policy π, and all data used in equation 4.44 are actually experienced by the plant. Q-learning is an off-policy algorithm because the term max_{u′} Q(S(k+1), u′) used in the update rule might not be the real experience of the plant. For example, if both sarsa and Q-learning are coupled with the greedy algorithm as the control policy, there is no difference. But if they use the ε-greedy algorithm as the control policy, the performance will differ, because in Q-learning the update still follows a greedy algorithm, while in sarsa the update strictly follows the ε-greedy algorithm, so the influence of exploration actions is reflected in the update result. Comparing Q-learning and sarsa, Q-learning is in general more efficient because it uses the full knowledge of the value function rather than following the control policy.

• Actor-critic (AC) methods:
AC methods are significantly different from sarsa and Q-learning [PS08b]. In both sarsa and Q-learning, the control policy is obtained from the state-action value function, using stochastic searching algorithms such as the greedy algorithm or the ε-greedy algorithm. Evaluation of the value function is the key of the entire control system, and the control policy is only updated accordingly. Algorithms of this kind are usually called critic-only algorithms (the critic is the structure where the value function is updated). In AC methods, there are two separate structures which update the control policy and the value function independently, called the controller (actor) and the critic respectively (see figure 4.9).
The value function is updated in the critic using the same method as in Q-learning or sarsa. Meanwhile, the control policy is also updated in the controller. Instead of the probability distribution directly inferred from the value function using equation 4.43, the
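The on-policy/off-policy distinction above can be made concrete in code. The following is a minimal tabular sketch, not the book's implementation: the learning rate `ALPHA`, discount `GAMMA`, exploration rate `EPS`, and the toy state/action sets are assumptions chosen only to illustrate how the two TD targets differ. Note that the sarsa update consumes the next action the ε-greedy policy actually takes, while the Q-learning update maximizes over u′ as in equation 4.49.

```python
import random
from collections import defaultdict

# Hypothetical tabular setup (illustrative values, not from the book).
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
ACTIONS = [0, 1]
Q = defaultdict(float)  # Q[(s, u)] -> state-action value, default 0.0

def epsilon_greedy(s):
    """Control policy: explore with probability EPS, else act greedily on Q."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda u: Q[(s, u)])

def sarsa_update(s, u, r, s_next, u_next):
    """On-policy TD update: the target uses u_next, the action the
    ε-greedy policy actually takes, so exploration affects the update."""
    td = r + GAMMA * Q[(s_next, u_next)] - Q[(s, u)]
    Q[(s, u)] += ALPHA * td

def q_learning_update(s, u, r, s_next):
    """Off-policy TD update (equation 4.49): the target maximizes over u',
    which may not be the action the plant actually experiences."""
    td = r + GAMMA * max(Q[(s_next, up)] for up in ACTIONS) - Q[(s, u)]
    Q[(s, u)] += ALPHA * td
```

With a greedy policy (EPS = 0) the sampled `u_next` always attains the maximum, so the two updates coincide, matching the observation in the text; any ε > 0 makes them diverge whenever an exploratory action is taken.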
Title: Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Author: Yiming Sun
Publisher: KIT Scientific Publishing
Location: Karlsruhe
Date: 2016
Language: English
License: CC BY-SA 3.0
ISBN: 978-3-7315-0467-2
Size: 14.8 x 21.0 cm
Pages: 260
Keywords: microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
Category: Technik