Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Page 120
4. Control System Design

The only difference between Q-learning and sarsa is the definition of the TD error, which in Q-learning is [Bar98]

\delta_{td} = R(k) + \gamma \max_{u'} Q(S(k+1), u') - Q(S(k), U(k)).    (4.49)

This difference reflects two different update principles of TD methods, which are called on-policy and off-policy. Sarsa is an on-policy algorithm because the update rule (equation 4.44) strictly follows the control policy π, and all data used in equation 4.44 are actually experienced by the plant. Q-learning is an off-policy algorithm because the term \max_{u'} Q(S(k+1), u') used in the update rule might not be the real experience of the plant. For example, if both sarsa and Q-learning are coupled with the greedy algorithm as the control policy, there is no difference. But if they use the ε-greedy algorithm as the control policy, the performance will differ, because in Q-learning the update still follows a greedy algorithm, while in sarsa the update strictly follows the ε-greedy algorithm, so the influence of exploration actions is reflected in the update result. Compared with sarsa, Q-learning is in general more efficient because it uses the full knowledge of the value function rather than following the control policy.

• Actor-critic (AC) methods:
AC methods are significantly different from sarsa and Q-learning [PS08b]. In both sarsa and Q-learning, the control policy is obtained from the state-action value function using stochastic searching algorithms such as the greedy or ε-greedy algorithm. Evaluation of the value function is the key of the entire control system, and the control policy is only updated accordingly. Algorithms of this kind are usually called critic-only algorithms (the critic is the structure in which the value function is updated). In AC methods, there are two separate structures which update the control policy and the value function independently, called the controller (actor) and the critic, respectively (see figure 4.9). The value function is updated in the critic using the same method as in Q-learning or sarsa. Meanwhile, the control policy is also updated in the controller. Instead of the probability distribution directly inferred from the value function using equation 4.43, the
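As a minimal illustration of the on-policy/off-policy distinction described above, the following Python sketch contrasts the sarsa update (equation 4.44) with the Q-learning update based on the TD error of equation (4.49), both under an ε-greedy control policy. It is not the book's implementation; the tabular state/action sizes, the learning rate, and the value of ε are illustrative assumptions.

```python
import numpy as np

# Minimal sketch, not the book's implementation: tabular sarsa vs. Q-learning
# TD errors under an epsilon-greedy control policy. The state/action sizes,
# learning rate alpha, and epsilon below are illustrative assumptions.
n_states, n_actions = 10, 4
gamma, alpha, epsilon = 0.95, 0.1, 0.1
Q = np.zeros((n_states, n_actions))   # state-action value table Q(S, U)
rng = np.random.default_rng(0)

def epsilon_greedy(s):
    """Control policy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def sarsa_update(s, u, r, s_next, u_next):
    """On-policy update: uses the action u_next the policy actually takes next."""
    delta_td = r + gamma * Q[s_next, u_next] - Q[s, u]
    Q[s, u] += alpha * delta_td

def q_learning_update(s, u, r, s_next):
    """Off-policy update, cf. eq. (4.49): maximizes over u' regardless of the
    action the epsilon-greedy policy will actually execute next."""
    delta_td = r + gamma * np.max(Q[s_next]) - Q[s, u]
    Q[s, u] += alpha * delta_td
```

When ε = 0 the two rules coincide; with ε > 0 the action u_next sampled by the policy occasionally differs from the greedy arg max, which is exactly the difference between the two TD-error definitions discussed above.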
Title
Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Author
Yiming Sun
Publisher
KIT Scientific Publishing
Place
Karlsruhe
Date
2016
Language
English
License
CC BY-SA 3.0
ISBN
978-3-7315-0467-2
Dimensions
14.8 x 21.0 cm
Pages
260
Keywords
microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
Category
Technik