Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Page 120
4. Control System Design

The only difference between Q-learning and sarsa is the definition of the TD error, which in Q-learning is [Bar98]

\delta_{td} = R(k) + \gamma \max_{u'} Q(S(k+1), u') - Q(S(k), U(k)).    (4.49)

This difference reflects two different update principles of TD methods, which are called on-policy and off-policy. Sarsa is an on-policy algorithm because the update rule (equation 4.44) strictly follows the control policy π, and all data used in equation 4.44 are actually experienced by the plant. Q-learning is an off-policy algorithm because the term \max_{u'} Q(S(k+1), u') used in the update rule might not be the real experience of the plant. For example, if both sarsa and Q-learning are coupled with the greedy algorithm as the control policy, there is no difference. But if they use the ε-greedy algorithm as the control policy, the performance will differ, because in Q-learning the update still follows a greedy algorithm, while in sarsa the update strictly follows the ε-greedy algorithm, so the influence of exploration actions is reflected in the update result. Compared with sarsa, Q-learning is in general more efficient because it uses the full knowledge of the value function rather than following the control policy.

• Actor-critic (AC) methods:
AC methods are significantly different from sarsa and Q-learning [PS08b]. In both sarsa and Q-learning, the control policy is obtained from the state-action value function using stochastic searching algorithms such as the greedy or ε-greedy algorithm. Evaluation of the value function is the key of the entire control system, and the control policy is only updated accordingly. Algorithms of this kind are usually called critic-only algorithms (the critic is the structure in which the value function is updated). In AC methods, there are two separate structures which update the control policy and the value function independently, called the controller (actor) and the critic, respectively (see figure 4.9). The value function is updated in the critic using the same method as in Q-learning or sarsa. Meanwhile, the control policy is also updated in the controller. Instead of the probability distribution directly inferred from the value function using equation 4.43, the
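As a minimal illustration of the on-policy/off-policy distinction described above, the following Python sketch contrasts the sarsa update (equation 4.44) with the Q-learning update based on the TD error of equation (4.49), both under an ε-greedy control policy. It is not the book's implementation; the tabular state/action sizes, the learning rate, and the value of ε are illustrative assumptions.

```python
import numpy as np

# Minimal sketch, not the book's implementation: tabular sarsa vs. Q-learning
# TD errors under an epsilon-greedy control policy. The state/action sizes,
# learning rate alpha, and epsilon below are illustrative assumptions.
n_states, n_actions = 10, 4
gamma, alpha, epsilon = 0.95, 0.1, 0.1
Q = np.zeros((n_states, n_actions))   # state-action value table Q(S, U)
rng = np.random.default_rng(0)

def epsilon_greedy(s):
    """Control policy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def sarsa_update(s, u, r, s_next, u_next):
    """On-policy update: uses the action u_next the policy actually takes next."""
    delta_td = r + gamma * Q[s_next, u_next] - Q[s, u]
    Q[s, u] += alpha * delta_td

def q_learning_update(s, u, r, s_next):
    """Off-policy update, cf. eq. (4.49): maximizes over u' regardless of the
    action the epsilon-greedy policy will actually execute next."""
    delta_td = r + gamma * np.max(Q[s_next]) - Q[s, u]
    Q[s, u] += alpha * delta_td
```

When ε = 0 the two rules coincide; with ε > 0 the action u_next sampled by the policy occasionally differs from the greedy arg max, which is exactly the difference between the two TD-error definitions discussed above.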
Title
Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Author
Yiming Sun
Publisher
KIT Scientific Publishing
Place
Karlsruhe
Date
2016
Language
English
License
CC BY-SA 3.0
ISBN
978-3-7315-0467-2
Dimensions
14.8 x 21.0 cm
Pages
260
Keywords
microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
Category
Technik