Page 120 in Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
4. Control System Design
The only difference between Q-learning and sarsa is the definition of the TD error, which in Q-learning is [Bar98]

\delta_{td} = R(k) + \gamma \max_{u'} Q(S(k+1), u') - Q(S(k), U(k)). \quad (4.49)
This difference reflects two different update principles of TD methods, which are called on-policy and off-policy. Sarsa is an on-policy algorithm because the update rule (equation 4.44) strictly follows the control policy \pi, and all data used in equation 4.44 are actually experienced by the plant. Q-learning is an off-policy algorithm because the term \max_{u'} Q(S(k+1), u') used in the update rule might not correspond to the real experience of the plant. For example, if both sarsa and Q-learning are coupled with the greedy algorithm as the control policy, there is no difference between them. But if they use the \epsilon-greedy algorithm as the control policy, their performance differs: in Q-learning the update still follows a greedy algorithm, while in sarsa the update strictly follows the \epsilon-greedy algorithm, so the influence of exploration actions is reflected in the update result. Compared with sarsa, Q-learning is in general more efficient because its update uses the greedy (maximizing) action rather than the action actually selected by the control policy.
• Actor-critic (AC) methods:
AC methods are significantly different from sarsa and Q-learning [PS08b]. In both sarsa and Q-learning, the control policy is obtained based on the state-action value function, using stochastic search algorithms such as the greedy or \epsilon-greedy algorithm. Evaluation of the value function is the core of the entire control system, and the control policy is only updated accordingly. Such algorithms are usually called critic-only algorithms (the critic is the structure where the value function is updated). In AC methods, there are two separate structures which update the control policy and the value function independently, called the controller (actor) and the critic respectively (see figure 4.9). The value function is updated in the critic using the same method as in Q-learning or sarsa. Meanwhile, the control policy is also updated in the controller. Instead of the probability distribution directly inferred from the value function using equation 4.43, the
Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
- Author: Yiming Sun
- Publisher: KIT Scientific Publishing
- Place: Karlsruhe
- Date: 2016
- Language: English
- License: CC BY-SA 3.0
- ISBN: 978-3-7315-0467-2
- Dimensions: 14.8 x 21.0 cm
- Pages: 260
- Keywords: microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
- Category: Engineering