Page - 119 - in Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
4.2. Intelligent Control
where |U(s)| is defined as the total number of available control
actions at the state s, and 0 ≤ ε ≤ 1 is the probability factor
representing the total probability of selecting the other, non-greedy
control actions.
The probability factor ε is introduced to achieve a trade-off
between exploration and exploitation, which is important not only
for sarsa but for all TD methods [Thr92] [AN05] [ALL+09]. On the one
hand, the controller has to select the best action according to the
greedy algorithm (exploitation), trying to obtain a high long-term
expected reward. On the other hand, the controller should also
occasionally explore other, non-greedy actions to test whether there
are better control actions. This is helpful for future improvements
of both the value function and the control policy, especially when
the value function is not yet optimal, or when the plant is
stochastic or time-varying.
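The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact form of the ε-greedy definition is not reproduced on this page, so the sketch assumes that the greedy action is taken with probability 1 − ε and that the total probability ε is split evenly over the remaining |U(s)| − 1 actions. The function name `epsilon_greedy` and the list-based representation of the state-action values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return the index of the selected control action.

    q_values: state-action values of all actions available at the
              current state (list of floats, length |U(s)|).
    epsilon:  assumed total probability of taking a non-greedy action.
    """
    n = len(q_values)
    # Greedy action: the one with the highest estimated value.
    greedy = max(range(n), key=lambda a: q_values[a])
    # Exploit with probability 1 - epsilon (or if there is no alternative).
    if n == 1 or rng.random() >= epsilon:
        return greedy
    # Explore: pick uniformly among the other, non-greedy actions.
    others = [a for a in range(n) if a != greedy]
    return rng.choice(others)
```

With `epsilon = 0` the sketch is purely greedy, and with `epsilon = 1` it never selects the greedy action, which matches the reading of ε as the total non-greedy probability.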
The value of the probability factor ε depends mainly on the plant.
In principle, ε is large at the beginning of the learning process
and then gradually decreases. For deterministic plants, it could
eventually be reduced to zero after all state-action pairs have been
experienced at least once, because the state-action value can
accurately learn the reward of an individual state-action pair from
a single visit. For stochastic or time-varying plants, however, ε
should always be kept at a small but non-zero value, in order to
obtain more accurate estimates in the presence of external
disturbances and to track time-varying dynamics.
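One possible way to realize such a decreasing probability factor is an exponential decay with a lower bound. The text does not prescribe a particular schedule, so the decay law, the function name `epsilon_schedule` and all parameter values below are assumptions chosen only for illustration.

```python
def epsilon_schedule(k, eps_start=1.0, eps_min=0.0, decay=0.99):
    """Exploration factor at learning step k (hypothetical schedule).

    eps_start: value at the beginning of learning (large).
    eps_min:   lower bound; 0.0 lets exploration fade out, as could
               suit a deterministic plant, while a small positive value
               keeps some exploration for a stochastic or time-varying
               plant.
    decay:     per-step multiplicative decay, between 0 and 1.
    """
    return max(eps_min, eps_start * decay ** k)
```

For example, `epsilon_schedule(k, eps_min=0.05)` starts at 1.0 and settles at 0.05, so a small share of non-greedy actions is always retained.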
Besides the ε-greedy algorithm, there are several other algorithms
that aim to balance the trade-off between exploration and
exploitation, such as the ε-soft algorithm [Tho97] and the softmax
algorithm [Bar98] [IYY02].
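For comparison, softmax action selection is commonly formulated as a Boltzmann distribution over the state-action values. The sketch below follows that common formulation, which is not spelled out on this page; the temperature parameter and the function name `softmax_probabilities` are assumptions.

```python
import math


def softmax_probabilities(q_values, temperature=1.0):
    """Selection probabilities proportional to exp(Q / temperature).

    A high temperature gives nearly uniform exploration; a low
    temperature concentrates the probability on the greedy action.
    """
    # Subtract the maximum for numerical stability; this does not
    # change the resulting probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]
```

Unlike the ε-greedy rule, this weights the non-greedy actions by their estimated values instead of treating them all alike.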
• Q-learning:
The update rule for the state-action value function in Q-learning is
defined as [Bar98]

$$
Q(S(k),U(k)) = Q(S(k),U(k)) + \alpha(k) \left[ R(k) + \gamma \max_{u'} Q(S(k+1),u') - Q(S(k),U(k)) \right]. \tag{4.48}
$$
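A tabular version of the update (4.48) can be sketched as follows. The dictionary-based value table, the function name `q_learning_update` and the numbers in the usage lines are hypothetical; only the update formula itself is taken from the text.

```python
from collections import defaultdict


def q_learning_update(Q, s, u, reward, s_next, actions, alpha, gamma):
    """Apply one Q-learning update to the table Q and return the new value.

    Q:       mapping (state, action) -> value, e.g. defaultdict(float).
    s, u:    current state S(k) and applied control action U(k).
    reward:  R(k) as written in (4.48).
    s_next:  next state S(k+1).
    actions: control actions available at s_next.
    alpha:   learning rate alpha(k); gamma: discount factor.
    """
    # Greedy estimate of the value of the next state (the max over u').
    best_next = max(Q[(s_next, u_next)] for u_next in actions)
    # Temporal-difference error: the bracketed term in (4.48).
    td_error = reward + gamma * best_next - Q[(s, u)]
    Q[(s, u)] += alpha * td_error
    return Q[(s, u)]


# Usage with made-up numbers: two states, two actions.
Q = defaultdict(float)
Q[(1, 0)], Q[(1, 1)] = 2.0, 4.0
q_learning_update(Q, s=0, u=0, reward=1.0, s_next=1,
                  actions=[0, 1], alpha=0.5, gamma=0.9)
```

The update uses the maximum over the next actions regardless of which action the controller actually applies at step k+1.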
- Title
- Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
- Author
- Yiming Sun
- Publisher
- KIT Scientific Publishing
- Location
- Karlsruhe
- Date
- 2016
- Language
- English
- License
- CC BY-SA 3.0
- ISBN
- 978-3-7315-0467-2
- Size
- 14.8 x 21.0 cm
- Pages
- 260
- Keywords
- microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
- Category
- Technology