Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Page - 119 -

4.2. Intelligent Control

where |U(s)| is defined as the total number of available control actions at the state s, and 0 ≤ ε ≤ 1 is the probability factor representing the total probability of selecting the other, non-greedy control actions. The probability factor is introduced to achieve a trade-off between exploration and exploitation, which is important not only for sarsa but for all TD methods [Thr92] [AN05] [ALL+09]. On the one hand, the controller has to take the best action according to the greedy algorithm (exploitation), trying to obtain a high long-term expected reward. On the other hand, the controller should also occasionally explore other, non-greedy actions to test whether there are better control actions. This is helpful for future improvements of both the value function and the control policy, especially when the value function is not optimal, or when the plant is stochastic or time-varying.

The value of the probability factor depends mainly on the plant. In principle, ε is large at the beginning of the learning process and then gradually decreases. For deterministic plants, ε can eventually be set to zero after all state-action pairs have been experienced at least once, because the state-action value can accurately learn the reward of an individual state-action pair from a single visit. For stochastic or time-varying plants, however, ε should always keep a small but non-zero value, in order to obtain more accurate estimates with respect to external disturbances and to track time-varying dynamics.

Besides the ε-greedy algorithm, there are several other algorithms that aim to balance the trade-off between exploration and exploitation, such as the ε-soft algorithm [Tho97] and the softmax algorithm [Bar98] [IYY02].

• Q-learning:
The update rule for the state-action value function in Q-learning is defined as [Bar98]

\[
Q(S(k), U(k)) = Q(S(k), U(k)) + \alpha(k) \left[ R(k) + \gamma \max_{u'} Q(S(k+1), u') - Q(S(k), U(k)) \right]. \tag{4.48}
\]
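The ε-greedy selection and the Q-learning update of Eq. (4.48) can be illustrated with a minimal tabular sketch. It is not the author's implementation. The function names, the list-of-lists table `Q[state][action]`, and the geometric decay schedule with its parameters are all hypothetical. The sketch also assumes that the non-greedy actions share the total probability ε equally, which is one reading of the definition above.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from q_row, the Q-values of one state.

    Assumption: with total probability epsilon a non-greedy action is
    chosen uniformly at random; otherwise the greedy action is taken.
    """
    greedy = max(range(len(q_row)), key=lambda u: q_row[u])
    others = [u for u in range(len(q_row)) if u != greedy]
    if others and rng.random() < epsilon:
        return rng.choice(others)  # exploration
    return greedy  # exploitation


def decayed_epsilon(k, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Hypothetical schedule: large at first, then decaying to eps_min.

    eps_min = 0 would suit a deterministic plant; a small positive
    value keeps some exploration for stochastic or time-varying plants.
    """
    return max(eps_min, eps_start * decay ** k)


def q_learning_update(Q, s, u, reward, s_next, alpha, gamma):
    """Apply the update of Eq. (4.48) in place and return the new value."""
    td_target = reward + gamma * max(Q[s_next])  # R(k) + gamma * max_u' Q(S(k+1), u')
    Q[s][u] += alpha * (td_target - Q[s][u])
    return Q[s][u]


# Usage: two states, two actions per state.
Q = [[0.0, 0.0], [0.0, 1.0]]
action = epsilon_greedy(Q[0], decayed_epsilon(k=0))
new_value = q_learning_update(Q, s=0, u=1, reward=1.0, s_next=1,
                              alpha=0.5, gamma=0.9)
```

Because the maximum over u′ is taken regardless of which action the ε-greedy policy actually selects at S(k+1), exploration affects which state-action pairs are visited but not the target of the update.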
Title
Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Author
Yiming Sun
Publisher
KIT Scientific Publishing
Location
Karlsruhe
Date
2016
Language
English
License
CC BY-SA 3.0
ISBN
978-3-7315-0467-2
Size
14.8 x 21.0 cm
Pages
260
Keywords
Mikrowellenerwärmung, Mehrgrößenregelung, Modellprädiktive Regelung, Künstliches neuronales Netz, Bestärkendes Lernen; microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
Category
Technik