Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Page 119

4.2. Intelligent Control

where |U(s)| is defined as the total number of available control actions at the state s, and 0 ≤ ε ≤ 1 is the probability factor representing the total probability of selecting the other, non-greedy control actions.

The probability factor is introduced to achieve a trade-off between exploration and exploitation, which is important not only for SARSA but for all TD methods [Thr92] [AN05] [ALL+09]. On the one hand, the controller has to take the best action according to the greedy algorithm (exploitation), trying to obtain a high long-term expected reward. On the other hand, the controller should also occasionally explore other, non-greedy actions to test whether there are better control actions. This is helpful for future improvements of both the value function and the control policy, especially when the value function is not optimal, or when the plant is stochastic or time-varying.

The value of the probability factor is determined mainly by the plant. In principle, the value of ε is large at the beginning of the learning process and then gradually decreases. For deterministic plants, the value could eventually be zero after all state-action pairs have been experienced at least once, because the state-action value function can accurately learn the reward of each state-action pair from a single visit. For stochastic or time-varying plants, however, ε should always keep a small but non-zero value, in order to obtain more accurate estimates in the presence of external disturbances and to track time-varying dynamics.

Besides the ε-greedy algorithm, there are also several other algorithms that aim to balance the trade-off between exploration and exploitation, such as the ε-soft algorithm [Tho97] and the softmax algorithm [Bar98] [IYY02].

• Q-learning:
The update rule for the state-action value function in Q-learning is defined as [Bar98]

\[
Q(S(k), U(k)) = Q(S(k), U(k)) + \alpha(k) \left[ R(k) + \gamma \max_{u'} Q(S(k+1), u') - Q(S(k), U(k)) \right]. \tag{4.48}
\]
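
As a rough illustration of the two ideas on this page, the following Python sketch combines ε-greedy action selection with the tabular update (4.48). It is not taken from the book: the function names, the dictionary-based Q table, and the geometric decay schedule for ε are all assumptions made for this example. Following the text above, ε is interpreted as the total probability of choosing any non-greedy action, split uniformly among those actions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Return an action index for one state, given its list of Q-values.

    Assumption: epsilon is the total probability of all non-greedy actions,
    shared uniformly among them; the greedy action gets probability 1 - epsilon.
    """
    greedy = max(range(len(q_row)), key=lambda u: q_row[u])
    others = [u for u in range(len(q_row)) if u != greedy]
    if others and rng.random() < epsilon:
        return rng.choice(others)  # exploration
    return greedy                  # exploitation


def q_learning_update(Q, s, u, r, s_next, alpha, gamma):
    """Apply the update (4.48) in place.

    Q is a hypothetical table: dict mapping state -> list of action values.
    """
    td_target = r + gamma * max(Q[s_next])   # R(k) + gamma * max_u' Q(S(k+1), u')
    Q[s][u] += alpha * (td_target - Q[s][u])
    return Q[s][u]


def decayed_epsilon(k, eps0=1.0, decay=0.99, eps_min=0.0):
    """Hypothetical schedule: large epsilon at first, then gradually decreasing.

    eps_min = 0 suits a deterministic plant; a small positive eps_min keeps
    some exploration for stochastic or time-varying plants.
    """
    return max(eps_min, eps0 * decay ** k)
```

A single learning step would then select `u = epsilon_greedy(Q[s], decayed_epsilon(k))`, apply `u` to the plant, observe the reward and the next state, and call `q_learning_update`. The update uses the maximum over next-state actions regardless of which action the ε-greedy rule actually selects next.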
Title
Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Author
Yiming Sun
Publisher
KIT Scientific Publishing
Location
Karlsruhe
Date
2016
Language
English
License
CC BY-SA 3.0
ISBN
978-3-7315-0467-2
Dimensions
14.8 x 21.0 cm
Pages
260
Keywords
microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
Category
Technology