Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources

4.2. Intelligent Control

crete using either discretization [DKS+95] or fuzzy descriptions [LJ00] [Lin03], and then apply lookup-table-based TD learning such as Q-learning and Sarsa. After some preliminary tests and comparisons, the lookup-table-based Watkins' Q(λ)-learning [WD92] is selected as the TD learning control method.

The aforementioned Sarsa (equation 4.44) and Q-learning (equation 4.48) are the most basic one-step TD methods: they update the value function of the current state or state-action pair based only on the next state or state-action pair. At each time step only one state or state-action value is updated, so the overall convergence speed is limited. In order to fully exploit each reward and speed up the entire learning process, the more efficient TD(λ) learning method was developed [Tes95] [WS98]. The parameter λ refers to the use of an eligibility trace [LS98], which is normally defined as

$$
e(s,u) =
\begin{cases}
\gamma\lambda\, e(s,u) + 1, & \text{if } (s,u) \text{ is the current state-action pair},\\
\gamma\lambda\, e(s,u), & \text{otherwise},
\end{cases}
\tag{4.50}
$$

where γ is the same as in equation 4.44 or 4.48 and 0 ≤ λ ≤ 1 is the fading factor.

Intuitively, the eligibility trace can be regarded as a temporary memory of the occurrence of each state-action pair. Each time a reward is received, credit should be assigned not only to the current state-action pair but also to the state-action pairs visited before it. The relevance between each state-action pair and the current reward is adjusted by the fading factor λ, which reflects the fact that the relevance decays exponentially and that the current state-action pair takes the main credit.

The eligibility trace makes the TD(λ) method a combination of MC and pure TD (or TD(0)). When λ = 0, no former state-action pairs are recorded and the method is equivalent to pure TD. When λ = 1, all former state-action pairs are recorded and the memory never fades, which means that all state-action pairs receive the same weight of credit from the current reward. The method then becomes an online version of the MC method.
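
The trace update in equation 4.50 can be illustrated with a minimal Python sketch. It is not taken from the book: the function names `update_traces` and `run_episode`, the dictionary representation of the lookup table, and the state and action labels are all assumptions made for illustration. At every step each stored trace is decayed by γλ, and the trace of the current state-action pair is then increased by 1.

```python
from collections import defaultdict


def update_traces(traces, current_pair, gamma, lam):
    """One step of the accumulating trace of equation 4.50 (illustrative sketch).

    traces       -- dict mapping (state, action) pairs to trace values
    current_pair -- the (state, action) pair visited at this step
    gamma, lam   -- the factors gamma and lambda from equation 4.50
    """
    updated = defaultdict(float)
    # Every stored pair decays by gamma * lambda.
    for pair, value in traces.items():
        updated[pair] = gamma * lam * value
    # The current pair additionally receives +1.
    updated[current_pair] += 1.0
    return updated


def run_episode(pairs, gamma, lam):
    """Apply the trace update along a sequence of visited pairs."""
    traces = defaultdict(float)
    for pair in pairs:
        traces = update_traces(traces, pair, gamma, lam)
    return dict(traces)


# Hypothetical sequence of visited state-action pairs.
visited = [("s0", "u0"), ("s1", "u1"), ("s2", "u2")]

# Intermediate lambda: older pairs keep an exponentially smaller trace.
print(run_episode(visited, gamma=0.9, lam=0.5))
# lambda = 0: only the current pair keeps a non-zero trace (pure TD).
print(run_episode(visited, gamma=0.9, lam=0.0))
# lambda = 1 (gamma = 1 is chosen here so that nothing decays at all).
print(run_episode(visited, gamma=1.0, lam=1.0))
```

In a lookup-table TD(λ) method the value of every pair would then be adjusted in proportion to its trace, so the three calls show how λ shifts the credit between the current pair alone and all previously visited pairs.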

When the eligibility trace is combined with Q-learning, there are mainly two different approaches, the so-called Watkins' Q(λ) and Peng's Q(λ) [PW96]. Here the Watkins' Q(λ) learning algorithm is selected because it is practi-
Title
Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Author
Yiming Sun
Publisher
KIT Scientific Publishing
Place
Karlsruhe
Date
2016
Language
English
License
CC BY-SA 3.0
ISBN
978-3-7315-0467-2
Dimensions
14.8 x 21.0 cm
Pages
260
Keywords
microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
Category
Technology