Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Page - 125 -


4.2. Intelligent Control

crete using either discretization [DKS+95] or fuzzy descriptions [LJ00] [Lin03], and then apply the lookup table based TD learning such as Q-learning and sarsa. After some preliminary tests and comparisons, the lookup table based Watkins’ Q(λ)-learning [WD92] is selected as the TD learning control method.

The aforementioned sarsa (equation 4.44) and Q-learning (equation 4.48) are the most basic one-step TD methods that only update the value function of the current state or state-action pair based on the next state or state-action pair. At each time only one state or state-action value is updated and the overall convergence speed is limited. In order to fully exploit the usefulness of each reward and speed up the entire learning process, a more efficient TD(λ) learning method is developed [Tes95] [WS98]. The parameter λ refers to the use of an eligibility trace [LS98], which is normally defined as

\[
e(s,u) =
\begin{cases}
\gamma\lambda\, e(s,u) + 1, & \text{if } s,u \text{ is the current state-action pair}, \\
\gamma\lambda\, e(s,u), & \text{otherwise},
\end{cases}
\tag{4.50}
\]

where γ is the same as in equation 4.44 or 4.48 and 0 ≤ λ ≤ 1 is the fading factor. Intuitively, the eligibility trace is considered as the temporary memory of the occurrence of each state-action pair. Each time when there is a reward, not only the current state-action pair but also former implemented state-action pairs should be assigned credit. The relevance between each state-action pair and the current reward is adjusted using the fading factor λ, indicating the fact that the relevance is decaying exponentially and the current state-action pair takes the main credit.

The involvement of the eligibility trace makes the TD(λ) method a combination of MC and pure TD (or TD(0)). When λ = 0, there is no former state-action pair recorded and then the method is equivalent to pure TD. When λ = 1, all former state-action pairs are recorded and the memory never fades, which means all state-action pairs take the same weight of credit from the current reward. Then the method becomes an online version of the MC method.
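The trace update of equation 4.50 can be illustrated with a minimal Python sketch. It is not taken from the book: the function names, the dictionary-based lookup table, the example trajectory and the values of γ, λ, the step size and the TD error are all assumptions chosen for illustration. The second function shows the usual way a single TD error is shared among visited pairs in proportion to their traces, which is assumed here rather than quoted from the text.

```python
from collections import defaultdict


def update_traces(traces, current_pair, gamma, lam):
    """One application of equation 4.50 (hypothetical helper).

    Every stored trace is multiplied by gamma * lam, then the trace of the
    current state-action pair is increased by 1.
    """
    decay = gamma * lam
    for pair in list(traces):
        traces[pair] *= decay
    traces[current_pair] += 1.0
    return traces


def apply_td_error(q_table, traces, alpha, td_error):
    """Assumed credit assignment: each pair is updated in proportion to its trace."""
    for pair, trace in traces.items():
        q_table[pair] += alpha * td_error * trace
    return q_table


# Illustrative values only (not from the source).
gamma, lam = 0.9, 0.5
traces = defaultdict(float)
q_table = defaultdict(float)

# Hypothetical trajectory of three state-action pairs.
for pair in [("s0", "u0"), ("s1", "u1"), ("s2", "u0")]:
    update_traces(traces, pair, gamma, lam)

# The most recent pair has trace 1; older pairs carry (gamma * lam) ** k.
for pair, trace in traces.items():
    print(pair, trace)

# A single TD error of 1.0 is spread over all three pairs.
apply_td_error(q_table, traces, alpha=0.1, td_error=1.0)
```

In this sketch a pair visited k steps ago has a trace of (γλ)^k, so the current pair receives the largest share of the update. Setting `lam` to 0 leaves only the current pair with a nonzero trace, which corresponds to the one-step TD case described above.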
When the eligibility trace is combined with Q-learning, there are mainly two different approaches, the so-called Watkins’ Q(λ) and Peng’s Q(λ) [PW96]. Here the Watkins’ Q(λ) learning algorithm is selected because it is practi-
Title
Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
Author
Yiming Sun
Publisher
KIT Scientific Publishing
Location
Karlsruhe
Date
2016
Language
English
License
CC BY-SA 3.0
ISBN
978-3-7315-0467-2
Size
14.8 x 21.0 cm
Pages
260
Keywords
microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
Category
Technology