Page 125 in Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
4.2. Intelligent Control
crete using either discretization [DKS+95] or fuzzy descriptions [LJ00] [Lin03], and then apply lookup-table-based TD learning such as Q-learning and Sarsa. After some preliminary tests and comparisons, the lookup-table-based Watkins' Q(λ)-learning [WD92] is selected as the TD learning control method.
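To make the lookup-table methods concrete, the following is a minimal Python sketch of the one-step Q-learning and Sarsa updates, assuming they take the standard textbook form of equations 4.44 and 4.48 (those equations are not reproduced on this page). The bin edges, the `discretize` helper, the table size and all numeric values are hypothetical illustrations, not the settings used in this work.

```python
import numpy as np

def discretize(x, edges):
    """Hypothetical discretizer: map a continuous value onto a table index 0..len(edges)."""
    return int(np.digitize(x, edges))

def q_learning_update(Q, s, u, r, s_next, alpha, gamma):
    """One-step Q-learning: bootstrap from the greedy (max) value of s_next."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, u] += alpha * (td_target - Q[s, u])

def sarsa_update(Q, s, u, r, s_next, u_next, alpha, gamma):
    """One-step Sarsa: bootstrap from the action u_next actually taken in s_next."""
    td_target = r + gamma * Q[s_next, u_next]
    Q[s, u] += alpha * (td_target - Q[s, u])

# Hypothetical setup: 4 bin edges -> 5 discrete states, 3 discrete actions.
edges = [-5.0, -1.0, 1.0, 5.0]
s, s_next = discretize(-3.0, edges), discretize(0.2, edges)  # bins 1 and 2

Q_q = np.zeros((5, 3))
Q_q[s_next] = [0.0, 2.0, 1.0]   # assumed action values in the next state
Q_sarsa = Q_q.copy()

q_learning_update(Q_q, s, 0, r=1.0, s_next=s_next, alpha=0.5, gamma=0.9)
sarsa_update(Q_sarsa, s, 0, r=1.0, s_next=s_next, u_next=2, alpha=0.5, gamma=0.9)
print(Q_q[s, 0], Q_sarsa[s, 0])  # approximately 1.4 and 0.95
```

With these numbers the Q-learning target is 1 + 0.9 · 2 = 2.8, giving Q = 1.4, while Sarsa bootstraps from the chosen action u' = 2 and gets 1 + 0.9 · 1 = 1.9, giving Q = 0.95. In both cases only a single table entry changes per step.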
The aforementioned Sarsa (equation 4.44) and Q-learning (equation 4.48) are the most basic one-step TD methods: they update only the value of the current state or state-action pair, based on the next state or state-action pair. Since only one state or state-action value is updated at each time step, the overall convergence speed is limited. To fully exploit each reward and speed up the entire learning process, the more efficient TD(λ) learning method has been developed [Tes95] [WS98]. The parameter λ refers to the use of an eligibility trace [LS98], which is normally defined as
$$
e(s,u) = \begin{cases} \gamma\lambda\, e(s,u) + 1, & \text{if } (s,u) \text{ is the current state-action pair},\\ \gamma\lambda\, e(s,u), & \text{otherwise}, \end{cases} \tag{4.50}
$$
where γ is the same discount factor as in equation 4.44 or 4.48, and 0 ≤ λ ≤ 1 is the fading factor. Intuitively, the eligibility trace can be regarded as a temporary memory of the occurrence of each state-action pair. Each time a reward is received, credit should be assigned not only to the current state-action pair but also to the previously executed state-action pairs. The relevance between each state-action pair and the current reward is adjusted by the fading factor λ: the relevance decays exponentially, so the current state-action pair receives the main credit.
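Equation 4.50 and the credit-assignment idea can be sketched in a few lines of Python. This is a minimal sketch of accumulating traces on a lookup table, assuming that a single TD error δ is spread over all pairs as Q ← Q + αδe, as in standard tabular TD(λ). The visit sequence, γ = 0.9, α and the function names are hypothetical.

```python
import numpy as np

def update_traces(e, s, u, gamma, lam):
    """Accumulating trace as in equation 4.50: every entry fades by
    gamma*lam, and the current pair (s, u) additionally receives +1."""
    e *= gamma * lam
    e[s, u] += 1.0
    return e

def trace_backup(Q, e, delta, alpha):
    """Spread one TD error delta over all pairs in proportion to their traces."""
    Q += alpha * delta * e
    return Q

def traces_after(visits, n_states, n_actions, gamma, lam):
    """Replay a sequence of (s, u) visits and return the resulting trace table."""
    e = np.zeros((n_states, n_actions))
    for s, u in visits:
        update_traces(e, s, u, gamma, lam)
    return e

# Hypothetical episode fragment: three distinct pairs, most recent last.
visits = [(0, 1), (1, 0), (2, 2)]
for lam in (0.0, 0.5, 1.0):
    e = traces_after(visits, 3, 3, gamma=0.9, lam=lam)
    print(lam, e[0, 1], e[1, 0], e[2, 2])

# One reward-driven TD error credits all three pairs at once (lam = 0.5).
Q = trace_backup(np.zeros((3, 3)), traces_after(visits, 3, 3, 0.9, 0.5),
                 delta=1.0, alpha=0.1)
```

A pair visited once, k steps ago, ends up with trace (γλ)^k, so with λ = 0.5 the three pairs get roughly 0.2025, 0.45 and 1. With λ = 0 only the most recent pair keeps a nonzero trace, and with λ = 1 the traces still decay by γ alone. Watkins' Q(λ) as usually described additionally resets all traces to zero after an exploratory (non-greedy) action; that step is omitted here.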
The eligibility trace makes the TD(λ) method a combination of MC and pure TD (or TD(0)). When λ = 0, no former state-action pair is recorded and the method is equivalent to pure TD. When λ = 1, all former state-action pairs are recorded and the memory never fades, which means that, apart from the discounting by γ, all state-action pairs receive the same weight of credit from the current reward. The method then becomes an online version of the MC method. When the eligibility trace is combined with Q-learning, there are two main approaches, the so-called Watkins' Q(λ) and Peng's Q(λ) [PW96]. Here the Watkins' Q(λ) learning algorithm is selected because it is practi-
- Title: Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
- Author: Yiming Sun
- Publisher: KIT Scientific Publishing
- Place: Karlsruhe
- Date: 2016
- Language: English
- License: CC BY-SA 3.0
- ISBN: 978-3-7315-0467-2
- Dimensions: 14.8 x 21.0 cm
- Pages: 260
- Keywords: microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
- Category: Engineering