Page 111 of Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
4.2. Intelligent Control
in RL. In most cases the plant to be controlled is completely unknown
to the learner. Therefore the learner has to use a trial-and-error search
strategy to consistently explore the environment by performing
different actions, and improve the action policy according to the
rewards received from the environment. RL is more practical and useful
than supervised learning in many cases, because it is often impossible
to obtain training datasets that are both correct and representative
enough to describe all dynamics of the real problem. That is also why RL
is considered the learning approach closest to the functionality of the
human brain, and why it is widely applied and studied in artificial
intelligence, robotics and computer science [GBLB12].
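The trial-and-error idea can be illustrated with a minimal sketch. It assumes an epsilon-greedy exploration rule, which is one common choice and is not prescribed by the text. The reward function, the number of actions and all parameter values are hypothetical. The learner knows nothing about the reward function in advance and only sees the reward of the action it has just tried.

```python
import random


def reward(action):
    """Hypothetical, deterministic reward: action 2 is best (reward 0)."""
    return -float((action - 2) ** 2)


def learn_by_trial_and_error(n_actions=5, steps=200, epsilon=0.1, seed=0):
    """Estimate the reward of each action purely from tried actions."""
    rng = random.Random(seed)
    q = [0.0] * n_actions      # running reward estimate per action
    counts = [0] * n_actions   # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            a = rng.randrange(n_actions)
        else:
            # Exploit: take the action with the best estimate so far.
            a = max(range(n_actions), key=lambda i: q[i])
        r = reward(a)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]   # incremental mean of rewards
    return q


q = learn_by_trial_and_error()
best = max(range(len(q)), key=lambda i: q[i])
print("estimated rewards:", q)
print("preferred action:", best)
```

The policy improvement here is implicit: the greedy choice always follows the current reward estimates, so better estimates directly give a better policy.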
RL was extended to and implemented in the control field starting in
the 1980s and 1990s [BSB81] [Sut84], and gradually developed into an
influential control method [SBW92] [WD92] [WMS92]. The main difference
between RL control (RLC) and other control methods is that RLC uses a
different tool to describe the dynamics of the plant being controlled.
Instead of the transfer equations (polynomials) or state-space models
used in conventional control methods, in RL the plant is modeled by a
Markov decision process (MDP) [Bel57] that consists only of different
state-action pairs and transition probabilities between any two states.
The complete dynamics of the plant are described by the probability
distribution
\[
P_a(s, s', r) = \Pr\bigl( R(k+1) = r,\; S(k+1) = s' \mid S(k) = s,\; A(k) = a \bigr),
\tag{4.31}
\]
where $P_a(s, s', r)$ is the transition probability from state $s$ to
state $s'$ with a reward $r$, caused by the action $a$ at time $k$.
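As a rough illustration of equation 4.31, the distribution can be stored as a table that maps each state-action pair to its possible outcomes. The following is only a sketch: the two states, two actions, rewards and probabilities are invented for the example, and the helper names `check_normalized` and `sample_transition` are hypothetical.

```python
import random

# Hypothetical two-state, two-action MDP. Keys are (state, action) pairs;
# each value lists (next_state, reward, probability) triples, so one triple
# corresponds to one value of P_a(s, s', r).
P = {
    ("cold", "heat"): [("warm", 1.0, 0.8), ("cold", 0.0, 0.2)],
    ("cold", "idle"): [("cold", 0.0, 1.0)],
    ("warm", "heat"): [("warm", 1.0, 0.9), ("cold", 0.0, 0.1)],
    ("warm", "idle"): [("warm", 1.0, 0.6), ("cold", 0.0, 0.4)],
}


def check_normalized(table, tol=1e-9):
    """For every (s, a), the probabilities over (s', r) must sum to 1."""
    return all(
        abs(sum(p for _, _, p in outcomes) - 1.0) <= tol
        for outcomes in table.values()
    )


def sample_transition(table, s, a, rng):
    """Draw (s', r) for the pair (s, a) according to the stored probabilities."""
    outcomes = table[(s, a)]
    weights = [p for _, _, p in outcomes]
    s_next, r, _ = rng.choices(outcomes, weights=weights, k=1)[0]
    return s_next, r


rng = random.Random(0)
print(check_normalized(P))
print(sample_transition(P, "cold", "heat", rng))
```

A table like this is only practical for small, discrete state and action sets. It is used here because it mirrors the definition directly.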
From a conventional control engineering point of view, the action $A(k)$
is equivalent to the control input $U(k)$ decided by the controller, and
the state $S(k)$ is the basis for choosing control actions, which
functions similarly to the output variable $Y(k)$. The reward $R(k)$ is
the basis for evaluating the control actions, i.e. for telling whether a
control action is good or not, and it plays the same role as the cost
defined by a cost function (such as equation 4.30). To keep the notation
and descriptions consistent, in the following the symbols $U(k)$ and $u$
replace $A(k)$ and $a$ as the selected control action at time $k$ and
the random control action (vector), respectively.
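The correspondence between RL and control quantities can be sketched as a simple closed loop. Everything specific in this example is an assumption made for illustration: the first-order plant, the fixed proportional policy, the numerical values, and the convention of taking the reward as the negated quadratic cost. The policy is fixed and nothing is learned here. The sketch only shows where $S(k)$, $U(k)$ and $R(k+1)$ appear in the loop.

```python
def plant_step(y, u):
    """Hypothetical first-order plant: y(k+1) = 0.9*y(k) + 0.1*u(k)."""
    return 0.9 * y + 0.1 * u


def reward(y, target):
    """Reward as negated quadratic cost (the sign convention is an assumption)."""
    return -((y - target) ** 2)


def proportional_policy(s, target):
    """Hypothetical fixed policy mapping the state S(k) to the input U(k)."""
    return target + 5.0 * (target - s)


def run_episode(policy, y0, target, steps):
    """Run the loop and log (k, S(k), U(k), R(k+1)) for each step."""
    y = y0
    log = []
    for k in range(steps):
        s = y                    # state S(k): here simply the output Y(k)
        u = policy(s, target)    # control input U(k), the RL action
        y = plant_step(y, u)     # plant moves to its next output
        r = reward(y, target)    # reward R(k+1) evaluates the action
        log.append((k, s, u, r))
    return log


log = run_episode(proportional_policy, y0=20.0, target=60.0, steps=30)
for k, s, u, r in log[:3]:
    print(k, s, u, r)
```

With these numbers the tracking error shrinks by a factor of 0.4 per step, so the logged rewards rise towards zero as the output approaches the target.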
- Title
- Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
- Author
- Yiming Sun
- Publisher
- KIT Scientific Publishing
- Place
- Karlsruhe
- Date
- 2016
- Language
- English
- License
- CC BY-SA 3.0
- ISBN
- 978-3-7315-0467-2
- Dimensions
- 14.8 x 21.0 cm
- Pages
- 260
- Keywords
- microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
- Category
- Technology