Page - 116 - in Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
4. Control System Design
or 4.42. In other words, any control policy that is greedy [ZSWM00]
[RS07] with respect to the optimal state-action value function 4.42 is
an optimal policy [Bar98], such as
\[ \pi^*(s,u) : \; u = u^* = \arg\max_{u} Q^*(s,u), \tag{4.43} \]
where $u^*$ indicates the optimal control action.
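The greedy rule in equation 4.43 can be illustrated with a minimal sketch, assuming a small discrete problem in which the optimal state-action value function is stored as a table. The table `q_star`, its values, and the helper names `greedy_action` and `greedy_policy` are hypothetical and serve only as an illustration.

```python
import numpy as np

def greedy_action(q_table, state):
    """Return the index of the action u that maximizes Q(state, u)."""
    return int(np.argmax(q_table[state]))

def greedy_policy(q_table):
    """Return the greedy action for every state (row) of the table."""
    return np.argmax(q_table, axis=1)

# Hypothetical optimal Q-table: 3 states (rows) x 2 control actions (columns).
q_star = np.array([[0.1, 0.5],
                   [0.9, 0.2],
                   [0.3, 0.3]])

policy = greedy_policy(q_star)
print(policy.tolist())  # ties (state 2) resolve to the lowest action index
```

Note that `np.argmax` returns the first maximizer, so a tie between actions is broken in favor of the lower index; any other tie-breaking rule would give an equally greedy policy.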
Model of the plant
The function of the model is to mimic the dynamics of the plant and
predict future states and rewards. It is not a compulsory element for
all RLC methods. Classical RLC methods use the pure trial-and-error
strategy, where all of their learning and approximations are based on
explicitly experienced interactions, and the model of the plant is not
needed. In modern RLC methods, planning based on the model of the
plant is often utilized to speed up the learning process [Die99]. Besides
real experiences, simulated experiences from the model of the
plant are also useful to improve and update the control policy [AR01].
Although RLC with incremental/online planning costs more computation
power than direct RLC, it was shown in [AS97] and [LV09]
that a faster learning speed and a higher expected return can be
obtained because of the involvement of explicit models.
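How real and simulated experience can be combined is sketched below with a Dyna-Q-style scheme, which is one common way to mix direct learning and planning; the source does not state that the cited works use this exact scheme. The toy chain plant, the functions `step`, `q_update` and `dyna_q`, and all parameter values are assumptions made for illustration.

```python
import random

N_STATES, ACTIONS, GOAL = 5, (0, 1), 4  # toy chain: action 0 = left, 1 = right

def step(s, u):
    """Hypothetical plant: deterministic chain, reward 1 on reaching GOAL."""
    s_next = max(0, s - 1) if u == 0 else min(GOAL, s + 1)
    return s_next, (1.0 if s_next == GOAL else 0.0)

def q_update(Q, s, u, r, s_next, alpha, gamma):
    """One Q-learning backup; GOAL is terminal, so its value is zero."""
    target = r if s_next == GOAL else r + gamma * max(Q[s_next])
    Q[s][u] += alpha * (target - Q[s][u])

def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    model = {}  # learned model of the plant: (s, u) -> (r, s_next)
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy action selection, ties broken at random.
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                u = rng.choice(ACTIONS)
            else:
                u = 0 if Q[s][0] > Q[s][1] else 1
            s_next, r = step(s, u)
            q_update(Q, s, u, r, s_next, alpha, gamma)  # real experience
            model[(s, u)] = (r, s_next)                 # update the model
            for _ in range(planning_steps):             # simulated experience
                ps, pu = rng.choice(list(model))
                pr, ps_next = model[(ps, pu)]
                q_update(Q, ps, pu, pr, ps_next, alpha, gamma)
            s = s_next
    return Q

Q = dyna_q()
greedy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(GOAL)]
print(greedy)
```

Setting `planning_steps=0` reduces the sketch to direct Q-learning from real experience only, which makes the extra computation per real step spent on planning easy to see.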
Several points are worthy of note regarding these four parts. Firstly,
RL is able to deal with different types of problems and control tasks,
as long as the value function is clearly defined and rewards are
appropriately assigned. From this point of view, the task (equation 4.30)
can be solved using RL. Secondly, in practical control applications, the
state-action value function 4.35 makes more sense than the state value
function 4.34, because, as shown in equation 4.43, the state-action
value function directly defines how good or bad a control action is,
and the optimal controller can be constructed simply by using the greedy
algorithm, which makes the controller design more straightforward
than using the state value function. For this reason, all following
RL methods are introduced in the state-action value function form.
Finally, as previously emphasized, the value function is the core of all
RLC methods. Using different approaches to calculate or approximate
- Title
- Adaptive and Intelligent Temperature Control of Microwave Heating Systems with Multiple Sources
- Author
- Yiming Sun
- Publisher
- KIT Scientific Publishing
- Location
- Karlsruhe
- Date
- 2016
- Language
- English
- License
- CC BY-SA 3.0
- ISBN
- 978-3-7315-0467-2
- Size
- 14.8 x 21.0 cm
- Pages
- 260
- Keywords
- Mikrowellenerwärmung, Mehrgrößenregelung, Modellprädiktive Regelung, Künstliches neuronales Netz, Bestärkendes Lernen, microwave heating, multiple-input multiple-output (MIMO), model predictive control (MPC), neural network, reinforcement learning
- Category
- Technik