Page - 160 - in Short-Term Load Forecasting by Artificial Intelligent Technologies
Energies 2018, 11, 2038
the prediction for a new observation is given by the mean of the response values of the training data
belonging to the same region as the new observation.
The regions, or “boxes”, are constructed to minimize the residual sum of squares (RSS). Considering
every possible partition of the feature space into J boxes would be computationally infeasible, so
recursive binary splitting is used instead: at each step, the algorithm chooses the predictor and
cutpoint such that the resulting tree has the lowest RSS. The process is repeated until a stopping
criterion is reached; see [28].
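A single step of recursive binary splitting can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the paper: the function names `rss` and `best_split`, the toy data, and the choice of midpoints between consecutive distinct values as candidate cutpoints are all assumptions made for the example.

```python
def rss(values):
    """Residual sum of squares of the values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (predictor index, cutpoint, RSS) of the best single binary split.

    X is a list of feature rows and y the list of responses.
    Candidate cutpoints (an assumption of this sketch) are the midpoints
    between consecutive distinct values of each predictor.
    """
    best = None
    for k in range(len(X[0])):
        levels = sorted(set(row[k] for row in X))
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[k] < cut]
            right = [yi for row, yi in zip(X, y) if row[k] >= cut]
            total = rss(left) + rss(right)
            if best is None or total < best[2]:
                best = (k, cut, total)
    return best


# Hypothetical toy data: two predictors, four observations.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [10.0, 11.0, 20.0, 21.0]
feature, cutpoint, split_rss = best_split(X, y)
```

A full tree would apply `best_split` again to each of the two resulting regions until a stopping criterion, such as a minimum node size, is met.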
Let $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ be the training dataset, where each $y_i$ denotes the i-th output
(response variable) and $x_i = (x_{i1}, x_{i2}, \ldots, x_{is})$ the corresponding input of the $s$ predictors (features)
under study. The objective in a regression tree is to find boxes $B_1, B_2, \ldots, B_J$ that minimize the RSS, given
by (1):
$$\sum_{j=1}^{J} \sum_{i \in B_j} \left( y_i - \hat{y}_{B_j} \right)^2 \qquad (1)$$
where $\hat{y}_{B_j}$ is the mean response for the training observations within the j-th box.
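As a small worked illustration of Equation (1), the sketch below evaluates the RSS for a fixed assignment of observations to boxes. The function name `partition_rss`, the `box_of` encoding of the partition, and the numbers are hypothetical and chosen only for this example.

```python
def partition_rss(y, box_of):
    """RSS of Equation (1): y[i] is the response, box_of[i] the box index j."""
    boxes = {}
    for yi, j in zip(y, box_of):
        boxes.setdefault(j, []).append(yi)
    total = 0.0
    for members in boxes.values():
        box_mean = sum(members) / len(members)  # mean response in box j
        total += sum((yi - box_mean) ** 2 for yi in members)
    return total


# Hypothetical data: four responses assigned to J = 2 boxes.
y = [1.0, 3.0, 10.0, 14.0]
box_of = [0, 0, 1, 1]
total_rss = partition_rss(y, box_of)   # (1 + 1) + (4 + 4) = 10
single_box_rss = partition_rss(y, [0, 0, 0, 0])
```

With all four observations in a single box the same function returns 110, which shows how a well-chosen partition reduces the RSS.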
A regression tree can be considered a base learner in the field of machine learning. The main
advantage of regression trees over linear regression models is that, when the relationship between
the features and the response is highly non-linear and complex, decision trees may outperform
classical approaches. Although regression trees can be very non-robust and generally provide
less predictive accuracy than some other regression methods, these drawbacks can be
mitigated by aggregating many decision trees, using methods such as bagging, random forests,
conditional forest, and boosting. What these four methods have in common is that they can all be
considered ensemble learning methods.
An ensemble method is a machine learning concept in which the idea is to build a prediction
model by combining a collection of “N” simpler base learners. These methods are designed to reduce
bias and variance with respect to a single base learner. Some examples of ensemble methods are
bagging, random forest, conditional forest, and boosting.
2.1. Bagging
In the case of bagging (bootstrap aggregating), the collection of “N” base learners to ensemble
is produced by bootstrap sampling on the training data. From the original training dataset, N new
training datasets are obtained by random sampling with replacement, where each observation has the
same probability of appearing in the new dataset. The prediction for a new observation with bagging is
computed by averaging the responses of the N learners for the new input (or by majority vote in the case of
classification problems). In particular, when we apply bagging to regression trees, each individual
tree has high variance but low bias. Averaging the resulting predictions of these N trees reduces the
variance and substantially improves accuracy (see [28]).
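The bagging procedure described above can be sketched in a few lines. This is an illustrative sketch only, not the “randomForest” implementation: the base learner is assumed to be a one-split regression tree (a stump) on a single predictor, and the names `fit_stump`, `bagging_fit`, `bagging_predict`, and the toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on a single predictor."""
    overall = sum(ys) / len(ys)
    levels = sorted(set(xs))
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(levels, levels[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if err < best[0]:
            best = (err, cut, ml, mr)
    _, cut, ml, mr = best
    # If the sample holds a single distinct x, fall back to the overall mean.
    return lambda x: overall if cut is None else (ml if x < cut else mr)


def bagging_fit(xs, ys, n_learners, seed=0):
    """Fit n_learners stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        learners.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return learners


def bagging_predict(learners, x):
    """Average the predictions of all base learners (regression case)."""
    return sum(f(x) for f in learners) / len(learners)


# Hypothetical toy data with a step between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 5.1, 4.8, 5.0]
model = bagging_fit(xs, ys, n_learners=50)
low, high = bagging_predict(model, 1.5), bagging_predict(model, 5.5)
```

Because each stump sees a different bootstrap sample, the individual cutpoints vary, and the averaged prediction is smoother than that of any single stump.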
The efficiency of the bagging method depends on a suitable selection of the number of trees N,
which can be obtained by plotting the out-of-bag (OOB) error estimate with respect to N. Note that
bootstrap sampling with replacement implies that each observation of the original training
dataset is included in roughly two-thirds of the N bagged trees and left out of the remaining ones.
The prediction for each observation of the original training dataset can then be obtained by averaging
the predictions of the trees that were not fit using that observation. This simple procedure, called
OOB estimation, gives a valid estimate of the test error for the bagged model without requiring a
validation dataset or cross-validation.
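The OOB estimate can be sketched as follows. To keep the example short, a 1-nearest-neighbour predictor stands in for a regression tree as the base learner; this substitution, the function names, and the toy data are assumptions of the sketch rather than anything taken from the paper. The helper `in_bag_probability` evaluates 1 - (1 - 1/n)^n, the chance that a given observation appears in a bootstrap sample of size n, which is the origin of the "roughly two-thirds" figure.

```python
import random


def in_bag_probability(n):
    """Probability that a given observation appears in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def nearest_neighbour(xs, ys):
    """Stand-in base learner: predict the response of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def oob_mse(xs, ys, n_learners, seed=0):
    """Out-of-bag mean squared error of a bagged ensemble, plus OOB counts."""
    rng = random.Random(seed)
    n = len(xs)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        learner = nearest_neighbour([xs[i] for i in idx], [ys[i] for i in idx])
        for i in set(range(n)) - set(idx):  # observations left out of this sample
            sums[i] += learner(xs[i])
            counts[i] += 1
    # Average only over observations that were out-of-bag at least once.
    errors = [(ys[i] - sums[i] / counts[i]) ** 2 for i in range(n) if counts[i] > 0]
    return sum(errors) / len(errors), counts


# Hypothetical toy data on a straight line.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
mse, oob_counts = oob_mse(xs, ys, n_learners=200)
p = in_bag_probability(len(xs))
```

Calling `oob_mse` for increasing values of `n_learners` and plotting the result is one way to produce the OOB error curve against N mentioned in the text.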
Some other parameters that can also vary are the node size (minimum number of observations
in the terminal nodes, generally five by default) and the maximum number of terminal nodes of the
trees in the forest (generally, trees are grown to the maximum possible size, subject to the limits imposed by the node size).
In this paper, the bagging method has been applied by means of the R package “randomForest”,
see [28]. The package also includes two measures of predictor importance that help to quantify
- Title
- Short-Term Load Forecasting by Artificial Intelligent Technologies
- Authors
- Wei-Chiang Hong
- Ming-Wei Li
- Guo-Feng Fan
- Editor
- MDPI
- Location
- Basel
- Date
- 2019
- Language
- English
- License
- CC BY 4.0
- ISBN
- 978-3-03897-583-0
- Size
- 17.0 x 24.4 cm
- Pages
- 448
- Keywords
- Scheduling Problems in Logistics, Transport, Timetabling, Sports, Healthcare, Engineering, Energy Management
- Category
- Computer Science