Page - 160 - in Short-Term Load Forecasting by Artificial Intelligent Technologies

Energies 2018, 11, 2038

the prediction for a new observation is given by the mean of the response values of the training data belonging to the same region as the new observation. The criterion to construct the regions or “boxes” is to minimize the residual sum of squares (RSS), but without considering every possible partition of the feature space into $J$ boxes, because that would be computationally infeasible. Instead, recursive binary splitting is used: at each step, the algorithm chooses the predictor and cutpoint such that the resulting tree has the lowest RSS. The process is repeated until a stopping criterion is reached, see [28].

Let $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ be the training dataset, where each $y_i$ denotes the $i$-th output (response variable) and $x_i = (x_{i1}, x_{i2}, \ldots, x_{is})$ the corresponding input of the “s” predictors (features) under study. The objective in a regression tree is to find boxes $B_1, B_2, \ldots, B_J$ that minimize the RSS, given by (1):

$$\sum_{j=1}^{J} \sum_{i \in B_j} \left( y_i - \hat{y}_{B_j} \right)^2 \tag{1}$$

where $\hat{y}_{B_j}$ is the mean response for the training observations within the $j$-th box.

A regression tree can be considered as a base learner in the field of machine learning. The main advantage of regression trees over linear regression models is that, in the case of highly non-linear and complex relationships between the features and the response, decision trees may outperform classical approaches. Although regression trees can be very non-robust and generally provide less predictive accuracy than some of the other regression methods, these drawbacks can be mitigated by aggregating many decision trees, using methods such as bagging, random forests, conditional forests, and boosting. These four methods have in common that they can be considered ensemble learning methods.

An ensemble method is a machine learning concept in which the idea is to build a prediction model by combining a collection of “N” simpler base learners. These methods are designed to reduce bias and variance with respect to a single base learner. Some examples of ensemble methods are bagging, random forest, conditional forest, and boosting.

2.1. Bagging

In the case of bagging (bootstrap aggregating), the collection of “N” base learners to ensemble is produced by bootstrap sampling on the training data. From the original training dataset, N new training datasets are obtained by random sampling with replacement, where each observation has the same probability of appearing in the new dataset. The prediction for a new observation with bagging is computed by averaging the responses of the N learners for the new input (or by majority vote in the case of classification problems). In particular, when we apply bagging to regression trees, each individual tree has high variance but low bias. Averaging the resulting predictions of these N trees reduces the variance and substantially improves accuracy (see [28]).

The efficiency of the bagging method depends on a suitable selection of the number of trees N, which can be obtained by plotting the out-of-bag (OOB) error estimate with respect to N. Note that bootstrap sampling with replacement implies that each observation of the original training dataset is included in roughly two-thirds of the N bagged trees and is left out of the remaining ones. Then, the prediction for each observation of the original training dataset can be obtained by averaging the predictions of the trees that were not fit using that observation. This simple procedure, called OOB, gives a valid estimate of the test error for the bagged model without requiring a validation dataset or cross-validation.

Some other parameters that can also vary are the node size (minimum number of observations in the terminal nodes, generally five by default) and the maximum number of terminal nodes that each tree in the forest can have (generally, trees are grown to the maximum possible, subject to the limits set by the node size).

In this paper, the bagging method has been applied by means of the R package “randomForest”, see [28]. The package also includes two measures of predictor importance that help to quantify
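
The regression tree and bagging steps described on this page can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the “randomForest” R package used in the paper: the function names (`rss`, `best_split`, `grow_tree`, `bagging`, `oob_mse`), the synthetic step-shaped data, and the parameter values are all hypothetical. It grows each tree by recursive binary splitting on the RSS of Equation (1), fits N trees on bootstrap samples, and estimates the test error from the out-of-bag observations.

```python
import random

def rss(ys):
    """Residual sum of squares of one box around its mean (inner sum of Eq. (1))."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)

def best_split(X, y, min_leaf):
    """Try every predictor and cutpoint; return (rss, feature, cutpoint) or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= cut]
            right = [yi for row, yi in zip(X, y) if row[f] > cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # respect the minimum node size
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, f, cut)
    return best

def grow_tree(X, y, min_leaf=5):
    """Recursive binary splitting; a leaf is the mean response of its box."""
    split = best_split(X, y, min_leaf) if len(y) >= 2 * min_leaf else None
    if split is None:
        return sum(y) / len(y)
    _, f, cut = split
    li = [i for i, row in enumerate(X) if row[f] <= cut]
    ri = [i for i, row in enumerate(X) if row[f] > cut]
    return (f, cut,
            grow_tree([X[i] for i in li], [y[i] for i in li], min_leaf),
            grow_tree([X[i] for i in ri], [y[i] for i in ri], min_leaf))

def predict_tree(tree, row):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        f, cut, left, right = tree
        tree = left if row[f] <= cut else right
    return tree

def bagging(X, y, n_trees=50, min_leaf=5, seed=0):
    """Fit n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    trees, in_bag = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], min_leaf))
        in_bag.append(set(idx))  # remember which observations each tree saw
    return trees, in_bag

def predict_bagged(trees, row):
    """Bagged prediction: average of the individual tree predictions."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)

def oob_mse(trees, in_bag, X, y):
    """OOB error: each observation is predicted only by trees that did not see it."""
    errors = []
    for i, (row, yi) in enumerate(zip(X, y)):
        preds = [predict_tree(t, row) for t, bag in zip(trees, in_bag) if i not in bag]
        if preds:
            errors.append((sum(preds) / len(preds) - yi) ** 2)
    return sum(errors) / len(errors)

# Hypothetical data: a noisy step function of a single predictor.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 10)] for _ in range(200)]
y = [(2.0 if row[0] < 5 else 8.0) + data_rng.gauss(0, 0.5) for row in X]

trees, in_bag = bagging(X, y, n_trees=25)
print("OOB mean squared error:", oob_mse(trees, in_bag, X, y))
print("Mean in-bag fraction:", sum(len(b) for b in in_bag) / (len(in_bag) * len(y)))
```

Repeating the last step for several values of `n_trees` and plotting the resulting OOB error would mirror the selection of N described above. The printed in-bag fraction should come out near the "roughly two-thirds" figure quoted in the text.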

Title: Short-Term Load Forecasting by Artificial Intelligent Technologies
Authors: Wei-Chiang Hong; Ming-Wei Li; Guo-Feng Fan
Editor: MDPI
Location: Basel
Date: 2019
Language: English
License: CC BY 4.0
ISBN: 978-3-03897-583-0
Size: 17.0 x 24.4 cm
Pages: 448
Keywords: Scheduling Problems in Logistics, Transport, Timetabling, Sports, Healthcare, Engineering, Energy Management
Category: Computer Science