Page 242 of Short-Term Load Forecasting by Artificial Intelligent Technologies
Energies 2018, 11, 1893
for larger sizes (tens of millions). Of course, all these considerations depend heavily on the specific
hardware and technology. We recall that our interest is in relatively standard scientific workstations.
The algorithm we use in the first step of the clustering is described below. We then show the results of
profiling our whole strategy to highlight where the bottlenecks are when one wishes to up-scale
the method. We end this section by discussing the solutions we proposed.
6.1. Algorithm Description
The massive dataset clustering algorithm is as follows:
1. Data serialization. Time series are given in a verbose by-column format. We re-code all of them in
a binary file (if suitable), or a database.
2. Dimensionality reduction. Each series of length N is replaced by the log2(N) energetic coefficients
defined using a wavelet basis. Optionally, a feature selection step can be performed to further
reduce the number of features.
3. Chunking. Data is chunked into groups of size at most nc, where nc is a user parameter (we use
nc = 5000 in the experiments of the next section).
4. Clustering. Within each group, the PAM clustering algorithm is run to obtain K0 clusters.
5. Gathering. A final run of PAM is performed to obtain K′ medoids, K′ ≪ n, out of the nc × K0
medoids obtained on the chunks.
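Step 2 of the list above can be illustrated with a minimal sketch. It assumes a Haar wavelet and a series whose length is a power of two; neither choice is stated in the text, and the function name `haar_energy_features` is hypothetical. Each detail level of the transform contributes one energy value, so a series of length N is summarized by log2(N) numbers.

```python
import math


def haar_energy_features(series):
    """Return one energy value per Haar detail level, finest level first.

    Assumption: the Haar basis is used for illustration only; the source
    speaks of "a wavelet basis" without naming one.
    """
    n = len(series)
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length >= 2")
    approx = [float(v) for v in series]
    energies = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Orthonormal Haar step: differences give details, sums give the
        # coarser approximation passed to the next level.
        detail = [(a - b) / math.sqrt(2.0) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
        energies.append(sum(d * d for d in detail))
    return energies


features = haar_energy_features([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 6.0, 8.0])
print(len(features))  # 3 features for a series of length 8
```

For a year of half-hourly data (17,520 points) a real implementation would need padding or a wavelet library that handles arbitrary lengths; the sketch leaves this aside.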
From these K′ medoids the synchrone curves are computed (i.e., the sum of all curves within each
group for each time step), and passed as output to the prediction step.
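The chunking, clustering, and gathering steps, followed by the synchrone computation, can be sketched as follows. This is an illustration under stated assumptions, not the authors' C code: `k_medoids` is a simplified alternating k-medoids with a deterministic start (it is not the full PAM swap search), the dissimilarity is squared Euclidean, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (stand-in dissimilarity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_medoids(points, k, n_iter=20):
    """Simplified alternating k-medoids; returns indices into `points`."""
    n = len(points)
    medoids = [i * n // k for i in range(k)]  # deterministic, distinct start
    for _ in range(n_iter):
        # Assign every point to its nearest current medoid.
        labels = [min(range(k), key=lambda j: sq_dist(p, points[medoids[j]]))
                  for p in points]
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(
                members,
                key=lambda c: sum(sq_dist(points[c], points[m]) for m in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def two_stage_medoids(features, chunk_size, k0, k_final):
    """Chunk, cluster each chunk into k0 medoids, then cluster the medoids."""
    candidates = []
    for start in range(0, len(features), chunk_size):
        chunk = features[start:start + chunk_size]
        local = k_medoids(chunk, min(k0, len(chunk)))
        candidates.extend(start + i for i in local)
    cand_points = [features[i] for i in candidates]
    final = k_medoids(cand_points, min(k_final, len(cand_points)))
    return [candidates[i] for i in final]


def synchrones(curves, features, medoid_ids):
    """Sum, per time step, the curves assigned to each final medoid."""
    sums = [[0.0] * len(curves[0]) for _ in medoid_ids]
    for curve, feat in zip(curves, features):
        j = min(range(len(medoid_ids)),
                key=lambda m: sq_dist(feat, features[medoid_ids[m]]))
        for t, value in enumerate(curve):
            sums[j][t] += value
    return sums


# Toy data: the curves double as features here, purely for brevity.
curves = [[1, 1], [1, 2], [2, 1], [9, 9], [9, 10], [10, 9]]
medoid_ids = two_stage_medoids(curves, chunk_size=3, k0=2, k_final=2)
print(medoid_ids)                               # [0, 3]
print(synchrones(curves, curves, medoid_ids))   # [[4.0, 4.0], [28.0, 28.0]]
```

In practice the features would be the wavelet energy coefficients of step 2 and the curves the raw load series, so the two arguments of `synchrones` would differ.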
6.2. Code Profiling
Figure 9 gives some timings obtained by profiling the runs of our initial (C) code. To give a
clearer insight, we also report the size of the objects we deal with. The starting point is the ensemble of
individual records of electricity demand for a whole year. Here, we treat over 25,000 clients sampled
half-hourly during a year. The tabulation of these data to obtain a matrix representation suitable to fit
in memory takes about 7 min and requires over 30 Gb of memory.
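To give a feel for the orders of magnitude involved, the raw payload of the dense matrix can be estimated with a few lines of arithmetic. The sketch assumes a non-leap year and exactly 25,000 clients (the text says "over 25,000"), and the storage widths are assumptions as well. It does not attempt to reproduce the memory figures reported by the profiler, which also cover intermediate structures.

```python
SAMPLES_PER_DAY = 48   # half-hourly sampling
DAYS = 365             # assumption: non-leap year
N_CLIENTS = 25_000     # lower bound taken from "over 25,000 clients"


def matrix_size_bytes(n_series, n_points, bytes_per_value):
    """Payload of a dense n_series x n_points matrix, without overhead."""
    return n_series * n_points * bytes_per_value


n_points = SAMPLES_PER_DAY * DAYS  # 17,520 values per client
for label, width in (("4-byte values", 4), ("8-byte values", 8)):
    size = matrix_size_bytes(N_CLIENTS, n_points, width)
    print(f"{label}: {size / 1e9:.2f} GB")
# prints 1.75 GB and 3.50 GB respectively
```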
Task                    Time     Memory   Disk
Raw (15 Gb) to matrix   7 min    30 Gb    2.7 Gb
Compute contributions   7 min    <1 Gb    7 Mb
1st stage clustering    3 min    <1 Gb    –
Aggregation             1 min    6 Gb     30 Mb
WER distance matrix     40 min   64 Gb    150 Kb
Forecasts               10 min   <1 Gb    –
Figure 9. Code profiling by tasks.
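The timings of Figure 9 can be tallied to see which task dominates the run. The snippet only restates the reported minutes; the dictionary keys are labels chosen for this illustration.

```python
# Reported wall-clock times in minutes, copied from Figure 9.
timings_min = {
    "Raw to matrix": 7,
    "Compute contributions": 7,
    "1st stage clustering": 3,
    "Aggregation": 1,
    "WER distance matrix": 40,
    "Forecasts": 10,
}

total_min = sum(timings_min.values())
bottleneck = max(timings_min, key=timings_min.get)
share = timings_min[bottleneck] / total_min

print(f"total: {total_min} min")               # total: 68 min
print(f"largest: {bottleneck} ({share:.0%})")  # largest: WER distance matrix (59%)
```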
6.3. Proposed Solutions
Two main solutions are discussed, concerning the internal data storage strategy and the use
of a simple parallelization scheme. The former aims to reduce the communication time of internal
operations by using serialization. The latter attacks the major bottleneck of our clustering approach,
that is, the construction of the WER dissimilarity matrix.
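Because each entry of a dissimilarity matrix depends on only two series, its rows can be computed independently, which is what makes a simple parallel scheme possible. The sketch below illustrates the idea under loud assumptions: `dissimilarity` is a Euclidean placeholder and not the WER dissimilarity, the row-wise split is one possible scheme among others, and all names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def dissimilarity(a, b):
    """Placeholder (Euclidean distance); NOT the WER dissimilarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _upper_row(args):
    """Compute row i of the upper triangle: entries (i, j) for j > i."""
    i, series = args
    return i, [dissimilarity(series[i], series[j])
               for j in range(i + 1, len(series))]


def parallel_dissimilarity_matrix(series, n_workers=4,
                                  executor_cls=ThreadPoolExecutor):
    """Fill a symmetric matrix, distributing rows over a worker pool."""
    n = len(series)
    matrix = [[0.0] * n for _ in range(n)]
    with executor_cls(max_workers=n_workers) as pool:
        tasks = [(i, series) for i in range(n)]
        for i, row in pool.map(_upper_row, tasks):
            for offset, value in enumerate(row):
                j = i + 1 + offset
                matrix[i][j] = matrix[j][i] = value  # symmetry
    return matrix


print(parallel_dissimilarity_matrix([[0, 0], [3, 4], [6, 8]]))
# [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
```

A thread pool keeps the example easy to run anywhere; for CPU-bound work in Python, `concurrent.futures.ProcessPoolExecutor` can be passed as `executor_cls` when the code is run as a script. Rows near the top of the triangle are longer than those near the bottom, so a production version would balance the load across workers.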
The initial format (verbose, by-column) is clearly inappropriate for efficient data processing.
Starting from this data format there are several options; each implies having all series stored as
• an ASCII file, one sample per line; very fast, but data retrieval will depend on line number;
• a binary format (3 or 4 octets per value); compression is not advised since it would increase both
preprocessing time and (by a large amount) reading times;
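The binary option can be illustrated with fixed-width records, where the position of any series follows directly from its index. The sketch assumes 4-octet little-endian floats and series of equal length; the function names are hypothetical and the in-memory buffer stands in for a file on disk.

```python
import io
import struct

VALUE_SIZE = struct.calcsize("<f")  # 4 octets per value (assumed encoding)


def write_series(stream, series_list):
    """Append each series as a contiguous run of 4-octet floats."""
    for series in series_list:
        stream.write(struct.pack(f"<{len(series)}f", *series))


def read_series(stream, index, length):
    """Read series number `index` by seeking straight to its byte offset."""
    stream.seek(index * length * VALUE_SIZE)
    raw = stream.read(length * VALUE_SIZE)
    return list(struct.unpack(f"<{length}f", raw))


buffer = io.BytesIO()  # stand-in for open(path, "wb+")
write_series(buffer, [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]])
print(read_series(buffer, 1, 3))  # [4.5, 5.5, 6.5]
```

Because every record has the same size, retrieving a series costs one seek and one read, which is the property the ASCII layout lacks.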
Short-Term Load Forecasting by Artificial Intelligent Technologies
- Authors: Wei-Chiang Hong, Ming-Wei Li, Guo-Feng Fan
- Publisher: MDPI
- Location: Basel
- Date: 2019
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-03897-583-0
- Dimensions: 17.0 x 24.4 cm
- Pages: 448
- Keywords: Scheduling Problems in Logistics, Transport, Timetabling, Sports, Healthcare, Engineering, Energy Management
- Category: Computer science