Page 40 of Intelligent Environments 2019 - Workshop Proceedings of the 15th International Conference on Intelligent Environments


4.2. Unstructured Data Preprocessing

4.2.1. Sentence Segmentation

To judge whether a sentence contains words that appear in the emotional dictionary, we need to cut the sentence accurately into words, that is, to segment it automatically. After comparing the existing word segmentation tools for accuracy and ease of use on the Python platform, we chose the Jieba Chinese word segmentation tool [7]. Examples of word segmentation results are shown in Table 4.

4.2.2. Word Vectorization

After sentence segmentation, Word2Vec [8] is used to produce high-dimensional vectors (word embeddings) that represent the words and to convert the samples into word sequence vectors. In the experiment, we perform word vectorization by calling gensim.models.word2vec, which takes the news text as input and produces the word vectors as output. A sentence vector is the average of the vectors of all the words the sentence contains. In this procedure, the whole text corpus is mapped into a 300-dimensional vector space in which similar words lie nearer to each other than dissimilar ones.

Table 4. Sentence Segmentation Examples.

Date        Segmentation Example
2018/3/15   US dollar against the Canadian dollar rose above 1.3044 the highest in the last eight months
2018/3/16   Offshore Renminbi (CNH) was quoted at 6.3293 yuan against a US dollar at 04:59 Beijing time

4.3. Structured Data Preprocessing

4.3.1. Imputation

The structured dataset in this paper contains some missing values. A common practice is to delete all the relevant rows and columns of the data whenever a value is missing, but important features are then easily lost. Imputation therefore occupies a significant place in the preprocessing stage. This paper fills in the missing values with the pandas.DataFrame.fillna function, using the pad mode (filling a missing value with the previous non-missing value) and the bfill mode (filling a missing value with the next non-missing value).

4.3.2. Data Normalization

Normalization is the standardized processing of all structured data to eliminate the dimensional impact between the various indicators. Its purpose is to bring the original data of the indicators to the same order of magnitude for comprehensive comparative evaluation. In this paper, the Min-Max normalization method is used to linearly transform the original data so that all values are mapped into [0, 1]. This

Y. Du et al. / Predicting the Interbank Capital Adequacy Level Based on Financial Data Analysis
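The imputation and Min-Max normalization steps of Sections 4.3.1 and 4.3.2 can be illustrated with a minimal pandas sketch. It is not the authors' code. The column names, the toy values, the helper name impute_and_normalize, and the order in which pad and bfill are combined (pad first, then bfill for any leading gaps) are assumptions made for illustration. The sketch uses DataFrame.ffill and DataFrame.bfill, which have the same effect as the pad and bfill modes of fillna; recent pandas releases deprecate the method argument of fillna.

```python
import pandas as pd


def impute_and_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values (pad, then bfill) and Min-Max scale each column to [0, 1]."""
    # pad: carry the previous non-missing value forward;
    # bfill: fill any gaps left at the top with the next non-missing value.
    filled = df.ffill().bfill()

    col_min = filled.min()
    col_max = filled.max()
    # Guard against constant columns, where max - min would be zero.
    span = (col_max - col_min).replace(0, 1)

    # Min-Max normalization: x' = (x - min) / (max - min), applied per column.
    return (filled - col_min) / span


# Hypothetical indicators with missing entries (not data from the paper).
raw = pd.DataFrame(
    {
        "rate": [float("nan"), 2.0, float("nan"), 4.0],
        "volume": [10.0, float("nan"), 30.0, 50.0],
    }
)
scaled = impute_and_normalize(raw)
print(scaled)
```

Scaling each column independently is what removes the difference in units between indicators. The guard for constant columns is a design choice of this sketch and is not described in the text.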



Title: Intelligent Environments 2019
Subtitle: Workshop Proceedings of the 15th International Conference on Intelligent Environments
Authors: Andrés Muñoz; Sofia Ouhbi; Wolfgang Minker; Loubna Echabbi; Miguel Navarro-Cía
Publisher: IOS Press BV
Date: 2019
Language: German
License: CC BY-NC 4.0
ISBN: 978-1-61499-983-6
Dimensions: 16.0 x 24.0 cm
Pages: 416
Category: Conference proceedings