Intelligent Environments 2019 - Workshop Proceedings of the 15th International Conference on Intelligent Environments

4.2. Unstructured Data Preprocessing

4.2.1. Sentence Segmentation

To judge whether a sentence contains words from the emotional dictionary, we need to cut the sentence accurately into words, i.e., perform automatic word segmentation. After comparing the existing word segmentation tools for accuracy and ease of use on the Python platform, we chose the Jieba Chinese word segmentation [7] as our word segmentation tool. Examples of word segmentation results are shown in Table 4.

4.2.2. Word Vectorization

After sentence segmentation, Word2Vec [8] is used to produce high-dimensional vectors (word embeddings) that represent the words and to convert the samples into word sequence vectors. In the experiment, we perform word vectorization by calling gensim.models.word2vec, which takes the news text as input and produces the word vectors as output. A sentence vector is the average of the vectors of all the words it contains. In this procedure, the whole text corpus is mapped into a 300-dimensional vector space in which similar words lie closer together than dissimilar ones.

Table 4. Sentence Segmentation Examples.

Date         Segmentation Example
2018/3/15    US dollar against the Canadian dollar rose above 1.3044 the highest in the last eight months
2018/3/16    Offshore Renminbi (CNH) was quoted at 6.3293 yuan against a US dollar at 04:59 Beijing time

4.3. Structured Data Preprocessing

4.3.1. Imputation

The structured dataset used in this paper contains some missing values. A common practice is to delete every row or column that contains a missing value, but this easily discards important features. Imputation therefore plays a significant role in the preprocessing stage. This paper fills in the missing values with the pandas.DataFrame.fillna function, using the pad mode (filling a missing value with the previous non-missing value) and the bfill mode (filling a missing value with the next non-missing value).

4.3.2. Data Normalization

Normalization is the standardized processing of all structured data to eliminate the effect of differing dimensions (units and scales) among the indicators. Its purpose is to bring the original indicator data to the same order of magnitude so that the indicators can be compared and evaluated together. In this paper, the Min-Max normalization method is used to linearly transform the original data so that all values are mapped into the interval [0, 1]. This

Y. Du et al. / Predicting the Interbank Capital Adequacy Level Based on Financial Data Analysis
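
The imputation and normalization steps described in Sections 4.3.1 and 4.3.2 can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' code: the paper calls pandas.DataFrame.fillna with the pad and bfill modes, whereas the helper names (pad_then_bfill, min_max_normalize), the sample values, and the choice to apply pad first and bfill second are assumptions made here for illustration. Min-Max normalization is taken in its usual form, x' = (x - min) / (max - min).

```python
from typing import List, Optional


def pad_then_bfill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps with the previous observed value (pad), then fill any
    remaining leading gaps with the next observed value (bfill).

    Assumption: pad is applied before bfill; the paper does not state the order.
    In pandas, the comparable calls are DataFrame.ffill() and DataFrame.bfill().
    """
    filled = list(values)

    # pad: carry the last observed value forward over missing entries
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v

    # bfill: carry the next observed value backward over what is still missing
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def min_max_normalize(values: List[float]) -> List[float]:
    """Linearly map values into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero and map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical indicator column with missing entries (None); not data from the paper.
# The sketch assumes at least one value in the column is observed.
indicator = [None, 2.0, None, 4.0, 6.0, None]

filled = pad_then_bfill(indicator)      # [2.0, 2.0, 2.0, 4.0, 6.0, 6.0]
normalized = min_max_normalize(filled)  # [0.0, 0.0, 0.0, 0.5, 1.0, 1.0]
print(filled)
print(normalized)
```

Applying pad before bfill means only gaps at the very start of a column take a value from the future. For time-ordered financial indicators that ordering limits look-ahead, but it remains a design choice rather than something the paper specifies.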
Title: Intelligent Environments 2019
Subtitle: Workshop Proceedings of the 15th International Conference on Intelligent Environments
Authors: Andrés Muñoz, Sofia Ouhbi, Wolfgang Minker, Loubna Echabbi, Miguel Navarro-Cía
Publisher: IOS Press BV
Date: 2019
Language: German
License: CC BY-NC 4.0
ISBN: 978-1-61499-983-6
Size: 16.0 x 24.0 cm
Pages: 416
Category: Conference proceedings