Page 40 of Intelligent Environments 2019 - Workshop Proceedings of the 15th International Conference on Intelligent Environments
4.2. Unstructured Data Preprocessing
4.2.1. Sentence Segmentation
To determine whether a sentence contains words from the emotional dictionary,
we first need to split the sentence accurately into words, that is, to perform
automatic word segmentation. After comparing the existing word segmentation tools
for accuracy and ease of use on the Python platform, we chose the Jieba Chinese
word segmentation library [7] as our word segmentation tool. Examples of
segmentation results are shown in Table 4.
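The segmentation and dictionary lookup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes Jieba's default (precise) mode via jieba.lcut, and the helper names, the toy lexicon, and the hand-written token list are all hypothetical.

```python
def segment(sentence):
    """Split a Chinese sentence into words with Jieba's default (precise) mode."""
    # Third-party package; imported lazily so lexicon_hits() works without it.
    import jieba
    return jieba.lcut(sentence)


def lexicon_hits(tokens, lexicon):
    """Return the tokens that appear in the emotional dictionary, in order."""
    return [tok for tok in tokens if tok in lexicon]


# Hypothetical three-word lexicon, for illustration only.
toy_lexicon = {"上涨", "下跌", "最高"}

# Tokens as they might come out of segment(); written by hand here,
# not actual Jieba output.
tokens = ["美元", "兑", "加元", "上涨", "至", "八个月", "最高"]

hits = lexicon_hits(tokens, toy_lexicon)
```

In practice the token list would come from segment(text) for each news sentence, and the lexicon would be the full emotional dictionary.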
4.2.2. Word Vectorization
After sentence segmentation, Word2Vec [8] is used to produce high-dimensional vectors
(word embeddings) that represent the words, converting each sample into a sequence
of word vectors. In the experiment, we perform word vectorization by calling
gensim.models.word2vec, which takes the news text as input and produces the word
vectors as output. A sentence vector is the average of the vectors of all the words
the sentence contains. In this procedure, the whole text corpus is mapped into a
300-dimensional vector space in which similar words lie closer together than
dissimilar ones.
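The averaging step can be illustrated with a short sketch. It assumes only that word vectors are available through a mapping that supports `word in vectors` and `vectors[word]` (a plain dict here; the keyed vectors of a trained gensim model behave similarly). The function name, the toy 2-dimensional vectors, and the choice to skip unknown words and return a zero vector for an empty sentence are assumptions for illustration, not details given in the paper.

```python
import numpy as np


def sentence_vector(tokens, vectors, dim=300):
    """Average the vectors of the known words in a segmented sentence."""
    # Words without a vector are skipped (an assumption, not from the paper).
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        # No known word: fall back to a zero vector of the embedding size.
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Toy 2-dimensional "embeddings", made up for the example.
toy_vectors = {
    "dollar": np.array([1.0, 3.0]),
    "rose": np.array([3.0, 5.0]),
}

vec = sentence_vector(["dollar", "rose", "unseen"], toy_vectors, dim=2)
```

With the 300-dimensional space described above, dim would stay at its default and each sentence would be reduced to a single 300-element vector.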
Table 4. Sentence Segmentation Examples.
Date       Segmentation Example
2018/3/15  US dollar against the Canadian dollar rose above 1.3044 the highest in the last eight months
2018/3/16  Offshore Renminbi (CNH) was quoted at 6.3293 yuan against a US dollar at 04:59 Beijing time
4.3. Structured Data Preprocessing
4.3.1. Imputation
The structured dataset used in this paper contains some missing values. A common
practice is to delete every row or column that contains a missing value, but this
easily discards important features. Imputation therefore plays a significant role
in the preprocessing stage. This paper fills in the missing values with the
pandas.DataFrame.fillna function, using the pad mode (filling a missing value
with the previous non-missing value) and the bfill mode (filling a missing value
with the next non-missing value).
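A minimal sketch of this kind of imputation is shown below. It uses DataFrame.ffill and DataFrame.bfill, which perform the same forward (pad) and backward fills as the corresponding fillna modes and avoid the `method` argument that newer pandas releases deprecate. The column names, the values, and the order of the two fills (forward first, then backward for leading gaps) are assumptions; the paper does not state how the two modes were combined.

```python
import pandas as pd

# Toy frame with gaps; column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "rate": [None, 2.5, None, 2.7],
        "volume": [100.0, None, None, 130.0],
    }
)

# Forward fill ("pad"): copy the previous non-missing value downward.
# Backward fill afterwards covers gaps at the start of a column,
# which have no previous value to copy.
filled = df.ffill().bfill()
```

Applying the forward fill first means a value is only taken from the future when nothing earlier is available, which may matter for time-ordered financial data.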
4.3.2. Data Normalization
Normalization is the standardized processing of all structured data to eliminate
the effect of differing dimensions (units and scales) between the various indicators.
Its purpose is to bring the original data of the indicators to the same order of
magnitude so that they can be compared and evaluated together. In this paper, the
Min-Max normalization method is used to linearly transform the original data so
that all values are mapped into the interval [0, 1].
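Min-Max normalization rescales each indicator x to (x - min) / (max - min). The sketch below applies this column by column with NumPy. The function name and the sample values are hypothetical, and mapping a constant column to 0 (rather than dividing by zero) is an assumption, not something the paper specifies.

```python
import numpy as np


def min_max_normalize(values):
    """Linearly rescale each column of `values` into the interval [0, 1]."""
    x = np.asarray(values, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    # A constant column has span 0; divide by 1 instead so it maps to 0
    # (an assumption made here to avoid division by zero).
    safe_span = np.where(span == 0, 1.0, span)
    return (x - lo) / safe_span


# Two made-up indicators on very different scales.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = min_max_normalize(raw)
```

After scaling, both columns range from 0 to 1, so neither dominates a downstream model simply because of its units.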
Y. Du et al. / Predicting the Interbank Capital Adequacy Level Based on Financial Data Analysis
- Title
- Intelligent Environments 2019
- Subtitle
- Workshop Proceedings of the 15th International Conference on Intelligent Environments
- Authors
- Andrés Muñoz
- Sofia Ouhbi
- Wolfgang Minker
- Loubna Echabbi
- Miguel Navarro-Cía
- Publisher
- IOS Press BV
- Date
- 2019
- Language
- German
- License
- CC BY-NC 4.0
- ISBN
- 978-1-61499-983-6
- Dimensions
- 16.0 x 24.0 cm
- Pages
- 416
- Category
- Conference proceedings