Page 41 of Intelligent Environments 2019 - Workshop Proceedings of the 15th International Conference on Intelligent Environments
paper implements the normalization method using the sklearn.preprocessing.MinMaxScaler
class:
$$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (1)$$
where $x_{min}$ is the minimum value of the sample data, and $x_{max}$ is the maximum value of
the sample data.
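As a minimal sketch of Eq. (1), the rescaling can be written in plain Python. The function name `min_max_scale` and the sample values are hypothetical, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero. MinMaxScaler with its default feature_range of (0, 1) applies the same mapping per feature.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] following Eq. (1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant column maps to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


sample = [2.0, 4.0, 6.0, 10.0]  # hypothetical sample data
scaled = min_max_scale(sample)
print(scaled)
```

The smallest value maps to 0 and the largest to 1, so features with very different ranges become comparable before they are fed to a model.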
4.4. Feature and Keywords Extraction
4.4.1. Unstructured Data Feature Extraction
The selection of feature entries and their weights is called the feature extraction of target
samples, and the quality of the extracted features directly affects
the performance of the model. Besides the word vectors, we extract two other kinds of
features: word frequency and keywords.
Here the TF-IDF (Term Frequency-Inverse Document Frequency) [9] algorithm is
used for word frequency analysis to evaluate the importance of a term in the news text
corpus. The importance of a word increases proportionally with the number of times it
appears in a text, but decreases with the frequency with which it appears
across the corpus. The top 30 words are selected as the input of the recognition models. Table 5
lists the top 4 words.
Table 5. TF-IDF Word Frequency Analysis.
No.  Top Words      Weight
1    Year-on-year   0.075403657
2    Increase       0.064093650
3    Trillion       0.055898826
4    Interest rate  0.050473410
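The TF-IDF ranking described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the function name `tfidf_top_k`, the toy corpus, the unsmoothed `log(N / df)` form of the inverse document frequency, and the averaging of per-document weights over the corpus are all choices made here, since the text does not specify them.

```python
import math
from collections import Counter


def tfidf_top_k(docs, k):
    """Return the k terms with the highest mean TF-IDF weight over docs.

    docs is a list of token lists. Averaging over documents is an
    assumption; the source does not say how per-document weights are pooled.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)              # frequent in this text -> larger
            idf = math.log(n_docs / df[term])  # frequent in corpus -> smaller
            totals[term] += tf * idf
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(term, weight / n_docs) for term, weight in ranked[:k]]


# Hypothetical tokenized corpus, not taken from the paper's data.
corpus = [
    ["rate", "rate", "rate", "increase"],
    ["increase", "trillion"],
    ["increase", "yuan", "trillion", "bank"],
]
top = tfidf_top_k(corpus, k=2)
print(top)
```

In this toy corpus "increase" occurs in every document, so its inverse document frequency is log(1) = 0 and it ranks last, while "rate", which is frequent in one text and absent elsewhere, ranks first.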
Sentiment analysis can classify the polarity of the news text and determine whether
the expressed opinion is positive, negative, or neutral. In this paper, we define four
keyword groups: key words, pos words, neg words, and non words. The
key words are extracted semi-automatically: first, candidates are
selected by matching the key nouns in the news text against a financial dictionary; then they
are checked and filtered by students majoring in finance. The pos words and neg words
are polar verbs and adjectives that indicate a positive or negative effect on the
capital adequacy level. These words are extracted manually, drawing on professional knowledge
of the financial field and extensive reading of Sina news texts. The non words are
privative (negation) words, such as "no" and "none". Some keyword examples are shown in Table
6. If a news text contains key words and positive words, it is considered positive
news and labeled 1. Likewise, negative and neutral news are labeled -1 and 0,
respectively. News is considered neutral if it contains none of these words.
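The labeling rule above can be sketched as a small function. The word lists below are hypothetical placeholders rather than entries from Table 6, and the function name `label_news` is introduced only for illustration. The text does not say how mixed-polarity texts or the non words are handled, so this sketch assumes mixed texts are neutral and ignores negation entirely.

```python
def label_news(tokens, key_words, pos_words, neg_words):
    """Return 1 (positive), -1 (negative), or 0 (neutral) for a tokenized text."""
    tokens = set(tokens)
    has_key = bool(tokens & key_words)
    has_pos = bool(tokens & pos_words)
    has_neg = bool(tokens & neg_words)
    if has_key and has_pos and not has_neg:
        return 1
    if has_key and has_neg and not has_pos:
        return -1
    # Assumption: texts with no matches, or with mixed polarity, are neutral.
    return 0


# Hypothetical word groups, not the paper's actual lists.
KEY_WORDS = {"liquidity", "deposit"}
POS_WORDS = {"increase", "ease"}
NEG_WORDS = {"tighten", "decline"}

print(label_news("liquidity conditions ease".split(), KEY_WORDS, POS_WORDS, NEG_WORDS))
print(label_news("deposit growth may decline".split(), KEY_WORDS, POS_WORDS, NEG_WORDS))
print(label_news("weather was mild today".split(), KEY_WORDS, POS_WORDS, NEG_WORDS))
```

Using sets keeps each membership check independent of text length after tokenization. A fuller version would need a rule for the non words, for example flipping polarity when a negation precedes a polar word, but that behaviour is not described in the text.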
Y. Du et al. / Predicting the Interbank Capital Adequacy Level Based on Financial Data Analysis 41
- Title
- Intelligent Environments 2019
- Subtitle
- Workshop Proceedings of the 15th International Conference on Intelligent Environments
- Authors
- Andrés Muñoz
- Sofia Ouhbi
- Wolfgang Minker
- Loubna Echabbi
- Miguel Navarro-Cía
- Publisher
- IOS Press BV
- Date
- 2019
- Language
- German
- License
- CC BY-NC 4.0
- ISBN
- 978-1-61499-983-6
- Dimensions
- 16.0 x 24.0 cm
- Pages
- 416
- Category
- Conference proceedings