Page 42 in Intelligent Environments 2019: Workshop Proceedings of the 15th International Conference on Intelligent Environments
Table 6. Keyword examples.
Name | Words
Keywords | "Capital", "Liquidity", "Cash Flow", "Funds", "Cash", "Monetary Policy"
Positive words | "Sufficient", "Putting Currency", "Quantitative Easing", "Easy Monetary Policy"
Negative words | "Tight Monetary Policy", "Tight", "Tight Fiscal Policy", "Tighten", "Tight money"
Non words | "Not Tight", "Not Loose", "Temporary", "Unintentional", "No", "Neutral"
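Table 6 lists the lexicon but this page does not say how it is applied. Purely as a hypothetical illustration, the sketch below scores a sentence with it under an assumed rule that is not the authors' method: a sentence counts only if it mentions a keyword, any non word neutralizes it, and otherwise the score is positive hits minus negative hits. The names `count_hits` and `score_sentence` are invented for this example.

```python
import re

# Lexicon transcribed from Table 6 (lower-cased, spaces restored).
KEYWORDS = ["capital", "liquidity", "cash flow", "funds", "cash", "monetary policy"]
POSITIVE = ["sufficient", "putting currency", "quantitative easing", "easy monetary policy"]
NEGATIVE = ["tight monetary policy", "tight", "tight fiscal policy", "tighten", "tight money"]
NON_WORDS = ["not tight", "not loose", "temporary", "unintentional", "no", "neutral"]


def count_hits(phrases, text):
    """Total whole-phrase matches of all phrases in text.

    Overlapping phrases are counted independently, so "tight monetary policy"
    also counts as one hit for "tight" (a deliberate simplification).
    """
    return sum(len(re.findall(r"\b" + re.escape(p) + r"\b", text)) for p in phrases)


def score_sentence(sentence):
    """Hypothetical rule, not the paper's: 0 if the sentence has no keyword or
    contains any non word; otherwise #positive hits minus #negative hits."""
    text = sentence.lower()
    if count_hits(KEYWORDS, text) == 0 or count_hits(NON_WORDS, text) > 0:
        return 0
    return count_hits(POSITIVE, text) - count_hits(NEGATIVE, text)


print(score_sentence("Liquidity remains sufficient after quantitative easing."))  # 2
print(score_sentence("Cash is tight under a tight monetary policy."))             # -3
```

A real system would also need tokenization suited to the source language of the news corpus and a policy for overlapping phrases; both are left open here.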
4.4.2. Structured Data Feature Extraction
Feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. In this paper, features are selected with an embedded method. First, several machine learning models are trained to obtain a weight coefficient for each feature, and the features are selected in descending order of coefficient. The features are then selected using a base model with a penalty term, implemented by combining the SelectFromModel class of the sklearn.feature_selection library with a logistic regression model and an L1 penalty term.
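The embedded selection step can be sketched with scikit-learn roughly as follows. This is a minimal sketch, assuming a synthetic binary-labelled matrix from `make_classification` in place of the paper's structured data; `C=0.1` and the explicit `threshold=1e-5` are illustrative choices, not values reported by the authors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the structured feature matrix (assumption: 300
# samples, 40 features of which 5 are informative, binary target).
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)

# Base model: logistic regression with an L1 penalty ("liblinear" supports L1).
# C=0.1 (inverse regularization strength) is an illustrative value.
base = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)

# Keep features whose |coefficient| >= 1e-5, i.e. those L1 did not zero out.
selector = SelectFromModel(base, threshold=1e-5).fit(X, y)
X_selected = selector.transform(X)

# Rank the kept features by |coefficient|, largest first.
coef = np.abs(selector.estimator_.coef_).ravel()
kept = selector.get_support(indices=True)
ranking = kept[np.argsort(coef[kept])[::-1]]

print(X.shape, "->", X_selected.shape)
print("kept features, largest |coef| first:", ranking)
```

Raising `C` weakens the penalty and keeps more features; lowering it zeroes out more coefficients and keeps fewer.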
4.5. Data Dimension Reduction
After feature extraction, the model can be trained directly. However, the dimension of the feature matrix may need to be reduced, because an overly large feature matrix makes computation expensive and training slow.
We compared three dimensionality reduction methods, namely LDA (Linear Discriminant Analysis), PCA (Principal Component Analysis) and the L1 penalty term. LDA is a supervised learning method that takes the classification label information into account and seeks the projection direction with the best classification performance. In this paper, however, the sample size is small and the feature dimension is large, so the optimal projection direction cannot be obtained (the within-class scatter matrix that classical LDA must invert is singular when the feature dimension exceeds the number of samples). PCA is an unsupervised learning method that linearly maps the data to a lower-dimensional space; because it does not use any class label information during the mapping, it can make classification more difficult. The dimensionality reduction method adopted in this paper is the model based on the L1 penalty term described above. The principle of L1-penalty reduction is to retain only one of several features that have the same relevance to the target value, so that the dimensionality is reduced.
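To make the three options concrete, the following sketch applies each of them to a synthetic small-sample, high-dimensional data set (60 samples, 200 features; an assumption, not the paper's data). It shows that PCA ignores the labels, that LDA produces at most n_classes - 1 components (one for a binary target), and that the L1 model keeps only features with non-zero coefficients. `n_components=10` and `C=0.5` are arbitrary illustrative values.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Small-sample, high-dimensional synthetic data (assumption, mirroring the
# "small sample size, large feature dimension" setting described above).
X, y = make_classification(n_samples=60, n_features=200, n_informative=8,
                           n_redundant=0, random_state=1)

# PCA: unsupervised, y is ignored; projects onto 10 principal components.
X_pca = PCA(n_components=10).fit_transform(X)

# LDA: supervised; at most n_classes - 1 components (1 for a binary target).
X_lda = LinearDiscriminantAnalysis().fit_transform(X, y)

# L1: keep only features whose L1-penalized coefficient is non-zero.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
X_l1 = SelectFromModel(l1, threshold=1e-5).fit_transform(X, y)

for name, Z in [("PCA", X_pca), ("LDA", X_lda), ("L1", X_l1)]:
    print(f"{name}: {X.shape[1]} -> {Z.shape[1]} features")
```

scikit-learn's default LDA solver ("svd") still runs when features outnumber samples, so the sketch illustrates the output dimensions of each method rather than reproducing the paper's comparison. Unlike PCA and LDA, the L1 route keeps a subset of the original columns, which stay interpretable.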
5. Model Selection and Algorithm Analysis
In this paper, five methods are used to train on and predict from the news text (unstructured data) and the structured data corpus: SVM (Support Vector Machine), GBDT (Gradient Boosting Decision Tree), XGBoost (eXtreme Gradient Boosting), LSTM (Long Short-Term Memory) and the Perceptron. The whole data set covers 973 days, which were divided into a training set of 773 days and a test set of 200 days. The backtracking time window is the time period whose data are used for training and testing, while the prediction time window is the period to be predicted after the timestamp. The larger the backtracking time window used for training, the wider the time period from which data are selected. The
Y. Du et al. / Predicting the Interbank Capital Adequacy Level Based on Financial Data Analysis 42
- Title: Intelligent Environments 2019
- Subtitle: Workshop Proceedings of the 15th International Conference on Intelligent Environments
- Authors: Andrés Muñoz, Sofia Ouhbi, Wolfgang Minker, Loubna Echabbi, Miguel Navarro-Cía
- Publisher: IOS Press BV
- Date: 2019
- Language: German
- License: CC BY-NC 4.0
- ISBN: 978-1-61499-983-6
- Dimensions: 16.0 x 24.0 cm
- Pages: 416
- Category: Conference proceedings