Page 42 in Intelligent Environments 2019 - Workshop Proceedings of the 15th International Conference on Intelligent Environments

Table 6. Keywords Examples.

Name        Words
keywords    ‘Capital’, ‘Liquidity’, ‘Cash Flow’, ‘Funds’, ‘Cash’, ‘Monetary Policy’
pos words   ‘Sufficient’, ‘Putting Currency’, ‘Quantitative Easing’, ‘Easy Monetary Policy’
neg words   ‘Tight Monetary Policy’, ‘Tight’, ‘Tight Fiscal Policy’, ‘Tighten’, ‘Tight Money’
nonwords    ‘Not Tight’, ‘Not Loose’, ‘Temporary’, ‘Unintentional’, ‘No’, ‘Neutral’

4.4.2. Structured Data Feature Extraction

Feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. In this paper, features are selected by an embedded method. First, several machine learning models are trained to obtain the weight coefficient of each feature, and the features are ranked by coefficient from largest to smallest. The features are then selected using a base model with a penalty term, implemented by combining the SelectFromModel class of the sklearn.feature_selection library with a logistic regression model and an L1 penalty term.

4.5. Data Dimension Reduction

After feature extraction, the model can be trained directly, but it may be necessary to reduce the dimension of the feature matrix when it is too large, since a large matrix leads to expensive computation and long training time. We compared three dimensionality reduction methods, namely LDA (Linear Discriminant Analysis), PCA (Principal Component Analysis) and the L1 penalty term. LDA is a supervised learning method, which considers the classification label information and seeks the direction with the best classification performance. In this paper, the sample size is small and the feature dimension is large, so LDA is unable to obtain the optimal projection direction.
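The L1-based selection described in Section 4.4.2, which is also one of the three reduction methods compared here, can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic data from make_classification, the regularization strength C=0.1 and the liblinear solver are all assumptions, since the text reports none of these settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the structured feature matrix (NOT the paper's data).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=5, n_redundant=10, random_state=0
)

# Base model: logistic regression with an L1 penalty.
# C and the solver are assumed values; the text does not report them.
base_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)

# SelectFromModel fits the base model and keeps only the features whose
# coefficients are not driven (almost) to zero by the L1 penalty.
selector = SelectFromModel(base_model).fit(X, y)
X_reduced = selector.transform(X)
kept = np.flatnonzero(selector.get_support())

print(f"kept {X_reduced.shape[1]} of {X.shape[1]} features: {kept.tolist()}")
```

A smaller C strengthens the penalty and tends to discard more features, so in practice it would be tuned on the training data only.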
PCA is an unsupervised learning method, which performs a linear mapping of the data to a lower-dimensional space but does not use any class label information in the mapping, which makes classification more difficult. The dimensionality reduction method adopted in this paper is the model based on the L1 penalty term mentioned above. The principle of L1 penalty term reduction is to retain one of several features that have the same relevance to the target value, so that the dimensionality is reduced.

5. Model Selection and Algorithm Analysis

In this paper, five methods are used to train on and predict from the news text (unstructured data) and the structured data corpus: SVM (Support Vector Machine), GBDT (Gradient Boosting Decision Tree), XGBoost (eXtreme Gradient Boosting), LSTM (Long Short-Term Memory) and Perceptron. The whole data set covers 973 days, which were divided into a training set of 773 days and a test set of 200 days. The backtracking time window is the time period used for data training and testing, while the prediction time window is the prediction time period after the timestamp. The larger the backtracking time window used for training, the wider the time period for selecting data. The

Y. Du et al. / Predicting the Interbank Capital Adequacy Level Based on Financial Data Analysis 42
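The 773/200 split and the two time windows can be illustrated with a short sketch. Several points are assumptions rather than statements from the text: that the split is chronological (earliest 773 days for training), that a backtracking window covers the days immediately before a timestamp, and that the target lies a fixed number of days after it. The helper names chronological_split and make_windows, and the window sizes used, are hypothetical.

```python
from typing import List, Sequence, Tuple

N_DAYS = 973   # total days reported in the text
N_TRAIN = 773  # training days reported in the text
N_TEST = 200   # test days reported in the text


def chronological_split(days: Sequence, n_train: int) -> Tuple[list, list]:
    """Split an ordered sequence into an early training part and a late test part."""
    return list(days[:n_train]), list(days[n_train:])


def make_windows(
    series: Sequence[float], backtrack: int, horizon: int
) -> List[Tuple[list, float]]:
    """Pair each backtracking window with the value `horizon` days after it.

    Assumed reading: the window covers days [t - backtrack, t) and the
    target is the value on day t + horizon - 1.
    """
    samples = []
    for t in range(backtrack, len(series) - horizon + 1):
        history = list(series[t - backtrack:t])
        target = series[t + horizon - 1]
        samples.append((history, target))
    return samples


days = list(range(N_DAYS))  # placeholder day indices, not real data
train_days, test_days = chronological_split(days, N_TRAIN)

# Hypothetical window sizes: 5 days of history, predict 1 day ahead.
windows = make_windows(train_days, backtrack=5, horizon=1)
```

Under this reading, enlarging `backtrack` gives each sample a longer history but leaves fewer samples in a fixed-length training period.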

Title: Intelligent Environments 2019
Subtitle: Workshop Proceedings of the 15th International Conference on Intelligent Environments
Authors: Andrés Muñoz; Sofia Ouhbi; Wolfgang Minker; Loubna Echabbi; Miguel Navarro-Cía
Publisher: IOS Press BV
Date: 2019
Language: German
License: CC BY-NC 4.0
ISBN: 978-1-61499-983-6
Dimensions: 16.0 x 24.0 cm
Pages: 416
Category: Conference Proceedings