Page 134 in Document Image Processing

J. Imaging 2018, 4, 15

3.2. Handcrafted Features

Features are computed from text-line images in several steps. First, the image brightness is normalized, and a median filter of 3 × 3 pixels is applied to the entire image. Next, slant correction is performed by using the maximum variance method with a threshold of 92% [23]. Then, size normalization is performed, and the final image is scaled to a height of 40 pixels. Finally, a sequence of 60-dimensional feature vectors is extracted by a sliding window, using the method described in [24].

3.3. Lexicon and Language Models

The lexicon and language models at the sub-word level were obtained by hyphenating the vocabulary words according to the rules for modern Spanish, using the testhyphens package [25] for LaTeX. Lexicon models were in HTK lexicon format, where vocabulary words and sub-word units were modeled as a concatenation of symbols, whereas characters were modeled as just the corresponding symbol.

Language models (LMs) were estimated as n-grams with Kneser–Ney back-off smoothing [26] by using the SRILM toolkit [27]. Different LMs were used in the experiments at the word, sub-word and character levels. For the word-based system in the open-vocabulary case, the LM is trained directly from the text-line transcriptions of the training set. In the closed-vocabulary case, the LM is trained with the same transcriptions plus the OOV words, which are included as unigrams. For the character-based system, the closed-vocabulary case means that the character sequences representing the OOV words are used for building the character n-gram LM. For both systems, word- or character-based, “with validation” means that both the training and the validation transcriptions are used for building the LM.

3.4. Optical Models

In this paper, three different approaches to optical modeling for HTR are used: traditional hidden Markov models and two deep network classifiers. The first of these classifiers is based on recurrent neural networks with bi-directional long short-term memory, and the other is based on convolutional recurrent neural networks.

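Section 3.3 states that the n-gram LMs were estimated with Kneser–Ney back-off smoothing in SRILM. The sketch below is not SRILM's implementation: it is a minimal, self-contained illustration that swaps in the simpler interpolated Kneser–Ney form for a bigram model. The function name train_kn_bigram, the fixed discount of 0.75 and the <s>/</s> sentence padding are assumptions made for this example. Because it only sees token sequences, the same code could be trained on words, sub-word units or characters.

```python
from collections import Counter
from typing import Callable, Iterable, List


def train_kn_bigram(
    sentences: Iterable[List[str]], discount: float = 0.75
) -> Callable[[str, str], float]:
    """Train an interpolated Kneser-Ney bigram LM (illustrative sketch).

    Returns prob(prev, token). The discount is a hypothetical fixed value,
    not one estimated from count-of-counts as a toolkit would do.
    """
    if not 0.0 < discount <= 1.0:
        raise ValueError("discount must be in (0, 1]")
    bigrams = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]  # sentence boundary markers
        bigrams.update(zip(padded, padded[1:]))
    if not bigrams:
        raise ValueError("empty training corpus")

    context_count = Counter()  # c(v): total count of bigrams starting with v
    followers = Counter()      # N1+(v .): distinct tokens seen after v
    predecessors = Counter()   # N1+(. w): distinct tokens seen before w
    for (prev, token), count in bigrams.items():
        context_count[prev] += count
        followers[prev] += 1
        predecessors[token] += 1
    n_types = len(bigrams)     # N1+(. .): number of distinct bigram types

    def prob(prev: str, token: str) -> float:
        # Continuation probability: in how many distinct contexts does token occur?
        p_cont = predecessors[token] / n_types
        c_prev = context_count[prev]
        if c_prev == 0:
            return p_cont  # unseen history: lower-order distribution only
        discounted = max(bigrams[(prev, token)] - discount, 0.0) / c_prev
        # The mass removed by discounting is redistributed through p_cont.
        interpolation_weight = discount * followers[prev] / c_prev
        return discounted + interpolation_weight * p_cont

    return prob
```

For instance, train_kn_bigram([list("la casa"), list("el caso")]) builds a small character-level model; in the character-based closed-vocabulary case, the character sequences of the OOV words would simply be added to the training sequences. For any history, seen or unseen, the returned probabilities sum to 1 over the tokens that occur as continuations in the training data, which is a quick sanity check on the discounting.
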
3.4.1. Hidden Markov Models

The hidden Markov models (HMMs) for optical modeling were trained with HTK [28]. The trained models are left-to-right character models with four states. The observation probabilities in each state are described by a mixture of 64 Gaussians. There are 106 character models, and words and sub-words are modeled by concatenating the character HMMs. The HMM system takes sequences of handcrafted features as input. The HMM HTR systems were implemented by using the iATROS recognizer [29].

3.4.2. Deep Models Based on BLSTMs

In this approach, we use an RNN to estimate the posterior probabilities of the characters at the frame level (i.e., for each feature vector). Therefore, the size of the input layer corresponds to the size of the handcrafted feature vectors, and the size of the output layer to the number of different characters. The frame-level labeling required to train this neural network was generated by forced-alignment decoding with a previously trained HMM recognition system [30]. This forced-alignment decoding and the model training were repeated several times until the assignment of the frame labels to the optical model converged.

Then, as presented in Figure 5, our RNN is formed by 60 neurones at the input layer, 500 BLSTM neurones at the hidden layer with a hyperbolic tangent activation function, and 106 neurones at the output layer with a softmax function. The training was performed by using RNNLIB [31], and the main parameters (such as the size of the hidden layer) were tuned by using the validation
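
The frame-level labels for the BLSTM come from forced alignment with the HMM system. As a minimal sketch of that step, assuming per-frame character log-scores are already available, the code below concatenates four-state left-to-right character models into one state chain for the transcript and runs Viterbi to assign a character label to every frame. The function forced_align and its frame_scores input are hypothetical (they are not the HTK or iATROS interfaces), and the 64-Gaussian emission densities and transition probabilities of Section 3.4.1 are replaced here by shared per-character scores and uniform transitions.

```python
import math
from typing import Dict, List, Sequence

STATES_PER_CHAR = 4  # four-state left-to-right character models (Section 3.4.1)


def forced_align(transcript: str, frame_scores: Sequence[Dict[str, float]]) -> List[str]:
    """Viterbi forced alignment: return one character label per frame.

    frame_scores[t][c] is a hypothetical log-score of character c at frame t;
    in this simplified sketch all states of a character share that score and
    transition probabilities are treated as uniform.
    """
    # Word model = concatenation of the character models, as one state chain.
    states = [ch for ch in transcript for _ in range(STATES_PER_CHAR)]
    n_states, n_frames = len(states), len(frame_scores)
    if n_states == 0 or n_frames < n_states:
        raise ValueError("need a non-empty transcript and at least one frame per state")

    score = [-math.inf] * n_states          # best log-score ending in each state
    score[0] = frame_scores[0][states[0]]   # paths must start in the first state
    backpointers: List[List[int]] = [[]]    # frame 0 has no predecessor
    for t in range(1, n_frames):
        new_score = [-math.inf] * n_states
        pointers = [0] * n_states
        for s in range(n_states):
            # Left-to-right topology: either stay in s or advance from s - 1.
            stay = score[s]
            advance = score[s - 1] if s > 0 else -math.inf
            prev_state, best = (s - 1, advance) if advance > stay else (s, stay)
            pointers[s] = prev_state
            if best > -math.inf:
                new_score[s] = best + frame_scores[t][states[s]]
        score = new_score
        backpointers.append(pointers)

    # Paths must end in the last state; trace back to frame 0.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(backpointers[t][path[-1]])
    path.reverse()
    return [states[s] for s in path]
```

Repeating this alignment and the model training until the frame labels stop changing corresponds to the iteration described in Section 3.4.2.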

Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 × 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science