
J. Imaging 2018, 4, 15

3.2. Handcrafted Features

Features are computed in several steps from text-line images. First, the image brightness is normalized, and a median filter of size 3 × 3 pixels is applied to the entire image. Next, slant correction is performed by using the maximum variance method with a threshold of 92% [23]. Then, size normalization is performed, and the final image is scaled to a height of 40 pixels. Finally, a sequence of 60-dimensional feature vectors is extracted by a sliding window, using the method described in [24].

3.3. Lexicon and Language Models

The lexicon and language models at the sub-word level were obtained by hyphenating the vocabulary words following the rules for modern Spanish, using the testhyphens package [25] for LaTeX. Lexicon models were in HTK lexicon format, where vocabulary words and sub-word units were modeled as a concatenation of symbols, whereas characters were modeled as just the corresponding symbol.

Language models (LMs) were estimated as n-grams with Kneser–Ney back-off smoothing [26] by using the SRILM toolkit [27]. Different LMs were used in the experiments at the word, sub-word and character levels. For the word-based system, in the open-vocabulary case, the LM is trained directly on the text-line transcriptions of the training set. In the closed-vocabulary case, the LM is trained on the same transcriptions, plus the OOV words included as unigrams. For the character-based system, the closed-vocabulary case means that the character sequences representing the OOV words are used to build the character n-gram LM. For both systems, word- or character-based, "with validation" means that both the training and the validation transcriptions are used to build the LM.

3.4. Optical Models

In this paper, three different approaches to optical modeling for HTR are used: traditional hidden Markov models and two deep network classifiers. The first deep classifier is based on recurrent neural networks with bidirectional long short-term memory (BLSTM), and the other is based on convolutional recurrent neural networks.
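The preprocessing and feature-extraction chain of Section 3.2 can be sketched roughly as follows. This is a simplified, hypothetical stand-in, not the authors' implementation: brightness normalization is assumed to be a linear min-max stretch to [0, 1], images are plain lists of gray values, size normalization uses nearest-neighbour resampling, slant correction with the maximum variance method is omitted, and the 60 dimensions are assumed to come from 20 vertical cells × 3 values (mean gray level, horizontal and vertical derivative) computed per column, which may differ from the exact method of [24]. All function names are illustrative.

```python
from statistics import median

# Hypothetical sketch of the Section 3.2 pipeline; images are lists of rows of gray values.

def normalize_brightness(img):
    """Assumed normalization: linearly stretch gray levels to the range [0, 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1  # avoid division by zero on a blank image
    return [[(v - lo) / span for v in row] for row in img]

def median_filter_3x3(img):
    """3 x 3 median filter; out-of-image neighbours are replaced by the nearest border pixel."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[median(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]

def scale_to_height(img, target_h=40):
    """Nearest-neighbour resampling to a fixed height, keeping the aspect ratio."""
    h, w = len(img), len(img[0])
    new_w = max(1, round(w * target_h / h))
    return [[img[y * h // target_h][x * w // new_w] for x in range(new_w)]
            for y in range(target_h)]

def frame_features(img, n_cells=20):
    """One vector per column: for each of n_cells vertical cells, the mean gray level
    plus horizontal and vertical gray-level derivatives (3 * n_cells values)."""
    h, w = len(img), len(img[0])
    cell_h = h // n_cells  # 40 px / 20 cells = 2 px per cell
    frames = []
    for x in range(w):
        left, right = max(x - 1, 0), min(x + 1, w - 1)
        x_dist = max(right - left, 1)
        vec = []
        for c in range(n_cells):
            top, bottom = c * cell_h, (c + 1) * cell_h - 1
            rows = range(top, bottom + 1)
            gray = sum(img[y][x] for y in rows) / cell_h
            # Horizontal derivative: neighbouring-column difference, averaged over the cell.
            d_x = sum(img[y][right] - img[y][left] for y in rows) / (x_dist * cell_h)
            # Vertical derivative: difference between the rows just below and just above the cell.
            above, below = max(top - 1, 0), min(bottom + 1, h - 1)
            d_y = (img[below][x] - img[above][x]) / max(below - above, 1)
            vec.extend((gray, d_x, d_y))
        frames.append(vec)
    return frames

def extract_features(img):
    img = normalize_brightness(img)
    img = median_filter_3x3(img)
    # Slant correction (maximum variance method, 92% threshold [23]) is omitted in this sketch.
    img = scale_to_height(img, target_h=40)
    return frame_features(img, n_cells=20)

# Toy 60 x 120 "text line"; after scaling to height 40 it is 80 columns wide.
line = [[(x * y) % 256 for x in range(120)] for y in range(60)]
feats = extract_features(line)
print(len(feats), len(feats[0]))  # 80 60
```

Here the sliding window is reduced to one column per frame, with the neighbouring columns used only for the horizontal derivative; a real implementation would typically use a wider analysis window.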
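Section 3.3 relies on SRILM for Kneser–Ney smoothing. Purely to illustrate the idea, the sketch below implements an interpolated Kneser–Ney bigram model in plain Python. It is a swapped-in simplification (bigrams only, a single fixed discount D = 0.75, interpolation instead of SRILM's back-off formulation) and does not reproduce SRILM's behavior. The `<s>`/`</s>` markers, the function name and the toy syllable data are hypothetical. Since tokens are arbitrary strings, the same code applies to word, sub-word or character sequences.

```python
from collections import Counter, defaultdict

def train_kn_bigram(sentences, discount=0.75):
    """Interpolated Kneser-Ney bigram model over token sequences (illustrative only).

    Each sentence is padded with hypothetical <s>/</s> markers.
    Returns a function prob(prev, word) giving P_KN(word | prev).
    """
    bigrams = Counter()
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))

    context_count = Counter()        # c(v): number of bigram tokens starting with v
    followers = defaultdict(set)     # distinct words observed after v
    predecessors = defaultdict(set)  # distinct contexts observed before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        predecessors[w].add(v)
    n_bigram_types = len(bigrams)

    def p_continuation(word):
        # Number of distinct contexts preceding the word, normalized by bigram types.
        return len(predecessors.get(word, ())) / n_bigram_types

    def prob(prev, word):
        c_v = context_count[prev]
        if c_v == 0:
            return p_continuation(word)  # unseen context: continuation probability only
        discounted = max(bigrams[(prev, word)] - discount, 0) / c_v
        interpolation_weight = discount * len(followers[prev]) / c_v
        return discounted + interpolation_weight * p_continuation(word)

    return prob

# Hypothetical sub-word (syllable) sequences, e.g. "ca-sa", "co-sa", "ca-ma".
train = [["ca", "sa"], ["co", "sa"], ["ca", "ma"]]
prob = train_kn_bigram(train)
vocab = {"ca", "co", "sa", "ma", "</s>"}
print(round(sum(prob("ca", w) for w in vocab), 6))  # 1.0
```

For a seen context, the discounted bigram mass plus the interpolation weight times the continuation distribution sums to one over all tokens that can follow, as long as D ≤ 1.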
3.4.1. Hidden Markov Models

The hidden Markov models (HMMs) for optical modeling were trained with HTK [28]. The trained models are left-to-right character models with four states. The observation probabilities in each state are described by a mixture distribution of 64 Gaussians. The number of character models is 106, and words and sub-words are modeled by the concatenation of the corresponding character HMMs. The HMM system takes sequences of handcrafted features as input. The HMM HTR systems were implemented by using the iATROS recognizer [29].

3.4.2. Deep Models Based on BLSTMs

In this approach, we use an RNN to estimate the posterior probabilities of the characters at the frame level (feature vector). Therefore, the size of the input layer corresponds to the size of the handcrafted feature vectors, and the size of the output layer to the number of different characters. The frame-level labeling required to train this neural network was generated by a forced-alignment decoding with a previously trained HMM recognition system [30]. This forced-alignment decoding and the model training were repeated several times until the assignment of frame labels to the optical model converged.

Then, as presented in Figure 5, our RNN is formed by 60 neurons in the input layer, 500 BLSTM neurons in the hidden layer with a hyperbolic tangent activation function, and 106 neurons in the output layer with a softmax function. The training was performed by using RNNLIB [31], and the main parameters (such as the size of the hidden layer) were tuned by using the validation
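The frame-level labels used to train the BLSTM in Section 3.4.2 come from a forced alignment with the HMM system of Section 3.4.1. A minimal Viterbi sketch of that step is given below, assuming the transcription's character models (four left-to-right states each) are concatenated into one chain and that per-frame, per-state emission log-likelihoods are already available; in the paper these would come from the 64-Gaussian mixtures, which are abstracted away here. The equal self-loop and advance probabilities (0.5), the toy emissions and all function names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def concatenate_char_models(transcription, states_per_char=4):
    """Chain left-to-right character HMMs: one (char, state index) entry per state."""
    return [(ch, k) for ch in transcription for k in range(states_per_char)]

def forced_align(emis, states, loop_lp=math.log(0.5), next_lp=math.log(0.5)):
    """Viterbi forced alignment of T frames to a left-to-right state chain.

    emis[t][s] is the log-likelihood of frame t in state s. Each state can only
    repeat (self-loop) or advance to the next state; the path starts in the first
    state and ends in the last one. Returns the character label of every frame.
    """
    T, S = len(emis), len(states)
    if T < S:
        raise ValueError("a left-to-right chain needs at least one frame per state")
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = emis[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s] + loop_lp
            move = score[t - 1][s - 1] + next_lp if s > 0 else NEG_INF
            if stay >= move:
                score[t][s], back[t][s] = stay + emis[t][s], s
            else:
                score[t][s], back[t][s] = move + emis[t][s], s - 1
    # Trace back from the final state of the chain to recover one label per frame.
    labels, s = [], S - 1
    for t in range(T - 1, -1, -1):
        labels.append(states[s][0])
        s = back[t][s]
    return labels[::-1]

# Toy example: 10 frames whose hypothetical emissions favour "a" first, then "b".
states = concatenate_char_models("ab")
emis = [[0.0 if ch == ("a" if t < 5 else "b") else -5.0 for ch, _ in states]
        for t in range(10)]
print("".join(forced_align(emis, states)))  # aaaaabbbbb
```

In the iterative scheme described in the text, such labels would be used to retrain the network, and the alignment would be recomputed until the frame labels stop changing.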
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Informatik (Computer Science)