Document Image Processing
Page 134
J. Imaging 2018, 4, 15

3.2. Handcrafted Features
Features are computed in several steps from text line images. First, the image brightness is normalized, and a median filter of size 3×3 pixels is applied to the entire image. Next, slant correction is performed by using the maximum variance method with a threshold of 92% [23]. Then, size normalization is performed, and the final image is scaled to a height of 40 pixels. Finally, a sequence of 60-dimensional feature vectors is extracted by a sliding window, using the method described in [24].

3.3. Lexicon and Language Models
The lexicon and language models at the sub-word level were obtained by hyphenating the vocabulary words following the rules for modern Spanish by using the testhyphens package [25] for LaTeX. Lexicon models were in HTK lexicon format, where vocabulary words and sub-word units were modeled as a concatenation of symbols; however, characters were modeled as just the corresponding symbol.

Language Models (LM) were estimated as n-grams with Kneser–Ney back-off smoothing [26] by using the SRILM toolkit [27]. Different LMs were used in the experiments at word, sub-word and character levels. For the word-based system and the open-vocabulary case, the LM is trained directly from the text-line transcriptions of the training set. In the closed-vocabulary case, the LM is trained with the same transcriptions, plus the out-of-vocabulary (OOV) words included as unigrams. For the character-based system, the closed-vocabulary case indicates that the character sequences that represent the OOV words are used for building the n-gram character LM. For both systems, word or character-based, "with validation" means that training and validation transcriptions are used for building the LM.

3.4. Optical Models
In this paper, three different approaches for optical modeling for handwritten text recognition (HTR) are used: traditional hidden Markov models and two deep network classifiers. The first of these classifiers is based on recurrent neural networks with bi-directional long short-term memory, and the other one is based on convolutional recurrent neural networks.
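As an illustration of the language-model configurations of Section 3.3, the following minimal sketch shows how the training material differs between the open-vocabulary, closed-vocabulary and "with validation" cases. It is not the SRILM pipeline used in the experiments: the function names, the count of 1 assigned to each OOV word, the `<space>` symbol and the example transcriptions are all assumptions made for this sketch, and the Kneser–Ney estimation itself is left out.

```python
from collections import Counter


def lm_corpus(train, validation=(), with_validation=False):
    """Tokenised transcriptions from which the n-gram LM would be estimated."""
    lines = list(train)
    if with_validation:
        # "with validation": validation transcriptions are added to the LM text
        lines += list(validation)
    return [line.split() for line in lines]


def word_unigram_counts(corpus, oov_words=(), closed_vocabulary=False):
    """Word counts; the closed-vocabulary case adds the OOV words as unigrams."""
    counts = Counter(tok for line in corpus for tok in line)
    if closed_vocabulary:
        for word in oov_words:
            counts[word] += 1  # assumed count of 1 per OOV word
    return counts


def char_sequences(corpus, oov_words=(), closed_vocabulary=False, space="<space>"):
    """Character-level training sequences for the character-based system."""
    sequences = []
    for line in corpus:
        seq = []
        for i, tok in enumerate(line):
            if i:
                seq.append(space)  # hypothetical symbol for the word separator
            seq.extend(tok)
        sequences.append(seq)
    if closed_vocabulary:
        # the character sequences of the OOV words are added to the LM text
        sequences += [list(word) for word in oov_words]
    return sequences


# Made-up transcriptions, for illustration only
train = ["en un lugar", "de la Mancha"]
validation = ["no ha mucho"]
oov = ["hidalgo"]

open_counts = word_unigram_counts(lm_corpus(train))
closed_counts = word_unigram_counts(lm_corpus(train), oov, closed_vocabulary=True)
```

In this sketch the OOV word enters the closed-vocabulary word model only through its unigram count, whereas the character-based variant sees it as an additional character sequence.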
3.4.1. Hidden Markov Models
The Hidden Markov Models (HMM) for optical modeling were trained with HTK [28]. The trained models are left-to-right character models including four states. The observation probabilities in each state are described by a mixture distribution of 64 Gaussians. The number of character models is 106, and words and sub-words are modeled by the concatenation of compound character HMMs. The HMM system takes sequences of handcrafted features as input. HMM HTR systems were implemented by using the iATROS recognizer [29].

3.4.2. Deep Models Based on BLSTMs
In this approach, we use an RNN to estimate the posterior probabilities of the characters at the frame level (feature vector). Therefore, the size of the input layer corresponds to the size of the handcrafted feature vectors and the size of the output layer to the number of different characters. The frame-level labeling required to train this neural network was generated from a forced alignment decoding by a previously trained HMM recognition system [30]. This forced alignment decoding and the model training were repeated several times until the assignment of the frame labels to the optical model converged.

Then, as presented in Figure 5, our RNN is formed by 60 neurones at the input layer, 500 BLSTM neurones at the hidden layer with a hyperbolic tangent activation function and 106 neurones at the output layer with a softmax function. The training was performed by using RNNLIB [31], and the main parameters (such as the size of the hidden layer) were tuned by using the validation
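The left-to-right topology of Section 3.4.1, and the concatenation of character models into word models, can be sketched as follows. This is a minimal illustration and not the HTK or iATROS implementation: the function names are hypothetical, the self-loop probability of 0.5 is a placeholder (in practice the transitions are estimated during training), state skips are assumed to be absent, and the 64-Gaussian emission mixtures are omitted.

```python
def left_to_right_hmm(n_states=4, p_stay=0.5):
    """Transition matrix of a left-to-right HMM.

    The matrix has n_states rows and n_states + 1 columns; the last column
    is the exit from the model. Each state either loops on itself (p_stay)
    or moves to the next state or the exit (1 - p_stay).
    """
    A = [[0.0] * (n_states + 1) for _ in range(n_states)]
    for i in range(n_states):
        A[i][i] = p_stay
        A[i][i + 1] = 1.0 - p_stay
    return A


def concatenate(models):
    """Chain character HMMs into one word-level left-to-right model.

    The exit column of each character model is mapped onto the first state
    of the following model; the exit of the last model is the word's exit.
    """
    total = sum(len(m) for m in models)
    A = [[0.0] * (total + 1) for _ in range(total)]
    offset = 0
    for m in models:
        n = len(m)
        for i in range(n):
            for j in range(n + 1):
                A[offset + i][offset + j] = m[i][j]
        offset += n
    return A


# One four-state model per character, then a word model by concatenation
word = "sol"
char_models = {c: left_to_right_hmm() for c in set(word)}
word_model = concatenate([char_models[c] for c in word])
```

With three characters of four states each, the resulting word model has 12 emitting states, and the last state of each character model leads into the first state of the next one.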
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science