J. Imaging 2018, 4, 15
3.2. Handcrafted Features
Features are computed in several steps from text line images. First, the image brightness is
normalized, and a median filter of size 3×3 pixels is applied to the entire image. Next, slant correction
is performed by using the maximum variance method with a threshold of 92% [23]. Then, size
normalization is performed, and the final image is scaled to a height of 40 pixels. Finally, a sequence of
60-dimensional feature vectors is extracted by a sliding window, using the method described in [24].
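The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not say how brightness is normalized, so a min-max stretch is assumed; slant correction [23] and the 60-dimensional feature computation of [24] are not reproduced; and all function names and the window width are hypothetical.

```python
def normalize_brightness(img):
    """Stretch grey levels to 0..255 (assumed method; the paper does not specify one)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[0 for _ in row] for row in img]
    return [[round(255 * (p - lo) / (hi - lo)) for p in row] for row in img]


def median_filter_3x3(img):
    """3x3 median filter; at the borders only the part of the window inside the image is used."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def scale_to_height(img, target_h=40):
    """Nearest-neighbour rescaling to a fixed height, keeping the aspect ratio."""
    h, w = len(img), len(img[0])
    target_w = max(1, round(w * target_h / h))
    return [
        [img[y * h // target_h][x * w // target_w] for x in range(target_w)]
        for y in range(target_h)
    ]


def sliding_windows(img, width=3, step=1):
    """Yield column windows from left to right; a feature extractor would map each to a vector."""
    total_w = len(img[0])
    for x in range(0, total_w - width + 1, step):
        yield [row[x:x + width] for row in img]
```

Each window would then be passed to a feature extractor that returns one 60-dimensional vector per position; that step depends on [24] and is left out here.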
3.3. Lexicon and Language Models
The lexicon and language models at the sub-word level were obtained by hyphenating
the vocabulary words following the rules for modern Spanish by using the testhyphens package [25]
for LaTeX. Lexicon models were in HTK lexicon format, where vocabulary words and sub-word
units were modeled as a concatenation of symbols; however, characters were modeled as just
the corresponding symbol.
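As a rough illustration of the three lexicon levels, the sketch below writes each entry as the unit followed by its character symbols, which is a simplified rendering of an HTK-style dictionary line. The hyphenations are supplied by hand as example input (in the paper they come from the testhyphens package), and the helper names are hypothetical. Real HTK dictionaries may also need escaping of special characters, which is ignored here.

```python
def lexicon_entry(unit):
    """One lexicon line: the unit, then the symbols it is a concatenation of."""
    return unit + " " + " ".join(unit)


def build_lexicon(units):
    """Sorted, duplicate-free list of lexicon lines."""
    return [lexicon_entry(u) for u in sorted(set(units))]


# Illustrative input only: words with hand-written Spanish hyphenations.
hyphenated = {"camino": ["ca", "mi", "no"], "casa": ["ca", "sa"]}

word_lexicon = build_lexicon(hyphenated)
subword_lexicon = build_lexicon(s for parts in hyphenated.values() for s in parts)
# A character is modeled as just its own symbol, e.g. "a a".
char_lexicon = build_lexicon(c for word in hyphenated for c in word)
```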
Language Models (LM) were estimated as n-grams with Kneser–Ney back-off smoothing [26]
by using the SRILM toolkit [27]. Different LMs were used in the experiments at word, sub-word
and character levels. For the word-based system and the open-vocabulary case, the LM is
trained directly from the text-line transcriptions of the training set. In the closed-vocabulary
case, the LM is trained with the same transcriptions, plus the OOV words included as unigrams.
For the character-based system, the closed-vocabulary case indicates that the character sequences that
represent the OOV words are used for building the n-gram character LM. For both systems, word
or character-based, “with validation” means that training and validation transcriptions are used for
building the LM.
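To make the open- versus closed-vocabulary distinction concrete for the character-based system, the sketch below trains a character bigram LM with Kneser–Ney smoothing, once on the training lines alone and once with the character sequences of the OOV words added. Several things here are assumptions: it uses the interpolated form of Kneser–Ney with a fixed discount of 0.75 and order 2, which is not necessarily the SRILM configuration used in the paper, and the example strings and function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_kn_bigram(sequences, discount=0.75):
    """Return (prob, vocab) for an interpolated Kneser-Ney bigram model (assumed variant)."""
    bigrams = Counter()
    for seq in sequences:
        toks = ["<s>"] + list(seq) + ["</s>"]
        bigrams.update(zip(toks, toks[1:]))

    history_count = Counter()         # c(u): how often u occurs as a history
    followers = defaultdict(set)      # distinct symbols seen after u
    predecessors = defaultdict(set)   # distinct histories seen before w
    for (u, w), c in bigrams.items():
        history_count[u] += c
        followers[u].add(w)
        predecessors[w].add(u)
    n_bigram_types = len(bigrams)

    def prob(w, u):
        # Continuation probability: share of bigram types that end in w.
        p_cont = len(predecessors.get(w, ())) / n_bigram_types
        if history_count[u] == 0:
            return p_cont
        discounted = max(bigrams[(u, w)] - discount, 0) / history_count[u]
        backoff_weight = discount * len(followers[u]) / history_count[u]
        return discounted + backoff_weight * p_cont

    return prob, set(predecessors)


train_lines = ["la casa", "la cama"]   # illustrative training transcriptions
oov_words = ["saco"]                   # illustrative OOV word from the test set

open_prob, open_vocab = train_kn_bigram(train_lines)
closed_prob, closed_vocab = train_kn_bigram(train_lines + oov_words)
```

In this toy case the open-vocabulary model assigns zero probability to "o" after "c", because "o" never occurs in the training lines, while the closed-vocabulary model has seen it through the OOV word.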
3.4. Optical Models
In this paper, three different approaches for optical modeling for HTR are used: traditional hidden
Markov models and two deep network classifiers. The first of these is based on recurrent neural networks
with bi-directional long short-term memory, and the other one is based on convolutional recurrent
neural networks.
3.4.1. Hidden Markov Models
The Hidden Markov Models (HMM) for optical modeling were trained with HTK [28]. The trained
models are left-to-right character models including four states. The observation probabilities in each
state are described by a mixture distribution of 64 Gaussians. The number of character models is 106,
and words and sub-words are modeled by the concatenation of compound character HMMs. The
HMM system takes sequences of handcrafted features as input. HMM HTR systems were implemented
by using the iATROS recognizer [29].
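The topology described above can be pictured with a small sketch that builds the transition matrix of a four-state left-to-right character model and chains several such models into a word model. It is only an illustration: the self-loop probability of 0.5 is a placeholder (in practice the transition probabilities are trained), the non-emitting entry and exit states that HTK adds are folded into a single exit column, the Gaussian mixtures are omitted, and the function names are hypothetical.

```python
def left_to_right_hmm(n_states=4, p_stay=0.5):
    """Transition matrix of shape n_states x (n_states + 1).

    Each state either loops on itself or moves to the next state;
    the last column is the transition out of the model.
    """
    trans = [[0.0] * (n_states + 1) for _ in range(n_states)]
    for i in range(n_states):
        trans[i][i] = p_stay
        trans[i][i + 1] = 1.0 - p_stay
    return trans


def concatenate(models):
    """Chain character models: the exit of one model feeds the first state of the next."""
    total = sum(len(m) for m in models)
    trans = [[0.0] * (total + 1) for _ in range(total)]
    offset = 0
    for m in models:
        for i, row in enumerate(m):
            for j, p in enumerate(row):
                trans[offset + i][offset + j] = p
        offset += len(m)
    return trans


# A three-character word becomes a 12-state left-to-right model.
word_model = concatenate([left_to_right_hmm() for _ in "sol"])
```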
3.4.2. Deep Models Based on BLSTMs
In this approach, we use an RNN to estimate the posterior probabilities of the characters at
the frame level (feature vector). Therefore, the size of the input layer corresponds to the size of
the handcrafted feature vectors and the size of the output layer to the number of different characters.
The frame-level labeling required to train this neural network was generated from a forced alignment
decoding by a previously trained HMM recognition system [30]. This forced alignment decoding and
the model training were repeated several times until the convergence of the assignment of the frame
labels to the optical model.
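The forced alignment step can be illustrated with a small dynamic-programming sketch: given a per-frame log-score for every symbol and the known transcription, it finds the monotonic assignment of frames to symbols with the highest total score. This is a simplified stand-in, assuming one state per symbol and generic scores; in the paper the alignment comes from the HMM recognizer [30], and the surrounding loop (align, retrain, repeat until the labels stop changing) is not shown. The function name and the example scores are hypothetical.

```python
def forced_align(log_probs, labels):
    """Return one label per frame.

    log_probs[t][c] is the log-score of symbol c at frame t; labels is the
    known transcription. Every symbol must cover at least one frame.
    """
    n_frames, n_labels = len(log_probs), len(labels)
    if n_frames < n_labels:
        raise ValueError("fewer frames than symbols")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_labels for _ in range(n_frames)]
    back = [[0] * n_labels for _ in range(n_frames)]
    score[0][0] = log_probs[0][labels[0]]
    for t in range(1, n_frames):
        for n in range(min(t + 1, n_labels)):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else neg_inf
            if move > stay:
                best, back[t][n] = move, n - 1
            else:
                best, back[t][n] = stay, n
            score[t][n] = best + log_probs[t][labels[n]]
    # Trace back from the last symbol at the last frame.
    n = n_labels - 1
    path = [n]
    for t in range(n_frames - 1, 0, -1):
        n = back[t][n]
        path.append(n)
    path.reverse()
    return [labels[i] for i in path]


# Illustrative scores for four frames and the transcription "ab".
frames = [
    {"a": -0.1, "b": -3.0},
    {"a": -0.2, "b": -2.0},
    {"a": -2.5, "b": -0.1},
    {"a": -3.0, "b": -0.1},
]
frame_labels = forced_align(frames, "ab")
```

The resulting frame labels are what a frame-level classifier would be trained on in the next iteration.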
Then, as presented in Figure 5, our RNN is formed by 60 neurones at the input layer, 500 BLSTM
neurones at the hidden layer with a hyperbolic tangent activation function and 106 neurones at
the output layer with a softmax function. The training was performed by using RNNLIB [31],
and the main parameters (such as the size of the hidden layer) were tuned by using the validation
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Publisher
- MDPI
- Place
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science