Page - 143 - in Document Image Processing

J. Imaging 2018, 4, 15

The results obtained with the RNN system using character n-gram LM are presented in Figure 16. As in the character-based HMM experiments, similar results are obtained for n ≥ 6, and the overall best result was obtained with a 10-gram character language model: a WER equal to 37.7% ± 0.5, a CER equal to 14.3% ± 0.3 and an OOV WAR equal to 37.8% ± 1.1.

[Plot: Word Error Rate, Character Error Rate and OOV Word Accuracy Rate (0% to 100%) against n-gram size (1 to 15); marked values: WER = 37.7%, CER = 14.3%, OOV WAR = 37.8%.]

Figure 16. Results obtained by the RNN character-based system using n-gram language models.

A summary of the best results obtained in the test experiments for the RNN system is presented in Table 4. As can be observed, the RNN approach generally performs better than the traditional HMM approach. Although the word-based RNN system shows a statistically significant relative deterioration of 19.6% over the HMM system (43.9% ± 0.5) in terms of WER, it achieves a statistically significant relative improvement of 18.9% in terms of CER (HMM: 21.2% ± 0.3). Moreover, 16.3% of the OOV words, which correspond to words followed by punctuation marks, are correctly recognized.

Table 4. Summary of the best results in terms of WER, CER and OOV WAR for the RNN system.

Measure    Word (2-gram)    Sub-Word (5-gram)    Character (10-gram)
WER        52.5% ± 0.8      38.6% ± 0.5          37.7% ± 0.5
CER        17.2% ± 0.3      17.3% ± 0.3          14.3% ± 0.3
OOV WAR    16.3% ± 0.9      27.4% ± 1.1          37.8% ± 1.1

The use of sub-word units offers better results than using words, allowing one to obtain significant improvements in terms of WER and CER over the HMM system. In this case, the use of a five-gram LM trained with hyphenated words yielded statistically significant improvements at the WER level over the use of a two-gram LM of full words. However, as for the HMM system, the overall best results are obtained by using the character-based approach: a WER equal to 37.7% ± 0.5, a CER equal to 14.3% ± 0.3 and an OOV WAR equal to 37.8% ± 1.1.

4.4.2. Results for Deep Models Based on Convolutional Recurrent Neural Networks

Figure 17 presents the recognition results obtained for the word-based CRNN system.
As in the previous word-based systems, the recognized OOV words correspond to words attached to punctuation marks, which were correctly recognized after removing the space between them (see the example presented in Figure A2). The best result, obtained by using a three-gram LM, presents a WER equal to 17.9% ± 0.4, a CER equal to 4.0% ± 0.1 and an OOV WAR equal to 21.5% ± 1.0.

The results obtained using sub-word n-gram LM are shown in Figure 18. The best result was obtained with a four-gram language model (a WER equal to 14.8% ± 0.3 and a CER equal to 3.4% ± 0.1).
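
The relative changes quoted earlier for the word-based RNN system against the HMM system (a 19.6% deterioration in WER and an 18.9% improvement in CER) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `relative_change` is hypothetical, and it assumes that 43.9% and 21.2% are the WER and CER of the HMM system being compared, as the text suggests.

```python
def relative_change(baseline: float, value: float) -> float:
    """Relative change of `value` with respect to `baseline`, in percent.

    A positive result means `value` is larger than `baseline`.
    """
    return (value - baseline) / baseline * 100.0


# Error rates in percent. The RNN figures come from Table 4 (word-based column);
# the HMM figures are the ones quoted in the text (assumed to be WER and CER).
HMM_WORD = {"WER": 43.9, "CER": 21.2}
RNN_WORD = {"WER": 52.5, "CER": 17.2}

wer_change = relative_change(HMM_WORD["WER"], RNN_WORD["WER"])
cer_change = relative_change(HMM_WORD["CER"], RNN_WORD["CER"])

# For error rates, a positive change is a deterioration and a negative one an improvement.
print(f"WER: {wer_change:+.1f}% relative")
print(f"CER: {cer_change:+.1f}% relative")
```

Because both measures are error rates, the sign convention matters: the WER rises relative to the HMM value, while the CER falls.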
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science