Document Image Processing
J. Imaging 2018, 4, 15

to words attached to punctuation marks, which were correctly recognized after removing the space between them (see Figure A2).

4.1. Study of the Context Size Influence

Figure 7 presents the results obtained for the word-based HMM system (in terms of WER and CER) by using n-gram LM with different context sizes n = {1, ..., 6}. As can be observed in this figure, the best result was obtained by using a three-gram LM; concretely, a WER equal to 43.3% ± 0.5, a CER equal to 21.1% ± 0.3 and an OOV WAR equal to 2.3% ± 0.4.

[Figure 7 plot: Word Error Rate, Character Error Rate and OOV Word Accuracy Rate (0% to 60%) against n-gram size (1 to 6); annotated values: WER = 43.3%, CER = 21.1%, OOV WAR = 2.3%.]

Figure 7. Results obtained by the HMM word-based system using n-gram language models with size n = {1, ..., 6}.

Then, the performance of the HMM system at the sub-word level was tested. Figure 8 presents the results obtained using sub-word n-gram LM with different sizes n = {1, ..., 6} in terms of WER, CER and recognition accuracy of the OOV words. The best result was obtained with a sub-word language model of size n = 4 (a WER equal to 43.2% ± 0.5 and a CER equal to 20.0% ± 0.3). Regarding the recognition of OOV words, the sub-word approach was able to recognize correctly 9.3% ± 0.7 of the OOV words.

Figure 9 presents the results obtained for the HMM system using character n-gram LM with different degrees n = {1, ..., 15} in terms of WER, CER and recognition accuracy of the OOV words. Although similar results are obtained for n ≥ 6, the overall best result was obtained with a character language model of degree n = 10 (a WER equal to 39.8% ± 0.5 and a CER equal to 17.6% ± 0.3). Regarding the recognition of OOV words, this character-based approach was able to recognize correctly 18.3% ± 0.9 of the OOV words using no external resource or dictionary, but a character language model only.

Table 2 presents a summary of the obtained best results for the test experiments for the HMM system. As can be observed, the improvement offered by the sub-word approach is not statistically significant at the WER level compared to the results obtained from the word-based system. Nevertheless, the character-based approach offers 9.3% of statistically-significant relative improvement over the baseline in terms of WER and 17.0% of statistically-significant relative improvement over the baseline in terms of CER. Thus, using a dictionary and LM at the word level performs worse than using a single character-based n-gram LM, with n large enough. This demonstrates the interest in working at the character level for transcribing historical manuscripts.

We study in the following the structure of the OOV words in comparison with the training words (Section 4.2). We also study the effect of reducing the OOV rate, either by using the validation set or by closing the vocabulary (Section 4.3).
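The metrics discussed in this section can be illustrated with a minimal sketch. It assumes the usual definitions: WER and CER as the Levenshtein (edit) distance between reference and hypothesis, counted over words or characters and divided by the reference length, and relative improvement as the reduction in error rate divided by the baseline error rate. The function names are hypothetical, and the page does not state how the authors' own scoring was implemented. The Table 2 values are not reproduced on this page, so the usage example relies on made-up numbers.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str],
               tokenize: Callable[[str], Sequence]) -> float:
    """Total edit distance divided by total reference length."""
    total_edits = 0
    total_ref = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref)
        total_edits += edit_distance(ref_tokens, tokenize(hyp))
        total_ref += len(ref_tokens)
    if total_ref == 0:
        raise ValueError("reference is empty")
    return total_edits / total_ref


def wer(refs: List[str], hyps: List[str]) -> float:
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)


def cer(refs: List[str], hyps: List[str]) -> float:
    """Character error rate: tokens are single characters."""
    return error_rate(refs, hyps, list)


def relative_improvement(baseline: float, new: float) -> float:
    """Fraction of the baseline error rate removed by the new system."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Made-up transcription pair, for illustration only.
    print(wer(["the cat sat"], ["the cat sit"]))
    print(cer(["the cat sat"], ["the cat sit"]))
    # Made-up error rates: 40% baseline reduced to 30%.
    print(relative_improvement(40.0, 30.0))
```

Accumulating edits and reference lengths over the whole set, rather than averaging per-line rates, prevents short lines from dominating the score. Whether the paper aggregates this way is an assumption.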
Document Image Processing
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science