Document Image Processing
Page 138

J. Imaging 2018, 4, 15

to words attached to punctuation marks, which were correctly recognized after removing the space between them (see Figure A2).

4.1. Study of the Context Size Influence

Figure 7 presents the results obtained for the word-based HMM system (in terms of WER and CER) by using n-gram LM with different context sizes n = {1, ..., 6}. As can be observed in this figure, the best result was obtained by using a three-gram LM; concretely, a WER equal to 43.3% ± 0.5, a CER equal to 21.1% ± 0.3 and an OOV WAR equal to 2.3% ± 0.4.

[Figure 7 plot: Word Error Rate, Character Error Rate and OOV Word Accuracy Rate (0% to 60%) versus n-gram size (1 to 6); annotated values: WER = 43.3%, CER = 21.1%, OOV WAR = 2.3%.]

Figure 7. Results obtained by the HMM word-based system using n-gram language models with size n = {1, ..., 6}.

Then, the performance of the HMM system at the sub-word level was tested. Figure 8 presents the results obtained using sub-word n-gram LM with different sizes n = {1, ..., 6} in terms of WER, CER and recognition accuracy of the OOV words. The best result was obtained with a sub-word language model of size n = 4 (a WER equal to 43.2% ± 0.5 and a CER equal to 20.0% ± 0.3). Regarding the recognition of OOV words, the sub-word approach was able to recognize correctly 9.3% ± 0.7 of the OOV words.

Figure 9 presents the results obtained for the HMM system using character n-gram LM with different degrees n = {1, ..., 15} in terms of WER, CER and recognition accuracy of the OOV words. Although similar results are obtained for n ≥ 6, the overall best result was obtained with a character language model of degree n = 10 (a WER equal to 39.8% ± 0.5 and a CER equal to 17.6% ± 0.3). Regarding the recognition of OOV words, this character-based approach was able to recognize correctly 18.3% ± 0.9 of the OOV words using no external resource or dictionary, but a character language model only.

Table 2 presents a summary of the obtained best results for the test experiments for the HMM system. As can be observed, the improvement offered by the sub-word approach is not statistically significant at the WER level compared to the results obtained from the word-based system.
Nevertheless, the character-based approach offers 9.3% of statistically-significant relative improvement over the baseline in terms of WER and 17.0% of statistically-significant relative improvement over the baseline in terms of CER. Thus, using a dictionary and LM at the word level performs worse than using a single character-based n-gram LM, with n large enough. This demonstrates the interest in working at the character level for transcribing historical manuscripts.

We study in the following the structure of the OOV words in comparison with the training words (Section 4.2). We also study the effect of reducing the OOV rate, either by using the validation set or by closing the vocabulary (Section 4.3).
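
The WER and CER values compared in this section can be illustrated with a minimal Python sketch. It assumes the usual definitions, which this page does not spell out: the edit (Levenshtein) distance between the recognized text and the reference, divided by the reference length, counted in words for WER and in characters for CER. The function names (edit_distance, wer, cer, relative_improvement) and the sample strings are hypothetical and chosen only for illustration; they are not taken from the paper's experimental setup, and the scoring tool actually used by the authors may differ in details such as tokenization or the treatment of spaces.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance as a percentage of the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: tokens are whitespace-separated words (assumption)."""
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    """Character error rate: tokens are characters, spaces included (assumption)."""
    return error_rate(list(ref), list(hyp))


def relative_improvement(baseline: float, new: float) -> float:
    """Relative reduction of an error rate with respect to a baseline, in percent."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    reference = "the old manuscript page"    # toy example, not from the data set
    hypothesis = "the old manuscrpt page"    # one character missing
    print(f"WER = {wer(reference, hypothesis):.1f}%")
    print(f"CER = {cer(reference, hypothesis):.1f}%")
    print(f"relative improvement = {relative_improvement(50.0, 45.0):.1f}%")
```

In the toy example a single missing character makes one word out of four wrong, while only one of 23 reference characters is affected. The same misrecognition therefore weighs much more heavily on WER than on CER, which is why the two metrics are reported side by side.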
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science