Page - 139 - in Document Image Processing
J. Imaging 2018, 4, 15
Figure 8. Results obtained by decoding at the HMM sub-word level by using n-gram language models with size n = {1, ..., 6}. (Plot not reproduced: x-axis n-gram size, y-axis 0% to 80%; curves for Word Error Rate, Character Error Rate and OOV Word Accuracy Rate; annotated values WER = 43.2%, CER = 20.0%, OOV WAR = 9.3%.)
Figure 9. Results obtained by decoding at the HMM character level by using n-gram language models with size n = {1, ..., 15}. (Plot not reproduced: x-axis n-gram size, y-axis 0% to 90%; curves for Word Error Rate, Character Error Rate and OOV Word Accuracy Rate; annotated values WER = 39.8%, CER = 17.6%, OOV WAR = 18.3%.)
Table 2. Overall best results on the Rodrigo test set in terms of WER, CER and OOV WAR for the HMM system.

Measure    Word (3-gram)   Sub-Word (4-gram)   Character (10-gram)
WER        43.9% ± 0.5     43.2% ± 0.5         39.8% ± 0.5
CER        21.2% ± 0.3     20.0% ± 0.3         17.6% ± 0.3
OOV WAR     2.3% ± 0.3      9.3% ± 0.7         18.3% ± 0.9
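As a reminder of how the first two measures in Table 2 are conventionally obtained, the following minimal sketch computes WER and CER as the edit distance between reference and hypothesis, normalized by the reference length. It is not the evaluation tool used for these experiments; the function names and the toy transcripts are hypothetical, and counting spaces as characters in the CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Total edit distance divided by total reference length.

    unit="word" gives WER; unit="char" gives CER
    (spaces are counted as characters here, which is an assumption).
    """
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return errors / total


# Toy transcripts, invented for illustration only.
refs = ["the old king", "rode north"]
hyps = ["the old kin", "rode north"]
wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
```

A single wrong character turns a whole word into an error, which is why the WER values in Table 2 are so much higher than the corresponding CER values.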
4.2. Study of the Relation between the Structure of the OOV Words and the Training Words
The character-based approach is able to recognize some OOV words given that the character-based LM learns the structure of the words contained in the training set. In order to verify this hypothesis, we measured the perplexity presented by the best character-based LM (10-gram) for decoding each one of the 4918 OOV words as their corresponding character sequences. Figure 10 presents the obtained perplexity per OOV word separated into two distributions, recognized and unrecognized OOV words. Table 3 summarizes the main features of these distributions. As expected, the recognized OOV words present lower perplexity than the unrecognized OOV words. The overlap of both distributions suggests that there is still room for improvement, since more OOV words could be recognized.
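The per-word perplexity measurement described above can be illustrated with a small sketch. It trains a character n-gram model on a word list and scores a word as a character sequence. Several choices here are assumptions rather than the setup of the experiments: add-one smoothing stands in for whatever smoothing the actual LM used, the perplexity is normalized per character including an end-of-word symbol, and the function names and training words are hypothetical.

```python
import math
from collections import Counter


def train_char_ngram(words, n):
    """Count character n-grams and their (n-1)-character contexts."""
    ngrams, contexts, vocab = Counter(), Counter(), set()
    for word in words:
        chars = ["<s>"] * (n - 1) + list(word) + ["</s>"]
        vocab.update(chars[n - 1:])
        for i in range(n - 1, len(chars)):
            context = tuple(chars[i - n + 1:i])
            ngrams[context + (chars[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts, vocab


def word_perplexity(word, model, n):
    """Perplexity of one word, read as a character sequence."""
    ngrams, contexts, vocab = model
    chars = ["<s>"] * (n - 1) + list(word) + ["</s>"]
    log_prob, steps = 0.0, 0
    for i in range(n - 1, len(chars)):
        context = tuple(chars[i - n + 1:i])
        # Add-one smoothing (an assumption); the extra +1 in the
        # denominator reserves mass for characters never seen in training.
        p = (ngrams[context + (chars[i],)] + 1) / (contexts[context] + len(vocab) + 1)
        log_prob += math.log(p)
        steps += 1
    return math.exp(-log_prob / steps)


# Made-up training words, for illustration only.
model = train_char_ngram(["casa", "cosa", "caso", "casas"], n=3)
ppl_similar = word_perplexity("cosas", model, n=3)    # unseen word, familiar structure
ppl_dissimilar = word_perplexity("xyzzy", model, n=3)  # unseen word, unfamiliar structure
```

In this toy setting, an unseen word built from character patterns present in the training words receives a lower perplexity than one built from unseen patterns, which is the effect the hypothesis relies on.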
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Editor: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Size: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science