Document Image Processing
Page 140 of Document Image Processing

J. Imaging 2018, 4, 15

[Figure 10 image not reproduced; legend entries: "Recognised OOV words", "Not recognised OOV words".]

Figure 10. Distribution of the perplexity presented by the 10-gram character Language Model (LM) per recognized and unrecognized OOV words (decomposed into character sequences) by the HMM system.

Table 3. Features of the distributions of perplexity per OOV word, recognized and unrecognized, for the HMM character-based 10-gram LM. Q1, Q2 and Q3 are respectively the first, second and third quartiles, IQR the interquartile range, Min. and Max. the minimum and maximum values and SD the standard deviation.

Distribution   Q1     Q2     Q3     IQR    Min.   Max.     SD
Recognized     6.64   9.22   12.57  5.94   3.26   46.05    5.37
Unrecognized   8.70   12.21  17.75  9.05   3.06   367.07   16.25

4.3. Study of the Effect of Closing the Vocabulary and Adding the Transcription of the Validation Set for Training the LM

After the adjustment of the decoding parameters with the validation set, the transcription of the text lines contained in this partition can be used to train an improved LM that, hopefully, will reduce the number of OOV words. Moreover, the OOV words can be included in the vocabulary as unigrams (closed vocabulary experiments) to verify their influence on the recognition. These conditions were tested for the best language models at word and character levels (3-gram for the word-based system and 10-gram for the character-based system). Given that the sub-word approach presented no significant difference in terms of WER compared to the word-based system (see Table 2), this approach was not tested in this experiment.

Figures 11–13 allow comparing the obtained results for the word-based system and the character-based approach with open and closed vocabulary, with and without the use of the validation samples when training the LM (see Section 3.4). On the one hand, as can be seen in Figures 11 and 13, the use of the validation set does not significantly improve the word-based recognition in terms of WER or CER.
However, this additional information is very useful in the character-based approach. As can be observed in Figure 11, a statistically significant improvement in terms of CER is achieved (16.9% ± 0.3 instead of 17.6% ± 0.3). This improvement allows increasing the OOV word recognition accuracy (see Figure 12). On the other hand, although closing the vocabulary significantly improves the recognition performance, it is interesting to note the beneficial effect of the use of the validation samples in the character-based approach. It is also interesting to note in Figures 11 and 13 that the character-based system, even in the more difficult case ("open-vocabulary"), outperforms, in terms of CER, the word-based system in the best case ("closed-vocabulary"). In the closed-vocabulary conditions, the word-based system recognizes more OOV words than the character-based system, 34.7% ± 1.2 instead of 29.6% ± 1.1 (see Figure 12). However, in the real-world case, i.e., the open-vocabulary conditions, the character-based system performs better.
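
The comparisons quoted above can be read as interval checks. The following is a minimal sketch in Python, assuming that each "±" value is the half-width of a symmetric confidence interval around the reported rate; this page does not say how the margins were computed, so this is an interpretation rather than the authors' procedure. The names Rate and intervals_overlap are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """A reported rate in percent with a symmetric +/- margin (assumed CI half-width)."""
    value: float
    margin: float

    @property
    def low(self) -> float:
        return self.value - self.margin

    @property
    def high(self) -> float:
        return self.value + self.margin


def intervals_overlap(a: Rate, b: Rate) -> bool:
    """Return True when the closed intervals [low, high] of a and b share a point."""
    return a.low <= b.high and b.low <= a.high


# Figures quoted in the text above.
cer_char_with_validation = Rate(16.9, 0.3)     # character-based CER, validation set used
cer_char_without_validation = Rate(17.6, 0.3)  # character-based CER, validation set not used
oov_word_based_closed = Rate(34.7, 1.2)        # OOV words recognized, word-based, closed vocabulary
oov_char_based_closed = Rate(29.6, 1.1)        # OOV words recognized, character-based, closed vocabulary

print(intervals_overlap(cer_char_with_validation, cer_char_without_validation))
print(intervals_overlap(oov_word_based_closed, oov_char_based_closed))
```

Under that assumption, neither pair of intervals overlaps, which is consistent with the text describing both differences as meaningful. Non-overlap is a conservative criterion: two intervals can overlap slightly while the difference is still significant under a formal test.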
Document Image Processing
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Publisher
MDPI
Place
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Dimensions
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science