Document Image Processing

J. Imaging 2018, 4, 15

Agora cuenta la historia

would be transformed into the following character sequence:

A g o r a <SPACE> c u e n t a <SPACE> l a <SPACE> h i s t o r i a

or into the following sequence, obtained by applying the hyphenation rules for Spanish:

Ago ra <SPACE> cuen ta <SPACE> la <SPACE> his to ria

Then, these preprocessed transcriptions can be used to train the sub-word unit language model. Usually, n-gram language models of sub-word units are trained with a large n (large context). On the other hand, the lexicon is reduced to match the list of sub-word units. In the decoding process, the best hypothesis is post-processed to obtain the final hypothesis. This final step consists of collapsing the sub-word unit sequence to form words and replacing the symbol used to mark the separation between words (<SPACE>) with a space.

Figure 4 presents a text line example from the test partition whose reference transcription is:

vio e recognoscio el Astragamiento que perdiera de su gente

In this example, the words recognoscio and Astragamiento are OOV words. It is interesting to note their etymology. They are archaic forms from Early Modern Spanish (15th–17th century) that in Modern Spanish correspond to the forms reconoció and Estragamiento. For that reason, we could not find them in any external resource, not even in Google N-Grams [22].

Figure 4. Text line sample. "Recognoscio" and "Astragamiento" are rare words; recognoscio is an archaic form of reconoció and Astragamiento an ancient form of Estragamiento.
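The character-level preprocessing and the final collapsing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `to_char_units` and `collapse_units` are hypothetical, and the hyphenation-based variant is left out because it needs a Spanish hyphenation resource that the text does not specify.

```python
SPACE = "<SPACE>"  # word-separator symbol used in the transcriptions


def to_char_units(text: str) -> str:
    """Split a transcription into characters, marking word boundaries with <SPACE>."""
    words = text.split()
    # Characters of one word are separated by blanks; words by the <SPACE> symbol.
    return f" {SPACE} ".join(" ".join(word) for word in words)


def collapse_units(units: str) -> str:
    """Join sub-word units back into words and turn <SPACE> into a real space."""
    words = []
    current = []
    for token in units.split():
        if token == SPACE:
            # A separator closes the word built so far (if any).
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(token)
    if current:
        words.append("".join(current))
    return " ".join(words)


print(to_char_units("Agora cuenta la historia"))
print(collapse_units("Ago ra <SPACE> cuen ta <SPACE> la <SPACE> his to ria"))
```

The collapsing function works for both unit types (characters and hyphenation units), since it only relies on the <SPACE> symbol to find word boundaries.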
The HMM decoding process with a traditional word-based approach offers the following best hypothesis:

vno & rea gustio el Astragar mando que perdona de lugar

which represents a Character Error Rate (CER) equal to 35.6% with respect to the reference text-line transcription. However, using a sub-word based approach, the following best hypothesis is obtained:

vio <SPACE> & <SPACE> re ca ges cio <SPACE> el <SPACE> As tra ga mien to <SPACE> que <SPACE> per do na <SPACE> de <SPACE> lu gar <SPACE>

which is transformed into the improved hypothesis (CER = 22.0%):

vio & recagescio el Astragamiento que perdona de lugar

On the other hand, with a character-based approach, the following best hypothesis is obtained:

v i o <SPACE> & <SPACE> r e c e g e s c i o <SPACE> e l <SPACE> A s t r a g a m i e n t o <SPACE> q u e <SPACE> p e r d i e r a <SPACE> d e l <SPACE> s e g u n d o

which results in the next final best hypothesis (CER = 17.0%):

vio & recegescio el Astragamiento que perdiera del segundo

As can be observed, the final hypotheses obtained at sub-word levels (characters, hyphenation sub-word units) in HTR are considerably better than those obtained with the word-based approach. In addition, the OOV word Astragamiento has been fully recognized. The second OOV word is recognized as recegescio or recagescio, which also improves the word-based recognition rea gustio. In Section 4, word and sub-word language modeling approaches will be compared with several types of optical HTR systems.
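For readers who want to compare hypotheses like the ones above, the following sketch computes a character error rate as the Levenshtein (edit) distance between reference and hypothesis, divided by the number of reference characters. This is the usual definition of CER, but the exact normalization used in the paper (for example, how spaces or the "&" abbreviation are counted) is not stated on this page, so the values it yields may differ from the reported ones. The function names are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


reference = "vio e recognoscio el Astragamiento que perdiera de su gente"
for hypothesis in (
    "vno & rea gustio el Astragar mando que perdona de lugar",     # word-based
    "vio & recagescio el Astragamiento que perdona de lugar",      # hyphenation units
    "vio & recegescio el Astragamiento que perdiera del segundo",  # characters
):
    print(f"{100 * cer(reference, hypothesis):.1f}%  {hypothesis}")
```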
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Publisher
MDPI
Place
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Dimensions
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science