Page 145 in Document Image Processing


J. Imaging 2018, 4, 15

the character-based approach: a WER of 14.0% ± 0.3, a CER of 3.0% ± 0.1 and an OOV WAR of 69.2% ± 1.1. These results confirm the value of working at the character level when transcribing historical manuscripts.

[Figure 19 plot: Word Error Rate, Character Error Rate and OOV Word Accuracy Rate (0% to 100%) versus n-gram size (1 to 15); best values marked: WER = 14.0%, CER = 3%, OOV WAR = 69.2%.]
Figure 19. Results obtained by the CRNN character-based system using n-gram language models of size n = {1, ..., 15}.

Table 5. Overall best results on the Rodrigo test set in terms of WER, CER and OOV WAR for the CRNN system.

Measure    Word (3-gram)   Sub-word (4-gram)   Character (10-gram)
WER        17.9% ± 0.4     14.8% ± 0.3         14.0% ± 0.3
CER         4.0% ± 0.1      3.4% ± 0.1          3.0% ± 0.1
OOV WAR    21.5% ± 1.0     42.4% ± 1.5         69.2% ± 1.1

5. Conclusions

In this paper, we deal with the transcription of historical documents for which no external linguistic resources are available. We have developed various HTR systems that model language at the word and sub-lexical levels, and we have shown that character-based language modeling performs best. The strengths of the proposed work are:
• comparing several types of HTR systems (HMM-based and RNN-based);
• proposing a state-of-the-art HTR system for the transcription of ancient Spanish documents, whose optical part is based on very deep networks (CRNNs);
• proposing to associate the optical HTR system with a dictionary and a language model based on sub-lexical units, which are shown to be effective for coping with OOV words;
• reaching, with these optical and LM components, the best overall recognition results on a publicly available dataset of historical Spanish document images.
In future work, we would like to extend this work with other kinds of language models, such as RNN-based models.
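The three measures reported in Figure 19 and Table 5 can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not the authors' evaluation tooling: WER and CER are taken as the Levenshtein distance between reference and hypothesis (over words and over characters, spaces included) divided by the reference length, and OOV WAR is assumed to be the fraction of reference words absent from the training vocabulary that the alignment marks as correctly recognized. The function names (`edit_alignment`, `wer`, `cer`, `oov_war`), the toy Spanish line and the vocabulary are all hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def edit_alignment(ref: Sequence[str],
                   hyp: Sequence[str]) -> Tuple[int, List[Optional[int]]]:
    """Return the Levenshtein distance between ref and hyp and, for each
    reference token, the index of the hypothesis token it is correctly
    aligned to (None if it was substituted or deleted)."""
    n, m = len(ref), len(hyp)
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    # Backtrace one optimal path to record correctly recognized tokens.
    matched: List[Optional[int]] = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if ref[i - 1] == hyp[j - 1] else 1
        if d[i][j] == d[i - 1][j - 1] + cost:
            if cost == 0:
                matched[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return d[n][m], matched


def wer(ref_words: Sequence[str], hyp_words: Sequence[str]) -> float:
    """Word Error Rate: word edit distance / number of reference words.
    Assumes a non-empty reference."""
    dist, _ = edit_alignment(ref_words, hyp_words)
    return dist / len(ref_words)


def cer(ref_text: str, hyp_text: str) -> float:
    """Character Error Rate, counting spaces as characters (assumption).
    Assumes a non-empty reference."""
    dist, _ = edit_alignment(list(ref_text), list(hyp_text))
    return dist / len(ref_text)


def oov_war(ref_words: Sequence[str], hyp_words: Sequence[str],
            vocabulary: Set[str]) -> Optional[float]:
    """Share of out-of-vocabulary reference words recognized correctly;
    None if the reference contains no OOV word."""
    _, matched = edit_alignment(ref_words, hyp_words)
    oov = [k for k, w in enumerate(ref_words) if w not in vocabulary]
    if not oov:
        return None
    return sum(matched[k] is not None for k in oov) / len(oov)


if __name__ == "__main__":
    # Hypothetical toy line: "rey" is misrecognized as "rei".
    ref, hyp = "el rey don rodrigo", "el rei don rodrigo"
    train_vocab = {"el", "don"}  # hypothetical training vocabulary
    print(wer(ref.split(), hyp.split()))                   # 0.25
    print(f"{cer(ref, hyp):.4f}")                          # 0.0556
    print(oov_war(ref.split(), hyp.split(), train_vocab))  # 0.5
```

In the toy line, "rey" and "rodrigo" are OOV; only "rodrigo" is recovered, so OOV WAR is 0.5. For corpus-level figures such as those in Table 5, one would typically sum edit distances and reference lengths over all test lines before dividing rather than average per-line rates; this, too, is an assumption about the evaluation protocol. When several alignments are optimal, the backtrace keeps only one, so the set of words counted as correct can depend on tie-breaking.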
Acknowledgments: This work was partially supported by the projects READ: Recognition and Enrichment of Archival Documents - 674943 (European Union’s H2020) and CoMUN-HaT: Context, Multimodality and User Collaboration in Handwritten Text Processing - TIN2015-70924-C2-1-R (MINECO/FEDER), and by a DGA-MRIS (Direction Générale de l’Armement - Mission pour la Recherche et l’Innovation Scientifique) scholarship.

Author Contributions: Emilio Granell and Edgard Chammas conceived and implemented the recognition systems (HMM, BLSTM, CRNN). All authors contributed equally to the design of the research and to the final manuscript.

Conflicts of Interest: The authors declare no conflict of interest.
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science