Document Image Processing
Page - 145 -

J. Imaging 2018, 4, 15

the character-based approach: a WER equal to 14.0% ± 0.3, a CER equal to 3.0% ± 0.1 and an OOV WAR equal to 69.2% ± 1.1. These results confirm the interest of working at the character level for transcribing historical manuscripts.

[Figure 19: plot of Word Error Rate, Character Error Rate and OOV Word Accuracy Rate (0% to 100%) against n-gram size (1 to 15); annotated values: WER = 14.0%, CER = 3%, OOV WAR = 69.2%.]

Figure 19. Results obtained by the CRNN character-based system using n-gram language models with size n = {1, ..., 15}.

Table 5. Overall best results on the Rodrigo test set in terms of WER, CER and OOV WAR for the CRNN system.

Measure    Word (3-gram)    Sub-Word (4-gram)    Character (10-gram)
WER        17.9% ± 0.4      14.8% ± 0.3          14.0% ± 0.3
CER        4.0% ± 0.1       3.4% ± 0.1           3.0% ± 0.1
OOV WAR    21.5% ± 1.0      42.4% ± 1.5          69.2% ± 1.1

5. Conclusions

In this paper, we deal with the transcription of historical documents, for which no external linguistic resources are available. We have developed various HTR systems that model language at word and sub-lexical levels. We have shown that character-based language modeling performs best. The strengths of the proposed work are:

  • comparing several types of HTR systems (HMM-based, RNN-based).
  • proposing a state-of-the-art HTR system for the transcription of ancient Spanish documents whose optical part is based on very deep nets (CRNNs).
  • proposing to associate the optical HTR system with a dictionary and a language model based on sub-lexical units. These units are shown to be efficient in order to cope with OOV words.
  • reaching with such optical and LM HTR components the best overall recognition results on a publicly available Spanish historical dataset of document images.

In future work, we would like to extend this work using other kinds of language models, such as models based on RNN.
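For readers unfamiliar with the WER and CER measures reported in Table 5, the following is a minimal sketch of how such error rates are commonly computed: the edit distance between a reference transcription and a recognized hypothesis, divided by the reference length. It is not the authors' evaluation code. The function names, the whitespace tokenization, and the choice to count spaces as characters in the CER are all assumptions made for illustration; the OOV WAR measure is not covered because its exact definition does not appear on this page.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of r
                            curr[j - 1] + 1,      # insertion of h
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate; words are split on whitespace (an assumption)."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate; spaces count as characters (an assumption)."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    # Hypothetical example strings, not taken from the Rodrigo dataset.
    ref = "el rey don rodrigo"
    hyp = "el rei don rodrigo"
    print(f"WER = {wer(ref, hyp):.3f}")
    print(f"CER = {cer(ref, hyp):.3f}")
```

A single wrong character makes a whole word count as an error, which is why the WER of a system is typically well above its CER, as in the results above.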
Acknowledgments: Work partially supported by projects READ: Recognition and Enrichment of Archival Documents - 674943 (European Union’s H2020) and CoMUN-HaT: Context, Multimodality and User Collaboration in Handwritten Text Processing - TIN2015-70924-C2-1-R (MINECO/FEDER), and a DGA-MRIS (Direction Générale de l’Armement - Mission pour la Recherche et l’Innovation Scientifique) scholarship.

Author Contributions: Emilio Granell and Edgard Chammas conceived and implemented the recognition systems (HMM, BLSTM, CRNN). All authors contributed in equal proportion to the design of the research and to the final manuscript.

Conflicts of Interest: The authors declare no conflict of interest.
Document Image Processing
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Informatik