Page - 145 - in Document Image Processing

J. Imaging 2018, 4, 15

the character-based approach: a WER equal to 14.0% ± 0.3, a CER equal to 3.0% ± 0.1 and an OOV WAR equal to 69.2% ± 1.1. These results confirm the interest of working at the character level for transcribing historical manuscripts.

[Figure 19: plot of Word Error Rate, Character Error Rate and OOV Word Accuracy Rate (vertical axis, 0% to 100%) against n-gram size (horizontal axis, 1 to 15). Annotated values: WER = 14.0%, CER = 3%, OOV WAR = 69.2%.]

Figure 19. Results obtained by the CRNN character-based system using n-gram language models with size n = {1, ..., 15}.

Table 5. Overall best results on the Rodrigo test set in terms of WER, CER and OOV WAR for the CRNN system.

Measure    Word (3-gram)    Sub-Word (4-gram)    Character (10-gram)
WER        17.9% ± 0.4      14.8% ± 0.3          14.0% ± 0.3
CER        4.0% ± 0.1       3.4% ± 0.1           3.0% ± 0.1
OOV WAR    21.5% ± 1.0      42.4% ± 1.5          69.2% ± 1.1

5. Conclusions

In this paper, we deal with the transcription of historical documents, for which no external linguistic resources are available. We have developed various HTR systems that model language at word and sub-lexical levels. We have shown that character-based language modeling performs best. The strengths of the proposed work are:

• comparing several types of HTR systems (HMM-based, RNN-based).
• proposing a state-of-the-art HTR system for the transcription of ancient Spanish documents whose optical part is based on very deep nets (CRNNs).
• proposing to associate the optical HTR system with a dictionary and a language model based on sub-lexical units. These units are shown to be efficient in order to cope with OOV words.
• reaching with such optical and LM HTR components the best overall recognition results on a publicly available Spanish historical dataset of document images.

In future work, we would like to extend this work using other kinds of language models, such as models based on RNN.
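For readers who want to reproduce measures of the kind reported in Table 5, the following is a minimal sketch of how WER and CER are conventionally computed, assuming the usual Levenshtein (edit-distance) definition normalized by the reference length. The function names and the sample strings are illustrative assumptions; the scoring tool actually used for the reported results is not shown on this page, and the OOV WAR measure is not covered here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref_text, hyp_text):
    """Word Error Rate: edit distance over whitespace-separated words."""
    return error_rate(ref_text.split(), hyp_text.split())


def cer(ref_text, hyp_text):
    """Character Error Rate: edit distance over characters (spaces included)."""
    return error_rate(list(ref_text), list(hyp_text))


if __name__ == "__main__":
    # Hypothetical reference and recognizer output, for illustration only.
    reference = "el rey don rodrigo"
    hypothesis = "el rei don rodrigo"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

A single wrong character makes the whole word count as an error at the word level but only one unit at the character level, which is why CER values are typically much lower than WER values for the same transcription.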
Acknowledgments: Work partially supported by projects READ: Recognition and Enrichment of Archival Documents - 674943 (European Union’s H2020) and CoMUN-HaT: Context, Multimodality and User Collaboration in Handwritten Text Processing - TIN2015-70924-C2-1-R (MINECO/FEDER), and a DGA-MRIS (Direction Générale de l’Armement - Mission pour la Recherche et l’Innovation Scientifique) scholarship.

Author Contributions: Emilio Granell and Edgard Chammas conceived and implemented the recognition systems (HMM, BLSTM, CRNN). All authors contributed in equal proportion to the design of the research and to the final manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science