Page - 145 - in Document Image Processing
J. Imaging 2018, 4, 15
the character-based approach: a WER equal to 14.0% ± 0.3, a CER equal to 3.0% ± 0.1 and an OOV
WAR equal to 69.2% ± 1.1. These results confirm the interest of working at the character level for
transcribing historical manuscripts.
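As a reminder of how figures such as the WER and CER above are usually obtained, the sketch below computes both rates from the Levenshtein (edit) distance between reference and hypothesis transcriptions. It is a minimal illustration under stated assumptions, not the evaluation tooling used in the paper: the function names `edit_distance` and `error_rate` and the sample text line are hypothetical, and tokenization is plain whitespace splitting.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Error rate in percent: total edit operations / total reference units.

    unit="word" gives a WER-style figure, unit="char" a CER-style figure.
    """
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * errors / total


# Made-up example line (not taken from the Rodrigo corpus).
refs = ["en el nombre de dios"]
hyps = ["en el nonbre de dios"]
print(error_rate(refs, hyps, unit="word"))  # 20.0 (1 wrong word out of 5)
print(error_rate(refs, hyps, unit="char"))  # 5.0 (1 wrong character out of 20)
```

A single character substitution costs one full word at the word level but only one symbol at the character level, which is why CER values are typically much lower than WER values for the same output.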
[Figure 19: line plot. x-axis: n-gram size (1 to 15); y-axis: rate (0% to 100%); curves: Word Error Rate, Character Error Rate, OOV Word Accuracy Rate; annotated values: WER=14.0%, CER=3%, OOV WAR=69.2%.]
Figure 19. Results obtained by the CRNN character-based system using n-gram language models with
size n = {1, ..., 15}.
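To make the role of the n-gram size concrete, here is a minimal sketch of a character n-gram language model whose order n is a parameter. It is only an illustration: the class name `CharNgramLM`, the training lines, and the add-one smoothing are assumptions chosen for brevity, and nothing here is meant to reproduce the language-model toolkit or smoothing method used for the reported results.

```python
import math
from collections import Counter


class CharNgramLM:
    """Toy character n-gram model with add-one smoothing (illustrative only)."""

    def __init__(self, n):
        self.n = n
        self.ngrams = Counter()    # counts of (context + symbol)
        self.contexts = Counter()  # counts of context alone
        self.vocab = set()

    def _pad(self, text):
        # n-1 start symbols so the first character has a full context.
        return ["<s>"] * (self.n - 1) + list(text) + ["</s>"]

    def train(self, lines):
        for line in lines:
            symbols = self._pad(line)
            self.vocab.update(symbols)
            for i in range(self.n - 1, len(symbols)):
                context = tuple(symbols[i - self.n + 1:i])
                self.ngrams[context + (symbols[i],)] += 1
                self.contexts[context] += 1

    def log_prob(self, text):
        """Natural-log score of a string; unseen n-grams get a small smoothed mass."""
        symbols = self._pad(text)
        v = len(self.vocab)
        total = 0.0
        for i in range(self.n - 1, len(symbols)):
            context = tuple(symbols[i - self.n + 1:i])
            count = self.ngrams[context + (symbols[i],)]
            total += math.log((count + 1) / (self.contexts[context] + v))
        return total


# Made-up training lines; "reyna" never appears as a word in them.
lm = CharNgramLM(n=3)
lm.train(["el rey", "la reina"])
print(lm.log_prob("reyna"))                       # finite score for an unseen word
print(lm.log_prob("rey") > lm.log_prob("xqz"))    # True
```

Because the model scores sequences of characters rather than whole words, a word absent from the training text still receives a usable score, which is the property that lets character-level models recover out-of-vocabulary words. Larger n gives longer contexts at the price of sparser counts.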
Table 5. Overall best results on the Rodrigo test set in terms of WER, CER and OOV WAR for
the CRNN system.
Measure    Word (3-gram)    Sub-Word (4-gram)    Character (10-gram)
WER        17.9% ± 0.4      14.8% ± 0.3          14.0% ± 0.3
CER        4.0% ± 0.1       3.4% ± 0.1           3.0% ± 0.1
OOV WAR    21.5% ± 1.0      42.4% ± 1.5          69.2% ± 1.1
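The OOV WAR row can be read with the following sketch in mind. It assumes, as the name suggests, that the OOV word accuracy rate is the percentage of reference words absent from the training vocabulary that the system nevertheless transcribes correctly. The definition as coded, the function name `oov_word_accuracy`, the pre-aligned word pairs, and the sample data are all assumptions made for illustration; a real evaluation would first align hypothesis and reference words, for example with an edit-distance alignment.

```python
def oov_word_accuracy(aligned_pairs, training_vocab):
    """Percentage of out-of-vocabulary reference words recognized correctly.

    aligned_pairs: iterable of (reference_word, hypothesis_word) tuples,
                   assumed to be aligned already.
    training_vocab: set of words seen in the training data.
    Returns None when the reference contains no OOV word.
    """
    oov_total = 0
    oov_correct = 0
    for ref_word, hyp_word in aligned_pairs:
        if ref_word in training_vocab:
            continue  # in-vocabulary words do not count towards this measure
        oov_total += 1
        if ref_word == hyp_word:
            oov_correct += 1
    if oov_total == 0:
        return None
    return 100.0 * oov_correct / oov_total


# Made-up data: "rodrigo" and "toledo" are OOV, only one is recognized.
vocab = {"el", "rey", "de"}
pairs = [("el", "el"), ("rey", "rey"), ("rodrigo", "rodrigo"),
         ("de", "de"), ("toledo", "tolebo")]
print(oov_word_accuracy(pairs, vocab))  # 50.0
```

Unlike WER and CER, this is an accuracy, so higher is better, which is why the character column in Table 5 is the best on all three rows even though its OOV WAR is the largest number.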
5. Conclusions
In this paper, we deal with the transcription of historical documents, for which no external
linguistic resources are available. We have developed various HTR systems that model language at
word and sub-lexical levels. We have shown that character-based language modeling performs best.
The strengths of the proposed work are:
• comparing several types of HTR systems (HMM-based, RNN-based).
• proposing a state-of-the-art HTR system for the transcription of ancient Spanish documents whose
optical part is based on very deep nets (CRNNs).
• proposing to associate the optical HTR system with a dictionary and a language model based on
sub-lexical units. These units are shown to be efficient in order to cope with OOV words.
• reaching with such optical and LM HTR components the best overall recognition results on
a publicly available Spanish historical dataset of document images.
In future work, we would like to extend this work using other kinds of language models, such as
models based on RNN.
Acknowledgments: Work partially supported by projects READ: Recognition and Enrichment of Archival
Documents - 674943 (European Union’s H2020) and CoMUN-HaT: Context, Multimodality and User Collaboration
in Handwritten Text Processing - TIN2015-70924-C2-1-R (MINECO/FEDER), and a DGA-MRIS (Direction
Générale de l’Armement - Mission pour la Recherche et l’Innovation Scientifique) scholarship.
Author Contributions: Emilio Granell and Edgard Chammas conceived and implemented the recognition
systems (HMM, BLSTM, CRNN). All authors contributed in equal proportion to the design of the research and to
the final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science (Informatik)