J. Imaging 2018, 4, 15
Table 1. Description of the partitions of the Rodrigo corpus used in this work.

Partition    Lines   Words                      Sub-Words                  Characters
                     Total/Diff./OOV (over.)    Total/Diff./OOV (over.)    Total/Diff./OOV (over.)
Training     9000    98,232/12,650/-            148,070/3045/-             493,126/105/-
Validation   1000    10,899/3016/850            14,907/1074/7              54,936/82/1
Test         5010    55,195/7453/4918 (203)     73,660/1418/55 (11)        272,132/91/14 (1)
3. Handwritten Text Recognition Systems
This section presents our proposal, the feature extraction, the models used by the implemented
HTR systems and the evaluation metrics used in the experimentation.
3.1. Proposal
The HTR problem can be formulated as finding the most likely word sequence $\hat{w}$ given a feature
vector sequence $x = (x_1, x_2, \ldots, x_{|x|})$ that represents a handwritten text line image [21], that is:
\[
\hat{w} = \arg\max_{w \in W} \Pr(w \mid x)
        = \arg\max_{w \in W} \frac{\Pr(x \mid w)\,\Pr(w)}{\Pr(x)}
        = \arg\max_{w \in W} \Pr(x \mid w)\,\Pr(w) \tag{1}
\]
where $W$ represents the set of all permissible word sequences, $\Pr(x)$ is the probability of observing $x$,
$\Pr(w)$ is the probability of the word sequence $w = (w_1, w_2, \ldots, w_{|w|})$ and $\Pr(x \mid w)$ is the probability
of observing $x$ by assuming that $w$ is the underlying word sequence for $x$. $\Pr(w)$ is approximated by
the Language Model (LM), whereas $\Pr(x \mid w)$ is modeled by the optical model, which trains character
models and concatenates them to build optical word or sub-word models.
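The decision rule in Equation (1) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the candidate word sequences, the probability values, and the helper name `decode` are hypothetical, and a real system searches a far larger hypothesis space rather than a short explicit list. The sketch only shows that the hypothesis is chosen by combining the optical score with the LM score, and that $\Pr(x)$, being constant across candidates, does not affect the result.

```python
import math

def decode(candidates, optical_logprob, lm_logprob, log_evidence=0.0):
    """Pick the candidate w maximizing log Pr(x|w) + log Pr(w) - log Pr(x).

    log_evidence stands for log Pr(x); it is the same for every candidate,
    so it cannot change which candidate wins.
    """
    def score(w):
        return optical_logprob[w] + lm_logprob[w] - log_evidence
    return max(candidates, key=score)

# Hypothetical scores for a single line image x (illustrative numbers only).
candidates = ["el rey", "el buey", "al rey"]
optical_logprob = {
    "el rey": math.log(0.20),
    "el buey": math.log(0.25),
    "al rey": math.log(0.22),
}
lm_logprob = {
    "el rey": math.log(0.05),
    "el buey": math.log(0.01),
    "al rey": math.log(0.02),
}

best = decode(candidates, optical_logprob, lm_logprob)
best_with_evidence = decode(candidates, optical_logprob, lm_logprob,
                            log_evidence=math.log(0.3))
print(best, best_with_evidence)
```

In this toy setting the optical model alone would prefer "el buey", but the LM prior shifts the decision to "el rey", with or without the $\Pr(x)$ term.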
Written words can be decomposed into small sub-word units such as characters, but they can also
be decomposed into larger sub-word units such as graphemic syllables, hyphens or multigrams [15].
We choose here to compare character and hyphen word decompositions. In both cases, words are
represented as a sequence of sub-word units $s = (s_1, s_2, \ldots, s_{|s|})$. Then, the HTR problem can be
reformulated as finding the most likely sub-word sequence $\hat{s}$ given a feature vector sequence $x$ that
represents a handwritten text image. Therefore, Equation (1) becomes:
\[
\hat{s} = \arg\max_{s \in S} \Pr(x \mid s)\,\Pr(s) \tag{2}
\]
where $\Pr(s)$ is approximated by a sub-word LM, whereas $\Pr(x \mid s)$ can be modeled by the same
optical model.
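To make the role of $\Pr(s)$ concrete, the following sketch trains a toy character-level bigram LM. It is not the LM configuration used in this work: the bigram order, the add-one smoothing, the `<s>`/`</s>` boundary markers, the training words and the function names are all assumptions chosen to keep the example short. It shows that a sub-word LM can assign a reasonable score to a word that never appeared as a whole word in the training data.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count contexts and bigrams over sub-word sequences.

    Each sequence is padded with <s> and </s> boundary markers.
    """
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab

def logprob(seq, contexts, bigrams, vocab):
    """Log Pr(s) under the bigram model with add-one smoothing."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) /
                          (contexts[prev] + len(vocab)))
    return total

# Toy training words, each split into characters (hypothetical data).
model = train_bigram(["rey", "ley", "reyes"])

# "leyes" is not a training word, yet all of its bigrams were observed.
seen_like = logprob("leyes", *model)
implausible = logprob("yslee", *model)
print(seen_like, implausible)
```

Here "leyes" scores higher than the scrambled sequence "yslee" because every one of its character bigrams occurs in the training words.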
It should be noted that RNN-based systems directly provide in their outputs posterior
distributions of character labels, at each time step, i.e., $o_k^t$ for $k = 1, \ldots, L$ and $t = 1, \ldots, T$, $T$ being
the length of the observation sequence $x$ and $L$ the alphabet size. From these posteriors, the decoding
can be constrained by a lexicon and a language model, in order to find the best output sequence $\hat{s}$.
This can be done through Weighted Finite State Transducers (WFST) decoding (see Section 3.5), which
can include several types of lexicon and language models (at word, hyphen or character levels).
Working at the sub-word level in HTR relaxes the restrictions imposed by the lexicon, allowing
for a faster decoding, and given that the language model describes the relation between sub-word
units, some OOV words can be decoded. Therefore, our proposal is to decode the handwritten text
line images at the sub-word level and, then, from the obtained decoding output, reconstruct the words
to build the final hypothesis.
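The word reconstruction step can be sketched as follows, assuming that the decoder outputs a flat list of sub-word units in which word boundaries are marked by the `<SPACE>` symbol used in this work. The function name `reconstruct_words` and the sample outputs are hypothetical; the sketch simply concatenates the units found between consecutive boundary symbols.

```python
SPACE = "<SPACE>"

def reconstruct_words(subword_units, space=SPACE):
    """Join sub-word units into words, splitting at the boundary symbol."""
    words, current = [], []
    for unit in subword_units:
        if unit == space:
            # A boundary closes the word being built (if any).
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(unit)
    if current:
        words.append("".join(current))
    return words

# Hypothetical decoding outputs at character level and with larger units.
char_output = ["e", "l", SPACE, "r", "e", "y"]
unit_output = ["el", SPACE, "re", "yes"]
print(reconstruct_words(char_output))
print(reconstruct_words(unit_output))
```

The same routine applies to characters and to larger sub-word units, since both are joined by plain concatenation.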
First of all, the language model of sub-word units is trained using the transcription of the text
lines of the training partition after minimal preprocessing. This preprocessing consists of
adding a new symbol (<SPACE>) for the separation between words and then splitting the words
into sub-word sequences. In this way, the information of the separation between words is maintained.
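A minimal sketch of this preprocessing is given below, under the assumption that words are separated by whitespace in the transcription. The function name `to_subword_sequence` and the sample line are hypothetical. The default splitter produces characters; a hyphenation-based splitter could be passed through the `split_word` parameter instead, but none is implemented here.

```python
SPACE = "<SPACE>"

def to_subword_sequence(line, split_word=list, space=SPACE):
    """Turn a transcription line into a sub-word sequence.

    A boundary symbol is inserted between consecutive words, and each
    word is split by split_word (characters by default).
    """
    units = []
    for i, word in enumerate(line.split()):
        if i > 0:
            units.append(space)
        units.extend(split_word(word))
    return units

# Made-up transcription line, not taken from the corpus.
sequence = to_subword_sequence("el rey")
print(" ".join(sequence))
```

Because the boundary symbol is kept in the LM training data, the word segmentation can be recovered later from the decoded sub-word sequence.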
As an example, the following text line from the training set:
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Publisher: MDPI
- Place: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Dimensions: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science