Page 137 of Document Image Processing
J. Imaging 2018, 4, 15
3.5. Decoding with Deep Optical Models
Decoding for both deep net systems was performed with Weighted Finite State Transducers
(WFST). Our decoder is based on the CTC-specific implementation proposed by [43] for speech
recognition. A “token” WFST was designed to handle all possible label sequences at the frame level,
so as to allow for the occurrence of the blank label along with the repetition of non-blank labels. It can
map a sequence of frame-level CTC labels to a single character. A search graph is built with three
WFSTs (T, L and G) compiled independently and combined as follows:
$$S = T \circ \min(\det(L \circ G)) \tag{3}$$
T, L and G are the token, lexicon and grammar WFSTs respectively, whereas ◦, det and min denote
composition, determinization and minimization, respectively. The determinization and minimization
operations are needed to compress the search space, yielding faster decoding.
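The label mapping that the token WFST encodes can be illustrated without any transducer machinery. The following is a minimal sketch, not the decoder used in this work: it assumes the usual CTC convention of merging consecutive repeated labels and then dropping blanks, and the blank symbol `<b>` and the function name `ctc_collapse` are hypothetical.

```python
from itertools import groupby

BLANK = "<b>"  # hypothetical symbol standing in for the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Map frame-level CTC labels to an output label sequence.

    Consecutive repetitions of the same label are merged first,
    then every blank label is removed.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# One label per frame; blanks separate and pad the characters.
frames = ["<b>", "c", "c", "<b>", "a", "a", "a", "<b>", "<b>", "t", "t"]
print("".join(ctc_collapse(frames)))
```

Note that a blank between two identical labels keeps them distinct: `["a", "<b>", "a"]` yields two characters, whereas `["a", "a"]` yields one.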
3.6. Evaluation Metrics
The quality of the obtained transcriptions was assessed using the edit distance [44] with respect
to the reference text, at the word and at the character level. The Word Error Rate (WER) is this edit
distance at the word level and can be calculated as the minimum number of substitutions, deletions
and insertions needed to transform the transcription into the reference, divided by the number of
words of the reference:
$$\mathrm{WER} = \frac{s + d + i}{n} \cdot 100 \tag{4}$$
where s is the number of substitutions, d the number of deletions, i the number of insertions and n
the total number of words in the reference.
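Equation (4) can be sketched in a few lines. This is an illustrative implementation, not the evaluation tool used in this work: it computes the minimum of s + d + i with the standard Levenshtein dynamic programme, assumes whitespace tokenization and a non-empty reference, and the names `edit_distance`, `error_rate` and `wer` are hypothetical.

```python
def edit_distance(hyp, ref):
    """Minimum number of substitutions, deletions and insertions
    needed to transform the sequence `hyp` into the sequence `ref`."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance to an empty reference: delete i tokens
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def error_rate(hyp, ref):
    """Edit distance divided by the reference length, as a percentage."""
    return 100.0 * edit_distance(hyp, ref) / len(ref)


def wer(hyp_text, ref_text):
    """Word-level error rate, assuming whitespace-separated words."""
    return error_rate(hyp_text.split(), ref_text.split())


# One substitution (quik -> quick) and one insertion (fox), n = 4.
print(wer("the quik brown", "the quick brown fox"))  # 50.0
```

Because `error_rate` accepts any pair of sequences, passing lists of characters instead of lists of words gives the same measure at the character level.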
Similarly, this edit distance can be calculated at the character level, giving the Character Error
Rate (CER). In this framework, the CER value is especially interesting, since transcription errors are
usually corrected at the character level. The OOV Word Accuracy Rate (OOV WAR) was measured as
the number of recognized OOV words over the total number of OOV words. The statistical significance
of experimental results can be estimated by means of confidence intervals. Generally, when comparing
two experimental results, it is always true that if the confidence intervals do not overlap, we can say
that the difference is statistically significant [45]. In this work, confidence intervals of probability 95%
(α = 0.025) were calculated by using the bootstrapping method with 10,000 repetitions [46] for these
rate measures.
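A bootstrap interval for a rate measure can be sketched as follows. The exact procedure of [46] is not reproduced here; this sketch assumes a percentile bootstrap that resamples text lines with replacement, and the function name `bootstrap_ci`, its parameters and the per-line counts in the example are hypothetical, illustrative values.

```python
import random


def bootstrap_ci(errors, lengths, repetitions=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for 100 * sum(errors) / sum(lengths).

    errors[k]  : edit operations counted on line k
    lengths[k] : reference length of line k (assumed to be positive)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(errors)
    rates = []
    for _ in range(repetitions):
        # Resample n lines with replacement and recompute the rate.
        sample = [rng.randrange(n) for _ in range(n)]
        total_errors = sum(errors[k] for k in sample)
        total_length = sum(lengths[k] for k in sample)
        rates.append(100.0 * total_errors / total_length)
    rates.sort()
    tail = (1.0 - confidence) / 2.0  # probability mass cut from each side
    lower = rates[int(tail * repetitions)]
    upper = rates[int((1.0 - tail) * repetitions) - 1]
    return lower, upper


# Illustrative per-line error counts and reference lengths (made up).
line_errors = [1, 0, 2, 1, 0, 3]
line_lengths = [10, 8, 12, 9, 7, 11]
low, high = bootstrap_ci(line_errors, line_lengths)
print(f"95% interval: [{low:.2f}, {high:.2f}]")
```

Two systems evaluated this way can then be compared by checking whether their intervals overlap.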
Finally, as language models are probability distributions over entire sentences or texts,
perplexity [47] can be used to evaluate their performance over a reference text. In this work, we use
the perplexity presented by a character LM over the OOV words (as sequences of characters), to assess
the differences between the recognized and unrecognized OOV words.
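For reference, perplexity over a character sequence can be computed from the conditional probability the LM assigns to each character. The sketch below uses the standard definition (the exponential of the negative mean log-probability); the probabilities are hypothetical values, and whether word-boundary symbols are counted in this work is not stated here, so that choice is left to the caller.

```python
import math


def perplexity(char_probs):
    """Perplexity of a sequence, given the probability that the LM
    assigned to each of its characters (all assumed to be > 0)."""
    log_sum = sum(math.log(p) for p in char_probs)
    return math.exp(-log_sum / len(char_probs))


# A model that is equally unsure among 4 options at every character
# has a perplexity of about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# Hypothetical probabilities for a five-character OOV word.
print(perplexity([0.30, 0.10, 0.45, 0.20, 0.05]))
```

A lower value means the character sequence is more predictable under the LM.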
4. Experimental Results
In the test experiments, we compared the performance on the test partition of the Rodrigo
corpus. Different systems were compared, the first one based on HMMs, the second one based
on RNN and the third one on CRNN. For the three systems, experiments were performed at word,
sub-word, and character levels. We first explore the influence of the size of the LM context (n-gram
degree). Then, we develop an analysis of the difference between the structure of recognized and
unrecognized OOV words. The last experiment compares the results obtained in three different cases:
open vocabulary, closed vocabulary and when using the validation samples for training the LM.
We observed that in the training partition of Rodrigo, usually there are no spaces between words
and punctuation marks, so we decided to remove those spaces from the hypotheses offered by
the word-based systems. Therefore, in the word-based cases, the recognized OOV words correspond
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Publisher
- MDPI
- Place
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science