J. Imaging 2018, 4, 37
there are similarities between the top alignments (least cost alignments) of different pairs of sequences.
In [20], the authors explored these similarities by learning a small set of global principal alignments
from the given data, which captures all the possible correlations in the data. These global principal
alignments are then used to compute the DTW distance for the new test sequences. Since these
methods [19,20] avoid the computation of optimal alignments, they are computationally efficient
compared to the naive DTW distance. The fast approximate DTW distance can be used for efficient indexing
in the DQC classifier. However, it gives sub-optimal results. For best results, it needs query specific global
principal alignments. In this paper, we introduce the query specific DTW distance, which enables the direct
design of global principal alignments for novel queries. Global principal alignments are computed
for a set of frequent classes and seamlessly extended to rare and arbitrary queries, as and when
required, without using language specific knowledge. This is a distinct advantage over an OCR engine,
which is difficult to adapt to varied fonts and noisy images and would require language specific
knowledge to generate possible hypotheses for out-of-vocabulary words. Moreover, an OCR engine can
respond to a word image query only by first converting it into text, which is again prone to recognition
errors. In [21,22], deep learning frameworks are used for word spotting. In [23], an attribute-based
learning model, PHOC, is presented for word spotting. In the training phase, each word image must be supplied
with its transcription. Both the word image feature vectors and their transcriptions are used to create the
PHOC representation. An SVM is learned for each attribute in this representation. Our approach bears
similarity with PHOC representation based word spotting [23] in the sense that both approaches are
designed for handling out-of-vocabulary queries. Our work takes advantage of a granular description
at the n-gram (cut-portion) level. This somewhat resembles the arrangement of characters used in the
PHOC encoding. However, the training effort for PHOC is substantial, with a large number of classifiers
(604 classifiers) being trained, and PHOC requires the complete data for training, which is huge for large datasets.
In our work, the amount of training data is restricted to only frequent classes, which is much less
compared to PHOC. Further, PHOC requires labels in the form of transcriptions, whereas in our work the
labels need not be transcriptions. In addition, PHOC is language dependent [24] and it is very difficult
to apply over different languages. The method proposed in this paper is language independent; it can
be applied to any language.
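
The contrast between the naive DTW distance and a distance computed along precomputed alignments can be illustrated with a minimal Python sketch. It is not the implementation of [19,20]: the function names are hypothetical, all sequences are assumed to have the same length (so that one alignment path is valid for any pair), and taking the minimum cost over the stored alignments is an assumption made here for illustration.

```python
import numpy as np


def _as_sequence(seq):
    """Return seq as a (length, dim) float array; 1-D input becomes dim 1."""
    arr = np.asarray(seq, dtype=float)
    return arr[:, None] if arr.ndim == 1 else arr


def dtw_distance(x, y):
    """Naive DTW: dynamic programming over all alignments.

    Returns the least alignment cost and the alignment path that attains it.
    """
    x, y = _as_sequence(x), _as_sequence(y)
    n, m = len(x), len(y)
    # Local cost between every pair of frames (Euclidean distance).
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end to recover the least cost alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path


def alignment_distance(x, y, path):
    """Cost of matching x and y along a fixed alignment (no dynamic programming)."""
    x, y = _as_sequence(x), _as_sequence(y)
    return float(sum(np.linalg.norm(x[i] - y[j]) for i, j in path))


def approx_dtw(x, y, principal_alignments):
    """Assumed approximation: least cost over a small set of stored alignments."""
    return min(alignment_distance(x, y, p) for p in principal_alignments)
```

Because the naive distance is the minimum over all valid alignments, `approx_dtw` can never fall below `dtw_distance` for the same pair; the two agree whenever the optimal alignment of that pair is among the stored ones. The saving comes from replacing the quadratic dynamic program with a few linear passes, one per stored alignment.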
The paper is organized as follows. The next section describes the Direct Query Classifier (DQC).
Fast approximation of the DTW distance is discussed in Section 3. The query specific DTW distance
is presented in Section 4. Experimental settings and results are discussed in Section 5, followed by
concluding remarks in Section 6.
2. Direct Query Classifier (DQC)
In [18], Ranjan et al. proposed the Direct Query Classifier (DQC), which is a one-shot learning scheme
for dynamically synthesizing classifiers for novel queries. The main idea is to compute an SVM
classifier for the query class using the classifiers obtained from the frequent classes of the database.
The number of possible words in a language could be very large and it would be practically difficult to
build a classifier for each of the words. However, all these words come from a small set of n-grams.
The words corresponding to the frequent queries are expected to contain the n-grams that cover the
full vocabulary. Exemplar SVM classifiers are computed for the frequent queries (word classes) and
then appropriately concatenated to create novel classifiers for the rare queries. However, this process
has its challenges due to
(i) Variations due to the nature of the script and writing style,
(ii) Noisy classifiers for smaller n-grams.
The authors address these limitations by building the SVM classifiers for the most frequent queries
and using classifier synthesis only for rare queries. This improves the overall performance. They use
Query Expansion (QE) to further improve the performance. An overview of the direct query
classifier is given in the following sections.
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science