Page 73 in Document Image Processing


J. Imaging 2018, 4, 37

there are similarities between the top alignments (least cost alignments) of different pairs of sequences. In [20], the authors explored these similarities by learning a small set of global principal alignments from the given data, which captures all the possible correlations in the data. These global principal alignments are then used to compute the DTW distance for the new test sequences. Since these methods [19,20] avoid the computation of optimal alignments, they are computationally efficient compared to the naive DTW distance. The fast approximate DTW distance can be used for efficient indexing in the DQC classifier. However, it gives sub-optimal results. For best results, it needs query specific global principal alignments. In this paper, we introduce the query specific DTW distance, which enables the direct design of global principal alignments for novel queries. Global principal alignments are computed for a set of frequent classes and seamlessly extended for the rare and arbitrary queries, as and when required, without using language specific knowledge. This is a distinct advantage over an OCR engine, which is difficult to adapt to varied fonts and noisy images and would require language specific knowledge to generate possible hypotheses for out-of-vocabulary words. Moreover, an OCR engine can respond to a word image query only by first converting it into text, which is again prone to recognition errors. In [21,22], deep learning frameworks are used for word spotting. In [23], an attribute-based learning model, PHOC, is presented for word spotting. In the training phase, each word image is to be given with its transcription. Both the word image feature vectors and their transcriptions are used to create the PHOC representation. An SVM is learned for each attribute in this representation. Our approach bears similarity with the PHOC representation based word spotting [23]. In this sense, both the approaches are designed for handling out-of-vocabulary queries. Our work takes advantage of granular description at the n-gram (cut-portion) level.
This somewhat resembles the arrangement of characters used in the PHOC encoding. However, training efforts for PHOC are substantial, with a large number of classifiers (604 classifiers) being trained, and it requires complete data for training, which is huge for large datasets. In our work, the amount of training data is restricted to only frequent classes, which is much less compared to PHOC. Further, PHOC requires labels in the form of transcriptions, whereas in our work the labels need not be transcriptions. In addition, PHOC is language dependent [24] and it is very difficult to apply over different languages. The method proposed in this paper is language independent; it can be applied to any language.

The paper is organized as follows. The next section describes the Direct Query Classifier (DQC). Fast approximation of the DTW distance is discussed in Section 3. The query specific DTW distance is presented in Section 4. Experimental settings and results are discussed in Section 5, followed by concluding remarks in Section 6.

2. Direct Query Classifier (DQC)

In [18], Ranjan et al. proposed the Direct Query Classifier (DQC), which is a one-shot learning scheme for dynamically synthesizing classifiers for novel queries. The main idea is to compute an SVM classifier for the query class using the classifiers obtained from the frequent classes of the database. The number of possible words in a language could be very large and it would be practically difficult to build a classifier for each of the words. However, all these words come from a small set of n-grams. The words corresponding to the frequent queries are expected to contain the n-grams that cover the full vocabulary. Exemplar SVM classifiers are computed for the frequent queries (word classes) and then appropriately concatenated to create novel classifiers for the rare queries. However, this process has its challenges: (i) variations due to the nature of the script and the writing style, and (ii) classifiers for smaller n-grams could be noisy. The authors address these limitations by building the SVM classifiers directly for the most frequent queries and using classifier synthesis only for rare queries. This improves the overall performance.
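The idea of concatenating cut portions of frequent-word classifiers can be illustrated with a minimal sketch. It is not the implementation from [18]: the names `DIMS_PER_CHAR`, `cut_portion` and `synthesize_classifier` are hypothetical, each character is assumed to occupy an equal share of a word's weight vector, and the greedy longest-n-gram cover is an assumption made only for illustration.

```python
import numpy as np

DIMS_PER_CHAR = 4  # hypothetical: weight dimensions assumed per character


def cut_portion(weights, start, length):
    """Slice of a word classifier that covers `length` characters from `start`."""
    return weights[start * DIMS_PER_CHAR:(start + length) * DIMS_PER_CHAR]


def synthesize_classifier(query, frequent):
    """Cover `query` greedily with the longest n-grams found in frequent words
    and concatenate the matching cut portions (illustrative assumption)."""
    parts = []
    pos = 0
    while pos < len(query):
        match = None
        # Try the longest remaining n-gram first, then shorter ones.
        for n in range(len(query) - pos, 0, -1):
            ngram = query[pos:pos + n]
            for word, weights in frequent.items():
                idx = word.find(ngram)
                if idx >= 0:
                    match = cut_portion(weights, idx, n)
                    break
            if match is not None:
                break
        if match is None:
            raise KeyError(f"no frequent word contains {query[pos]!r}")
        parts.append(match)
        pos += n  # n is the length of the n-gram that matched
    return np.concatenate(parts)


# Toy classifiers for two frequent words (values are placeholders, not trained).
frequent = {
    "there": np.arange(5 * DIMS_PER_CHAR, dtype=float),
    "cat": 100 + np.arange(3 * DIMS_PER_CHAR, dtype=float),
}
w = synthesize_classifier("heat", frequent)  # "he" from "there", "at" from "cat"
```

In this toy run the rare query "heat" is assembled from the "he" portion of "there" and the "at" portion of "cat". Preferring longer n-grams reflects the concern stated above that classifiers for smaller n-grams could be noisy.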
The authors also use Query Expansion (QE) to further improve the performance. An overview of the Direct Query Classifier is given in the following sections.
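As background for the DTW-based indexing discussed in the introduction, the following minimal sketch contrasts the naive DTW distance, which searches for the optimal alignment, with a distance evaluated along fixed, precomputed alignment paths. The function names are hypothetical, the sequences are assumed to be arrays of per-column feature vectors, and taking the minimum over candidate paths is an assumption for illustration rather than the exact scheme of [19,20].

```python
import numpy as np


def _as_frames(seq):
    """Convert a sequence to a 2-D float array with one feature vector per row."""
    arr = np.asarray(seq, dtype=float)
    return arr[:, None] if arr.ndim == 1 else arr


def dtw_distance(x, y):
    """Naive DTW: dynamic programming over all monotone alignments, O(n*m)."""
    x, y = _as_frames(x), _as_frames(y)
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def path_cost(x, y, path):
    """Sum of local distances along one fixed alignment path of (i, j) pairs."""
    x, y = _as_frames(x), _as_frames(y)
    return sum(np.linalg.norm(x[i] - y[j]) for i, j in path)


def approx_dtw(x, y, candidate_paths):
    """Approximate DTW using only precomputed paths (assumed: take the minimum)."""
    return min(path_cost(x, y, p) for p in candidate_paths)


# Toy example with equal-length sequences and the diagonal as the only path.
x = [0.0, 1.0, 2.0]
y = [1.0, 1.0, 2.0]
diagonal = [(0, 0), (1, 1), (2, 2)]
exact = dtw_distance(x, y)
approx = approx_dtw(x, y, [diagonal])
```

Because every fixed path is one of the alignments the naive search considers, the approximate value can never fall below the exact one. It avoids the quadratic search at the price of possibly sub-optimal results.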


Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science