Page - 72 - in Document Image Processing
J. Imaging 2018, 4, 37
advantage that it does not require prior learning due to its appearance-based matching. These techniques have been popularly used in document image retrieval.
Konidaris et al. [5] retrieve words from a large collection of printed historical documents. A search keyword typed by the user is converted into a synthetic word image, which is used as a query image. Word matching is based on computing the L1 distance metric between the query feature and all the features in the database. Here the features are calculated using the density of the character pixels and the area that is formed from the projections of the upper and lower profile of the word. The ranked results are further improved by relevance feedback. Sankar and Jawahar [7] have suggested a framework of probabilistic reverse annotation for annotating a large collection of images. Word images were segmented from 500 Telugu books. Matching of the word images is done using the DTW approach [11]. Hierarchical agglomerative clustering was used to cluster the word images. Exemplars for the keywords are generated by rendering the word to form a keyword-image. Annotation involved identifying the closest word cluster to each keyword cluster. This involves estimating the probability that each cluster belongs to the keyword. Yalniz and Manmatha [4] have applied word spotting to scanned English and Telugu books. They are able to handle noise in the document text by the use of SIFT features extracted on salient corner points. Rath and Manmatha [11] used projection profile and word profile features in a DTW-based matching technique.
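
The L1-distance matching described above for [5] can be illustrated with a minimal sketch. It assumes each word image has already been reduced to a fixed-length feature vector; the function names `l1_distance` and `rank_by_l1` and the toy feature values are hypothetical and are not taken from [5].

```python
from typing import Dict, List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_l1(query: Sequence[float],
               database: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (word_id, distance) pairs sorted from closest to farthest."""
    scored = [(word_id, l1_distance(query, feats))
              for word_id, feats in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy features (assumed, e.g. pixel densities per zone).
database = {
    "word_a": [0.20, 0.50, 0.30],
    "word_b": [0.90, 0.10, 0.00],
    "word_c": [0.25, 0.45, 0.30],
}
query = [0.22, 0.48, 0.30]  # features of the synthetic query image (assumed)
ranking = rank_by_l1(query, database)
```

Here `ranking` lists the database words from the smallest to the largest L1 distance to the query; a relevance-feedback step, as mentioned for [5], would then re-rank this list and is not shown.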
Recognition-free retrieval was attempted in the past for printed as well as handwritten document collections [4,7,12,13]. Since most of these methods were designed for smaller collections (few handwritten documents as in [12]), computational time was not a major concern. Methods that extended this to a larger collection [14–16] used mostly (approximate) nearest neighbor retrieval. For searching complex objects in large databases, SVMs have emerged as the most popular and accurate solution in the recent past [12]. For linear SVMs, both training and testing have become very fast with the introduction of efficient algorithms and excellent implementations [17]. However, there are two fundamental challenges in using a classifier-based solution for word retrieval: (i) A classifier needs a good amount of annotated training data (both positive and negative) for training. Obtaining annotated data for every word in every style is practically impossible. (ii) One could train a set of classifiers for a given set of frequent queries. However, they are not applicable for rare queries.
In [18], Ranjan et al. proposed a one-shot classifier learning scheme (direct query classifier). The proposed one-shot learning scheme enables direct design of a classifier for novel queries, without having any access to the annotated training data, i.e., classifiers are trained for a set of frequent queries, and seamlessly extended for the rare and arbitrary queries, as and when required. The authors hypothesize that word images, even if degraded, can be matched and retrieved effectively with a classifier-based solution. A properly trained classifier can yield an accurate ranked list of words since the classifier looks at the word as a whole, and uses a larger context (say multiple examples) for matching. The results of this method are significant since (i) it does not use any language-specific post-processing for improving the accuracy, and (ii) even for a language like English, where OCRs are fairly advanced and engineering solutions were perfected, the classifier-based solution is as good as, if not superior to, the best available commercial OCRs.
In the direct query classifier (DQC) scheme [18], the authors used DTW distance for indexing the frequent mean vectors. Since the DTW distance is computationally slow, the authors do not use all the frequent mean vectors for indexing. For comparing two word images, DTW distance typically takes one second [3]. This limits the efficiency of DQC. To overcome this limitation, the authors used Euclidean distance for indexing. The authors use the top 10 (closest in terms of Euclidean distance) frequent mean vectors for indexing. Since the DTW distance captures the similarities better than Euclidean distance for word image retrieval, this restricts the performance of DQC.
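
The trade-off between the two distances can be made concrete with a short sketch. It shows a textbook dynamic-programming DTW (quadratic in the sequence lengths) next to a Euclidean top-k shortlist of mean vectors. This is an illustration under assumptions, not the implementation of [18]: the Euclidean local cost inside DTW, the function names, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: Sequence[Vector], t: Sequence[Vector]) -> float:
    """Classic DTW: O(len(s) * len(t)) table fill, Euclidean local cost (assumed)."""
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(s[i - 1], t[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def top_k_by_euclidean(query: Vector, means: List[Vector], k: int = 10) -> List[int]:
    """Indices of the k mean vectors closest to the query in Euclidean distance."""
    order = sorted(range(len(means)), key=lambda i: euclidean(query, means[i]))
    return order[:k]


# Two toy column-feature sequences with the same shape but different lengths.
seq_a = [[0.0], [1.0], [2.0], [1.0], [0.0]]
seq_b = [[0.0], [0.0], [1.0], [2.0], [1.0], [0.0]]
warped = dtw_distance(seq_a, seq_b)

# Hypothetical "frequent mean vectors" and a query of the same dimension.
means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
shortlist = top_k_by_euclidean([1.0, 1.0], means, k=2)
```

DTW absorbs the stretch between `seq_a` and `seq_b`, which a fixed-length Euclidean comparison cannot do directly, but it fills a full table for every pair compared. The Euclidean shortlist costs one pass over the vectors per query, which is the kind of saving the passage above attributes to the switch.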
For speed-up, DTW distance has been previously approximated [19,20] using different techniques. In [20], the authors proposed a fast approximate DTW distance, in which the DTW distance is approximated as a sum of multiple weighted Euclidean distances. For a given set of sequences,
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science