Document Image Processing
Page - 65 -


J. Imaging 2018, 4, 6

of 1024. Table 1 shows the clustering accuracy rate of the tested words for the three implemented features when the number of clusters varies from one to 10.

Table 1. Clustering accuracy rate (percent) of the Simplified Arabic font vs. number of clusters using three features (codebook size = 1024, lexicon 356,000).

Features     No. of Coefficients  Top 1  Top 2  Top 3  Top 4  Top 5  Top 6  Top 7  Top 8  Top 9  Top 10
DCT          160                  84.7   96.0   98.4   98.9   99.1   99.4   99.5   99.6   99.7   99.7
DCT_4B       160                  78.5   91.9   96.2   97.8   98.7   99.2   99.4   99.6   99.7   99.7
DCT+DCT_4B   200                  86.1   96.2   98.5   99.1   99.3   99.6   99.7   99.8   99.8   99.8

The results in Table 1 show that the DCT+DCT_4B feature achieves the highest accuracy at every top-n level (for example, 86.1% at Top 1 versus 84.7% for DCT and 78.5% for DCT_4B). This hybrid feature benefits from both the local and the global features of the DCT, so it achieves good results, especially on the noisy data.

Figure 4 shows the relation between codebook size and clustering accuracy rate.

Figure 4. Clustering accuracy rate of the Simplified Arabic font vs. codebook size using the DCT+DCT_4B feature for different top clusters.

As shown in Figure 4, the clustering accuracy rate increases when a larger number of top-n clusters is used, which is a logical consequence. When using a small number of clusters, each cluster contains a large number of words, which raises the possibility of finding the tested word within one of these clusters. When the number of clusters increases, the number of words in each cluster decreases, which reduces the clustering accuracy rate. At the same time, the words within each cluster become more similar, which starts to raise the clustering accuracy rate again, up to the highest level when each cluster contains only one word.

5. Language Rescoring

To enhance the recognition accuracy, the top-n hypotheses from the holistic recognition results are rescored using a language model. In our system, we used a 4-gram language model that was trained from a Giga-word Arabic training database [20].
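As a rough illustration of how an n-gram language model assigns a score to a word sequence, the following Python sketch estimates a count-based model with add-one smoothing on a toy English corpus. The class name NGramModel, the toy corpus, and the smoothing choice are assumptions made for this example; the text only states that a 4-gram model trained on a Giga-word Arabic database was used, not how it was estimated.

```python
import math
from collections import Counter


class NGramModel:
    """Count-based n-gram model with add-one (Laplace) smoothing (illustrative)."""

    def __init__(self, n, sentences):
        self.n = n
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of (n-1)-word histories
        self.vocab = set()
        for words in sentences:
            padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
            self.vocab.update(padded)
            for i in range(n - 1, len(padded)):
                history = tuple(padded[i - n + 1:i])
                self.ngrams[history + (padded[i],)] += 1
                self.contexts[history] += 1

    def log_prob(self, history, word):
        """log P(word | last n-1 words of history), add-one smoothed."""
        history = tuple(history)[-(self.n - 1):] if self.n > 1 else ()
        numerator = self.ngrams[history + (word,)] + 1
        denominator = self.contexts[history] + len(self.vocab)
        return math.log(numerator / denominator)

    def sentence_log_prob(self, words):
        """Sum of per-word log-probabilities, including the end marker."""
        padded = ["<s>"] * (self.n - 1) + list(words) + ["</s>"]
        return sum(
            self.log_prob(padded[i - self.n + 1:i], padded[i])
            for i in range(self.n - 1, len(padded))
        )


# Toy corpus (hypothetical); a real model would be trained on a large corpus.
corpus = [
    "the cat sat on the mat".split(),
    "the cat sat on the sofa".split(),
    "the dog sat on the mat".split(),
]
lm = NGramModel(4, corpus)
seen = lm.sentence_log_prob("the cat sat on the mat".split())
scrambled = lm.sentence_log_prob("mat the on sat cat the".split())
print(seen > scrambled)  # a word order seen in training should score higher
```

Add-one smoothing is used here only because it is short; it keeps unseen word sequences from receiving zero probability, which matters when the model is later used to rank competing hypotheses.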
The top-n hypotheses for each word are combined in a lattice format, as shown in Figure 5. We then used the A* search technique to find the best-scoring path in that lattice under the 4-gram language model, which selects the best matching sentence according to the Arabic language constraints [21].
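The rescoring step described above can be sketched in Python as an A* search over a simple word lattice. Several things here are assumptions for illustration and are not taken from the text: the lattice is modeled as one list of (word, cost) hypotheses per word position, the function name a_star_rescore and its parameters are hypothetical, the heuristic (sum of the cheapest recognizer costs of the remaining positions) is one admissible choice among many, and the toy language model is a hand-written bigram table rather than a trained 4-gram model.

```python
import heapq
import math


def a_star_rescore(lattice, lm_log_prob, order=2, lm_weight=1.0):
    """Return (cost, words) of the cheapest path through a word lattice.

    lattice: one list per word position, each holding (word, cost) pairs,
             where cost is the recognizer cost (lower is better).
    lm_log_prob(history, word): log P(word | history), history a tuple.
    """
    n = len(lattice)
    # Admissible heuristic: cheapest recognizer cost of every remaining
    # position. Language-model costs are >= 0, so this never overestimates.
    remaining = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining[i] = remaining[i + 1] + min(c for _, c in lattice[i])

    # Heap entries: (g + h, g, position, path so far)
    heap = [(remaining[0], 0.0, 0, ())]
    best_g = {}
    while heap:
        _, g, pos, path = heapq.heappop(heap)
        if pos == n:
            return g, list(path)
        # Paths sharing position and LM history are interchangeable.
        state = (pos, path[-(order - 1):] if order > 1 else ())
        if best_g.get(state, math.inf) <= g:
            continue  # state already expanded at lower or equal cost
        best_g[state] = g
        history = state[1]
        for word, cost in lattice[pos]:
            g_new = g + cost - lm_weight * lm_log_prob(history, word)
            heapq.heappush(
                heap,
                (g_new + remaining[pos + 1], g_new, pos + 1, path + (word,)),
            )
    return math.inf, []


# Hypothetical bigram log-probabilities; unseen pairs get a flat penalty.
BIGRAMS = {
    ((), "the"): math.log(0.6),
    (("the",), "cat"): math.log(0.5),
    (("the",), "cut"): math.log(0.05),
    (("cat",), "sat"): math.log(0.6),
    (("cut",), "sat"): math.log(0.01),
    (("cat",), "set"): math.log(0.02),
}


def toy_lm(history, word):
    return BIGRAMS.get((history, word), math.log(1e-3))


# The recognizer's cheapest choices spell "the cut set", which the
# language model disfavors.
lattice = [
    [("the", 0.1)],
    [("cut", 0.5), ("cat", 0.7)],
    [("set", 0.4), ("sat", 0.6)],
]
cost, words = a_star_rescore(lattice, toy_lm)
print(words)
```

With order=4 the same function keeps the last three words as history, which matches the context length of a 4-gram model. Merging paths that share a position and history keeps the search from re-expanding equivalent partial sentences.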
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Informatik (Computer Science)