Document Image Processing
J. Imaging 2018, 4, 6 of 1024.

Table 1 shows the clustering accuracy rate of the tested words for the three implemented features when the number of top clusters varies from 1 to 10.

Table 1. Clustering accuracy rate (percent) of Simplified Arabic font vs. number of clusters using three features (codebook size = 1024, lexicon 356,000).

Features    Number of Coefficients  Top 1  Top 2  Top 3  Top 4  Top 5  Top 6  Top 7  Top 8  Top 9  Top 10
DCT         160                     84.7   96.0   98.4   98.9   99.1   99.4   99.5   99.6   99.7   99.7
DCT_4B      160                     78.5   91.9   96.2   97.8   98.7   99.2   99.4   99.6   99.7   99.7
DCT+DCT_4B  200                     86.1   96.2   98.5   99.1   99.3   99.6   99.7   99.8   99.8   99.8

The results in Table 1 show that the DCT+DCT_4B feature is better than the other two: it reaches 86.1% at top 1, versus 84.7% for DCT and 78.5% for DCT_4B. This hybrid feature benefits from both the local and the global features of the DCT, so it achieves good results, especially on noisy data.

Figure 4 shows the relation between codebook size and clustering accuracy rate.

Figure 4. Clustering accuracy rate of Simplified Arabic font vs. codebook size using the DCT+DCT_4B feature for different top clusters.

As shown in Figure 4, the clustering accuracy rate increases when a larger number of top-n clusters is used, which is a logical consequence. When a small number of clusters is used, each cluster contains a large number of words, which raises the probability of finding the tested word within one of these clusters. When the number of clusters increases, the number of words in each cluster decreases, which reduces the clustering accuracy rate. At the same time, the words within each cluster become more similar, which starts to raise the clustering accuracy rate again, up to the highest level when each cluster contains only one word.

5. Language Rescoring

To enhance the recognition accuracy, the top-n hypotheses from the holistic recognition results are rescored using a language model. In our system, we used a 4-gram language model that was trained on a Giga-word Arabic training database [20].
The top-n hypotheses for each word are combined in a lattice format, as shown in Figure 5. We then used the A* search technique to find the best-scoring path in that lattice using the 4-gram language model, selecting the best-matching sentence according to the Arabic language constraints [21].
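The lattice search described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear lattice with one slot per word, where each slot holds the top-n hypotheses with a non-negative recognition cost (for example a negative log score). The names `astar_rescore`, `toy_lm`, and `TOY_COSTS`, the English placeholder words, and all cost values are hypothetical. The toy language model looks only at the previous word, whereas a real system would query a trained 4-gram model over the three preceding words.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, float]  # (word, recognition cost >= 0)
LmCost = Callable[[Tuple[str, ...], str], float]


def astar_rescore(
    lattice: Sequence[Sequence[Hypothesis]],
    lm_cost: LmCost,
    order: int = 4,
    lm_weight: float = 1.0,
) -> Tuple[List[str], float]:
    """Return the lowest-cost word sequence through a linear lattice.

    The heuristic is the sum of the cheapest recognition costs of the
    remaining slots. It never overestimates as long as lm_cost and
    lm_weight are non-negative, so the first complete path popped
    from the queue is optimal.
    """
    n = len(lattice)
    remaining = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        remaining[i] = remaining[i + 1] + min(c for _, c in lattice[i])

    # Queue entries: (estimated total cost, cost so far, slot index, path).
    heap = [(remaining[0], 0.0, 0, ())]
    while heap:
        _, cost, pos, path = heapq.heappop(heap)
        if pos == n:
            return list(path), cost
        # Keep at most (order - 1) previous words as language model context.
        context = path[max(0, len(path) - (order - 1)):]
        for word, rec_cost in lattice[pos]:
            new_cost = cost + rec_cost + lm_weight * lm_cost(context, word)
            heapq.heappush(
                heap,
                (new_cost + remaining[pos + 1], new_cost, pos + 1, path + (word,)),
            )
    return [], float("inf")


# Hypothetical toy language model: negative log probabilities that
# depend only on the previous word; unseen pairs get a flat penalty.
TOY_COSTS = {("<s>", "the"): 0.5, ("the", "cat"): 0.1, ("cat", "sat"): 0.1}


def toy_lm(context: Tuple[str, ...], word: str) -> float:
    prev = context[-1] if context else "<s>"
    return TOY_COSTS.get((prev, word), 2.0)


# Hypothetical lattice: three word slots with two hypotheses each.
lattice = [
    [("the", 0.2), ("tea", 0.1)],
    [("cat", 0.3), ("cut", 0.2)],
    [("sat", 0.1), ("set", 0.1)],
]

best_path, best_cost = astar_rescore(lattice, toy_lm)
print(best_path, best_cost)
```

In this toy lattice the recognizer alone prefers "tea" and "cut", but the language model cost moves the best path to "the cat sat". The `lm_weight` parameter is an assumed knob for balancing the two scores; the source does not say how the recognition and language model scores are combined.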
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Publisher
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Dimensions
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science