Document Image Processing
Page 61

J. Imaging 2018, 4, 6

search for more effective solutions to tackle the problem of classification. A more direct and efficient methodology can be provided using holistic recognition [6]. The holistic approach handles the whole word as a unified unit. A global feature vector is calculated for the indivisible input word sample, which is then used to classify the word against a stored lexicon of words. Holistic recognition is inspired by what is known as the word superiority effect, which states that people recognize letters presented within words better than isolated letters or letters presented within non-words [7]. Holistic paradigms are not only effective, but can also preserve certain effects that are specific to the class under operation, such as coarticulation effects [8].

Several previous research efforts have investigated the holistic approach for Arabic cursive script recognition for both printed and handwritten types. Erlandson et al. [9] reported a word-level recognition system for machine-printed Arabic. They used an image-morphology-based vector of features such as dots and hamzas, the direction of segments, the junctions and endpoints, direction of cavities, holes, descenders and intra-word gaps. All these features are computed for a query word image in the recognition phase and are matched against a pre-computed database of vectors from an Arabic word lexicon. That system achieved a word recognition rate of 65%. This accuracy was achieved with the integration of a lexicon pruning subsystem that is based on another recognition method that was developed under the same project for a training set of 8436 word images scanned at 300 dpi.

Al-Badr et al. [10] developed an Arabic holistic word recognition system based on a set of shape primitives that are detected with mathematical morphology operations. That system was trained using a single font with three types of documents: ideal (noise-free), synthetically degraded and scanned. The feature extraction operators used were very sensitive to scanning noise and to degraded low-resolution documents.
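The holistic matching step described above (one global feature vector per word image, compared against stored lexicon vectors) can be sketched roughly as follows. This is an illustrative sketch only: the zoning-density feature, the plain Euclidean distance, and the names `zone_densities` and `rank_lexicon` are assumptions made for the example, not the features or classifiers used in the cited systems.

```python
import math


def zone_densities(image, rows=2, cols=4):
    """Global feature vector for a whole word image (hypothetical feature).

    `image` is a list of rows of 0/1 pixels. The image is split into a
    rows x cols grid and the ink density of each zone is returned.
    """
    h, w = len(image), len(image[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cells = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def rank_lexicon(image, lexicon, top_k=3):
    """Rank lexicon words by distance between global feature vectors.

    `lexicon` maps each word label to its pre-computed feature vector.
    The word is never segmented into characters; it is matched as a unit.
    """
    query = zone_densities(image)
    scored = sorted(lexicon.items(), key=lambda item: math.dist(query, item[1]))
    return [word for word, _ in scored[:top_k]]
```

Returning a ranked list rather than a single label mirrors the way results are often reported for such systems, as a first choice and as a hit within the top few candidates.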
The system of Al-Badr et al. [10] achieved a recognition rate of 99.4% for noise-free documents. For synthetically degraded documents, the system accuracy decreased to 95.6%, and to 73% for scanned documents. All these evaluations were performed using a limited lexicon that contained 4317 words [10].

Khorsheed and Clocksin [11] presented a technique for recognizing Arabic cursive words from scanned images of text by transforming each word in a certain lexicon into a normalized polar image and then applying a two-dimensional Fourier transform to that polar image. Each word is represented by a template that includes a set of Fourier coefficients, and for recognition, the system used a normalized Euclidean distance that measures the distance between the word under test and those templates. That system achieved a recognition rate of 90% for a lexicon size of 145 words and used 1700 word samples for training.

To get better performance, Khorsheed [12] presented a new system based on Hidden Markov Models (HMMs). In that system, each word was represented by a single HMM. The word models were trained using the Fourier spectrum of the word samples. The experiments were conducted on four fonts, and the reported results are for Simplified Arabic and Arabic Traditional fonts only. The system achieved a higher recognition rate compared to the template-based recognizer. The highest achieved results for both fonts are 90% as the first choice and 98% within the top-ten choices.

In a later work, Khorsheed [13] presented a cursive Arabic text recognition system based on HMM. This system was also segmentation-free with an easy-to-extract statistical feature vector of 60 elements, representing three different types of features. This system was trained with a data corpus which includes Arabic text of more than 600 A4-size sheets typewritten in six different computer-generated fonts: Tahoma, Simplified Arabic, Traditional Arabic, Andalus, Naskh and Thuluth. The highest achieved results were 88.7% and 92.4% for Andalus font in mono-model and tri-model, respectively.
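The polar-image and Fourier template idea attributed to Khorsheed and Clocksin [11] above can be illustrated with a small sketch. It is not their implementation: centring the polar grid on the ink centroid, scaling the radius by the farthest ink pixel, keeping every Fourier magnitude, and scaling vectors to unit length before taking the Euclidean distance are all assumptions made here, and every function name is hypothetical. The sketch does show why the combination is attractive: a rotation of the word becomes a circular shift along the angle axis of the polar image, and the magnitude of the two-dimensional Fourier transform does not change under circular shifts.

```python
import cmath
import math


def to_polar(image, n_radii=8, n_angles=16):
    """Resample a 0/1 word image onto a size-normalized polar grid.

    Assumptions: centre = ink centroid, radius scaled by the farthest ink
    pixel, nearest-neighbour sampling.
    """
    ink = [(y, x) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not ink:
        raise ValueError("image contains no ink pixels")
    cy = sum(y for y, _ in ink) / len(ink)
    cx = sum(x for _, x in ink) / len(ink)
    r_max = max(math.hypot(y - cy, x - cx) for y, x in ink) or 1.0
    h, w = len(image), len(image[0])
    polar = []
    for i in range(n_radii):
        r = r_max * (i + 0.5) / n_radii
        row = []
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            y = int(round(cy + r * math.sin(theta)))
            x = int(round(cx + r * math.cos(theta)))
            inside = 0 <= y < h and 0 <= x < w
            row.append(float(image[y][x]) if inside else 0.0)
        polar.append(row)
    return polar


def dft2_magnitudes(grid):
    """Magnitudes of the 2D discrete Fourier transform (direct O(N^2) sum)."""
    rows, cols = len(grid), len(grid[0])
    mags = []
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for m in range(rows):
                for n in range(cols):
                    angle = -2 * math.pi * (u * m / rows + v * n / cols)
                    total += grid[m][n] * cmath.exp(1j * angle)
            mags.append(abs(total))
    return mags


def normalized_euclidean(a, b):
    """Euclidean distance after scaling both vectors to unit length (assumed)."""
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return math.sqrt(sum((x / na - y / nb) ** 2 for x, y in zip(a, b)))


def build_template(image):
    """Template for one lexicon word: Fourier magnitudes of its polar image."""
    return dft2_magnitudes(to_polar(image))


def recognize(image, templates):
    """Return the lexicon word whose template is closest to the query image."""
    feats = build_template(image)
    return min(templates, key=lambda w: normalized_euclidean(feats, templates[w]))
```

The direct double sum keeps the example dependency-free; on realistic image sizes an FFT routine would be the natural replacement.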
In another experiment, the system of Khorsheed [13] was trained with a multi-font dataset that was selected randomly with the same sample size from all fonts and tested with a dataset consisting of 200 lines from each font, and achieved an accuracy of 95% using the tri-model.

In another effort, Krayem et al. [14] presented a word-level recognition system using a discrete hidden Markov classifier along with a block-based discrete cosine transform. This system was

Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science