Document Image Processing
Page - 152 -
J. Imaging 2018, 4, 39

The main contribution of the present work is the comprehensive evaluation of the major classifier combination approaches, which are either rule based or apply a secondary classifier for information fusion. The motivation is to improve the classification accuracy of word-level handwritten script recognition by combining the results of the best performing classifier on three previously used feature sets. It is a multi-class classification problem and, in the present case, 12 officially used Indic scripts are considered: Devanagari, Bangla, Odia, Gujarati, Gurumukhi, Tamil, Telugu, Kannada, Malayalam, Manipuri, Urdu and Roman. Three different sets of feature vectors based on both shape and texture analysis have been estimated from each of the handwritten word images. The script in which each word image is written is identified by feeding these feature values into different MLP classifiers. Soft decisions provided by the individual classifiers are then combined using an array of classifier combination techniques. This kind of work is implemented for the first time considering the number of Indic scripts undertaken and the range of combination techniques applied. The system developed here for the script recognition task is a part of a general framework in which different feature sets and classifier outputs can be modelled into a single system without much increase in the computation involved. A block diagram of the present work is shown in Figure 1.

Figure 1. Schematic diagram of the proposed methodology.

2. Feature Extraction

In this paper, three popular feature extraction methodologies have been used for the combination, namely Elliptical Features [21], Histogram of Oriented Gradients (HOG) [30] and Modified log-Gabor filter transform [20]. The first feature set is applied to capture the overall structure present in the script word images, whereas the remaining two feature sets deal with the texture of the same. These features have already provided satisfactory results for this challenging task of handwritten script identification.

2.1. Elliptical Features

The word images are generally found to be elongated in nature, so they can be better covered by an ellipse. That is why elliptical features are extracted from the contour and the local regions of a word image, so that it is easier to isolate a particular script. Two important notations used in this subsection are: (a) Pixel ratio (Pr) and (b) Pixel count (Pc). Pr is defined as the ratio of the number of contour pixels (object) to the number of background pixels, whereas the pixel count Pc is defined as the number of contour pixels. The features are described in detail:

2.1.1. Maximum Inscribed Ellipse

The height and width of the bounding box are calculated for each word image. A representative ellipse is then inscribed (considering the orientation of the ellipse) inside this bounding box having
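The two notations from Section 2.1, Pixel count (Pc) and Pixel ratio (Pr), can be illustrated with a minimal sketch. It is not the authors' implementation: the binary list-of-lists image format, the 4-neighbour definition of a contour pixel, and the function names `contour_pixels`, `pixel_count` and `pixel_ratio` are all assumptions made for illustration.

```python
def contour_pixels(img):
    """Return the set of (row, col) object pixels that touch the background.

    Assumptions (not from the paper): img is a list of equal-length rows
    with 1 = object (ink) and 0 = background, and an object pixel counts
    as a contour pixel if any of its 4-neighbours is background or lies
    outside the image.
    """
    rows, cols = len(img), len(img[0])
    contour = set()
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or img[nr][nc] == 0:
                    contour.add((r, c))
                    break
    return contour


def pixel_count(img):
    """Pc: the number of contour pixels."""
    return len(contour_pixels(img))


def pixel_ratio(img):
    """Pr: the number of contour pixels divided by the number of background pixels."""
    background = sum(row.count(0) for row in img)
    if background == 0:
        raise ValueError("image has no background pixels")
    return pixel_count(img) / background


# Toy 5x5 "word image": a filled 3x3 block surrounded by background.
word = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(pixel_count(word))  # 8 contour pixels (the block's centre is interior)
print(pixel_ratio(word))  # 8 / 16 = 0.5
```

In the toy image only the centre of the block has four object neighbours, so 8 of the 9 object pixels are contour pixels, and 16 of the 25 pixels are background. A different contour definition (for example 8-connectivity) would change both values.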
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Informatik