Page - 150 - in Document Image Processing

Journal of Imaging
Article
A Study of Different Classifier Combination Approaches for Handwritten Indic Script Recognition
Anirban Mukhopadhyay *, Pawan Kumar Singh, Ram Sarkar * and Mita Nasipuri
Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, West Bengal, India; pawansingh.ju@gmail.com (P.K.S.); mitanasipuri@gmail.com (M.N.)
* Correspondence: anirbanmcse@gmail.com (A.M.); raamsarkar@gmail.com (R.S.)
Received: 15 December 2017; Accepted: 8 February 2018; Published: 13 February 2018

Abstract: Script identification is an essential step in document image processing, especially when the environment is multi-script/multilingual. To date, researchers have developed several methods for this problem. For this kind of complex pattern recognition problem, it is always difficult to decide which classifier would be the best choice. Moreover, different classifiers offer complementary information about the patterns to be classified. Therefore, combining classifiers in an intelligent way can be beneficial compared to using any single classifier. With these facts in mind, this paper combines the information provided by one shape-based and two texture-based feature sets, using classifier combination techniques, for word-level script recognition from handwritten document images. The CMATERdb8.4.1 database contains 7200 handwritten word samples belonging to 12 Indic scripts (600 per script) and is freely available at https://code.google.com/p/cmaterdb/. The word samples from this database are classified based on the confidence scores provided by a Multi-Layer Perceptron (MLP) classifier. The major classifier combination techniques, including majority voting, Borda count, sum rule, product rule, max rule, the Dempster-Shafer (DS) rule of combination and secondary classifiers, are evaluated for this pattern recognition problem. A maximum accuracy of 98.45% is achieved on the validation set, an improvement of 7% over the best-performing individual classifier.
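The abstract names several combination rules that operate on per-class confidence scores. The sketch below illustrates majority voting, Borda count, the sum, product and max rules, and Dempster's rule of combination on such scores. It is only an illustration, not the authors' implementation. The toy scores, the `reliability` discount used to turn scores into mass functions, and all function names are hypothetical assumptions. Secondary classifiers are not sketched.

```python
import math
from collections import Counter
from itertools import product as cartesian

# Assumed input format (hypothetical): scores[k][c] is the confidence that
# classifier k (e.g. an MLP trained on one feature set) assigns to script class c.


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=values.__getitem__)


def majority_vote(scores):
    """Each classifier votes for its top class; ties go to the lowest class index."""
    votes = Counter(argmax(s) for s in scores)
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


def borda_count(scores):
    """Per classifier, the lowest-scored class gets 0 points and the top class n-1."""
    n = len(scores[0])
    points = [0] * n
    for s in scores:
        for rank, c in enumerate(sorted(range(n), key=s.__getitem__)):
            points[c] += rank
    return argmax(points)


def fixed_rule(scores, rule):
    """Fuse per-class scores with the sum, product or max rule, then take argmax."""
    combine = {"sum": sum, "product": math.prod, "max": max}[rule]
    fused = [combine(s[c] for s in scores) for c in range(len(scores[0]))]
    return argmax(fused)


def scores_to_mass(s, reliability=0.9):
    """Hypothetical mapping from scores to a DS mass function: normalized scores,
    discounted by `reliability`, go to singleton classes; the remaining mass goes
    to the whole frame of discernment (ignorance)."""
    total = sum(s)
    frame = frozenset(range(len(s)))
    m = {frozenset([c]): reliability * v / total for c, v in enumerate(s)}
    m[frame] = m.get(frame, 0.0) + (1.0 - reliability)
    return m


def dempster(m1, m2):
    """Dempster's rule of combination for masses keyed by frozensets of classes."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in cartesian(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: the masses cannot be combined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


def ds_combine(scores, reliability=0.9):
    """Fold Dempster's rule over all classifiers; pick the singleton with most mass."""
    masses = [scores_to_mass(s, reliability) for s in scores]
    fused = masses[0]
    for m in masses[1:]:
        fused = dempster(fused, m)
    return argmax([fused.get(frozenset([c]), 0.0) for c in range(len(scores[0]))])


if __name__ == "__main__":
    # Toy scores of three classifiers over three script classes (made-up numbers).
    scores = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.4, 0.35, 0.25]]
    print("majority:", majority_vote(scores))  # 0
    print("borda:", borda_count(scores))  # 0
    for rule in ("sum", "product", "max"):
        print(rule + ":", fixed_rule(scores, rule))  # 1 for each rule
    print("dempster-shafer:", ds_combine(scores))  # 1
```

On these made-up scores, majority voting and Borda count pick class 0. The sum, product and max rules and the Dempster-Shafer fusion pick class 1. The vote-based rules use only each classifier's ranking, while the score-level rules also reward the second classifier's high confidence in class 1. This is one reason the choice of combination rule matters.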
Keywords: classifier combination; Dempster-Shafer theory of evidence; Indic script identification; Histograms of Oriented Gradients; Modified Log-Gabor filter transform; elliptical features

1. Introduction

In the domain of document image processing, Optical Character Recognition (OCR) systems are, in general, developed with a particular script in mind, which implies that such a system can read characters written in a specific script only. This is because the number of characters, the shapes of the characters and the writing styles differ so much across character sets that designing a common feature set applicable to recognizing any character set is practically impossible. As an alternative, a pool of OCR systems that correspond to different scripts [1] can be used to solve this problem. This implies that before document images are fed to an OCR system, the script in which each document is written must be identified, so that the document images can be converted into a computer-editable format by the appropriate OCR system. This summarizes the problem of script identification. Important applications of script identification systems include automatic archiving and indexing of multi-script documents, and searching for required information in digitized archives of multi-script document images.

In this paper, script identification from handwritten document images written in different scripts is considered. In this regard, it should be noted that the hurdles are manifold for handwritten document images compared to their printed counterparts. The main difficulty which researchers

J. Imaging 2018, 4, 39    150    www.mdpi.com/journal/jimaging
Document Image Processing
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science