Page - 150 - in Document Image Processing
Journal of Imaging
Article
A Study of Different Classifier Combination Approaches for Handwritten Indic Script Recognition
Anirban Mukhopadhyay *, Pawan Kumar Singh, Ram Sarkar * and Mita Nasipuri
Department of Computer Science and Engineering, Jadavpur University, Kolkata-700032, West Bengal, India;
pawansingh.ju@gmail.com (P.K.S.); mitanasipuri@gmail.com (M.N.)
* Correspondence: anirbanmcse@gmail.com (A.M.); raamsarkar@gmail.com (R.S.)
Received: 15 December 2017; Accepted: 8 February 2018; Published: 13 February 2018
Abstract: Script identification is an essential step in document image processing, especially when
the environment is multi-script/multilingual. To date, researchers have developed several methods
for this problem. For this kind of complex pattern recognition problem, it is always difficult to
decide which classifier would be the best choice. Moreover, different classifiers offer
complementary information about the patterns to be classified. Therefore, combining classifiers
in an intelligent way can be beneficial compared to using any single classifier. Keeping these
facts in mind, in this paper, information provided by one shape-based and two texture-based
features is combined using classifier combination techniques for word-level script recognition
from handwritten document images. CMATERdb8.4.1 contains 7200 handwritten word
samples belonging to 12 Indic scripts (600 per script), and the database is made freely available
at https://code.google.com/p/cmaterdb/. The word samples from this database are
classified based on the confidence scores provided by a Multi-Layer Perceptron (MLP) classifier. Major
classifier combination techniques, including majority voting, Borda count, sum rule, product rule,
max rule, the Dempster-Shafer (DS) rule of combination and secondary classifiers, are evaluated for this
pattern recognition problem. A maximum accuracy of 98.45% is achieved on the validation set,
an improvement of 7% over the best-performing individual classifier.
Keywords: Classifier combination; Dempster-Shafer theory of evidence; Indic script identification;
Histograms of Oriented Gradients; Modified Log-Gabor filter transform; Elliptical features
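Several of the combination rules named in the abstract can be illustrated on a small table of confidence scores. The following is a minimal sketch, not the authors' implementation: it assumes each classifier (for example, one MLP per feature set) outputs one confidence value per class, and the function names, the tie-breaking choices and the example scores are all hypothetical. The Dempster-Shafer rule and secondary classifiers are not covered here.

```python
from collections import Counter
from math import prod


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=lambda i: values[i])


# In every function, scores[k][c] is the confidence of classifier k for class c.

def sum_rule(scores):
    # Add the confidences of all classifiers per class, pick the largest total.
    return argmax([sum(col) for col in zip(*scores)])


def product_rule(scores):
    # Multiply the confidences per class, pick the largest product.
    return argmax([prod(col) for col in zip(*scores)])


def max_rule(scores):
    # Keep the single highest confidence per class, pick the largest.
    return argmax([max(col) for col in zip(*scores)])


def majority_vote(scores):
    # Each classifier votes for its top class.
    votes = Counter(argmax(row) for row in scores)
    best = max(votes.values())
    # Assumption: ties are broken in favour of the lowest class index.
    return min(c for c, v in votes.items() if v == best)


def borda_count(scores):
    n_classes = len(scores[0])
    points = [0] * n_classes
    for row in scores:
        # Order classes from least to most confident; position r earns r points.
        order = sorted(range(n_classes), key=lambda c: row[c])
        for rank, c in enumerate(order):
            points[c] += rank
    return argmax(points)


# Made-up confidences from three classifiers over three classes.
scores = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.35, 0.40, 0.25],
]

for rule in (sum_rule, product_rule, max_rule, majority_vote, borda_count):
    print(rule.__name__, "->", rule(scores))
```

On these made-up scores the max rule follows the single most confident classifier and selects class 0, while the other four rules select class 1, which shows how the choice of rule can change the decision.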
1. Introduction
In the domain of document image processing, Optical Character Recognition (OCR) systems
are, in general, developed keeping a particular script in mind, which implies that such systems can
read characters written in a specific script only. This is because the number of characters, the shape of the
characters or the writing style of a particular character set is so different that designing a common
feature set applicable for recognizing any character set is practically impossible. As an alternative,
a pool of OCR systems that correspond to different scripts [1] can be used to solve this problem.
This implies that before the document images are fed to an OCR system, it is necessary to
identify the script in which the document is written so that those document images can be suitably
converted into a computer-editable format using that OCR system. This summarizes the problem of
script identification. There are some important applications of script identification systems, such as
automatic archiving as well as indexing of multi-script documents, and searching for required information
in digitized archives of multi-script document images.
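The pool-of-OCR-systems arrangement described above can be pictured as a two-step dispatch: identify the script, then pass the image to the OCR engine registered for that script. The sketch below is only an illustration under stated assumptions: images are represented as raw bytes, and `recognize`, `toy_identifier`, the `engines` table and the script names used as keys are hypothetical stand-ins rather than anything defined in the paper.

```python
from typing import Callable, Dict

# Hypothetical type aliases: an image is passed around as raw bytes here.
ScriptIdentifier = Callable[[bytes], str]
OcrEngine = Callable[[bytes], str]


def recognize(image: bytes,
              identify_script: ScriptIdentifier,
              engines: Dict[str, OcrEngine]) -> str:
    """Identify the script first, then hand the image to the matching OCR engine."""
    script = identify_script(image)
    engine = engines.get(script)
    if engine is None:
        raise ValueError(f"no OCR engine registered for script {script!r}")
    return engine(image)


# Stand-in components for illustration only; real ones would analyse the pixels.
def toy_identifier(image: bytes) -> str:
    return "Bangla" if image.startswith(b"B") else "Devanagari"


engines: Dict[str, OcrEngine] = {
    "Bangla": lambda image: "text read by the Bangla OCR",
    "Devanagari": lambda image: "text read by the Devanagari OCR",
}

print(recognize(b"B-sample", toy_identifier, engines))
```

Keeping the identifier and the engine table as separate arguments means a new script can be supported by registering one more engine, without touching the dispatch logic.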
In this paper, script identification from handwritten document images written in different scripts
is considered. In this regard, it is to be noted that the hurdles are multi-fold when handwritten document
images are considered compared to their printed counterparts. The main difficulty which researchers
J. Imaging 2018, 4, 39 150 www.mdpi.com/journal/jimaging
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Editor: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Size: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science