Document Image Processing
Page - 151 -

J. Imaging 2018, 4, 39

need to deal with is the non-uniformity of the shape and size of the characters written by different writers. Along with these, problems like skew and slant are commonly seen in handwritten documents. Even the paper and ink qualities make things much more difficult. Apart from the intrinsic complexities of handwriting, similarities among the characters belonging to different scripts augment the challenges of script recognition from handwritten document images.

It is worth mentioning that script recognition is usually performed at page, text-line or word level. In this paper, it is done at word level for two reasons: (a) feature extraction at word level is less time consuming than at page or text-line level, and (b) a single document page or a single text line sometimes contains multiple scripts. In that case, word-level script identification is appropriate.

Script recognition articles for handwritten documents are relatively limited in comparison to their printed counterpart. Ubul et al. [2] comprehensively showed the state-of-the-art performance results for different identification, feature extraction and classification methodologies involved in the process. Recently, Singh et al. [1] provided a survey considering various feature extraction and classification techniques associated with the offline script identification of the Indic scripts. Spitz [3] proposed a method for distinguishing between Asian and European languages by analysing the connected components. Tan et al. [4] developed a method based on texture analysis for automatic script identification from document images using multiple channel (Gabor) filters and Gray level co-occurrence matrices (GLCM) for seven languages: Chinese, English, Greek, Korean, Malayalam, Persian and Russian. Hochberg et al. [5,6] described an algorithm for script and language identification from handwritten document images using statistical features based on connected component analysis. Wood et al. [7] demonstrated a projection profile method to determine Roman, Russian, Arabic, Korean and Chinese characters. Chaudhuri et al. [8] discussed an OCR system to read two Indian languages, viz., Bangla and Devanagari (Hindi). Pal et al. [9] proposed an algorithm for word-wise script identification from documents containing English, Devanagari and Telugu text, based on conventional and water reservoir features. Chaudhury et al. [10] proposed a method for identification of Indian languages by combining Gabor filter based techniques and a direction distance histogram classifier for Hindi, English, Malayalam, Bengali, Telugu and Urdu. Some analysis of the variability involved in the multi-script signature recognition problem as compared to the single-script scenario is discussed in [11,12].

Various classification algorithms are applied to different pattern recognition problems, and the same applies to the script recognition problem. To date, different classifiers have been used for Indic script recognition, such as k-Nearest Neighbours (k-NN) [13,14], Linear Discriminant Analysis (LDA) [15], Neural Networks (NN) [15,16], Support Vector Machine (SVM) [16,17], Tree based classifier [18,19], Simple Logistic [20] and MLP [21,22]. Although good results have already been achieved in this pattern recognition task, it is still hard to achieve acceptable accuracy with a single classifier. Studies show that the fusion of multiple classifiers can be a viable solution to get better classification results, as the error amassed by any single classifier is generally compensated using information from other classifiers. The reason for this is that different classifiers may offer complementary information about the patterns under consideration. Based on this fact, a section of researchers has long focused on devising different algorithms for combining classifiers in an intelligent way, so that the combination can achieve better results than any of the individual classifiers used for combining. The key idea is that instead of relying on a single decision maker, all the designs or their subsets are applied for the decision making by combining their individual beliefs in order to come up with a consensus decision. This fact motivates many researchers to apply classifier combination methods to different pattern recognition problems. The popular methodologies for classifier combination include: Majority Voting [23,24], Subset-combining and re-ranking approach [25], Statistical model [26], Bayesian Belief Integration [27], Combination based on DS theory of evidence [27,28] and Neural Network combinator [29].

To date, however, the classifier combination approach has not been tested much for the script recognition problem, either handwritten or printed, though it has enormous potential. To bridge this research gap, this paper applies different classifier combination techniques in the field of Indic script recognition.
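
To make the idea of a consensus decision concrete, the following is a minimal sketch of majority voting, the simplest of the combination methodologies listed above. It is not the procedure used in the paper. The function names (majority_vote, combine_classifiers), the tie-breaking rule and the sample labels are assumptions introduced purely for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the label chosen by the most classifiers for one word image.

    Ties are broken in favour of the label that appears first in `votes`
    (an arbitrary rule chosen for this sketch).
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the top count")


def combine_classifiers(per_classifier: Sequence[Sequence[str]]) -> List[str]:
    """Combine label sequences, where per_classifier[i][j] is the label
    that classifier i assigned to word image j."""
    return [majority_vote(votes) for votes in zip(*per_classifier)]


# Hypothetical outputs of three classifiers on four word images.
knn_labels = ["Bangla", "Devanagari", "Telugu", "Roman"]
svm_labels = ["Bangla", "Bangla", "Telugu", "Roman"]
mlp_labels = ["Devanagari", "Devanagari", "Telugu", "Bangla"]

combined = combine_classifiers([knn_labels, svm_labels, mlp_labels])
print(combined)
```

In this made-up example each classifier makes one mistake, but never on the same word image as another, so the vote corrects all of them. This is the sense in which the error of a single classifier can be compensated by the others.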
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Informatik