Page 151 in Document Image Processing

Text of page 151

J. Imaging 2018, 4, 39

need to deal with is the non-uniformity of the shape and size of the characters written by different writers. Along with these, problems like skew, slant, etc. are commonly seen in handwritten documents. Even the paper and ink qualities make things much more difficult. Apart from the intrinsic complexities of handwriting, similarities among the characters belonging to different scripts add to the challenges of script recognition from handwritten document images.

It is worth mentioning that script recognition is usually performed at page, text-line or word level. In this paper, it is done at word level for two reasons: (a) feature extraction at word level is less time-consuming than at page or text-line level, and (b) a single document page or a single text line sometimes contains multiple scripts. In that case, word-level script identification is appropriate.

Script recognition articles for handwritten documents are relatively limited in comparison to their printed counterparts. Ubul et al. [2] comprehensively showed the state-of-the-art performance results for different identification, feature extraction and classification methodologies involved in the process. Recently, Singh et al. [1] provided a survey considering various feature extraction and classification techniques associated with the offline script identification of the Indic scripts. Spitz [3] proposed a method for distinguishing between Asian and European languages by analysing the connected components. Tan et al. [4] developed a method based on texture analysis for automatic script identification from document images using multiple channel (Gabor) filters and gray-level co-occurrence matrices (GLCM) for seven languages: Chinese, English, Greek, Korean, Malayalam, Persian and Russian. Hochberg et al. [5,6] described an algorithm for script and language identification from handwritten document images using statistical features based on connected component analysis. Wood et al. [7] demonstrated a projection profile method to determine Roman, Russian, Arabic, Korean and Chinese characters. Chaudhuri et al.
[8] discussed an OCR system to read two Indian languages, viz., Bangla and Devanagari (Hindi). Pal et al. [9] proposed an algorithm for word-wise script identification from documents containing English, Devanagari and Telugu text, based on conventional and water reservoir features. Chaudhury et al. [10] proposed a method for identification of Indian languages by combining Gabor filter based techniques and a direction distance histogram classifier for Hindi, English, Malayalam, Bengali, Telugu and Urdu. Some analysis of the variability involved in the multi-script signature recognition problem as compared to the single-script scenario is discussed in [11,12].

Various classification algorithms are applied to different pattern recognition problems, and the same holds for the script recognition problem. To date, different classifiers have been used for Indic script recognition, such as k-Nearest Neighbours (k-NN) [13,14], Linear Discriminant Analysis (LDA) [15], Neural Networks (NN) [15,16], Support Vector Machine (SVM) [16,17], tree-based classifiers [18,19], Simple Logistic [20] and MLP [21,22]. Although good results have already been achieved in this pattern recognition task, it is still hard to achieve acceptable accuracy with a single classifier. Studies show that the fusion of multiple classifiers can be a viable way to get better classification results, as the error made by any single classifier is generally compensated using information from other classifiers. The reason is that different classifiers may offer complementary information about the patterns under consideration. Based on this fact, a section of researchers has long focused on devising different algorithms for combining classifiers in an intelligent way, so that the combination can achieve better results than any of the individual classifiers being combined. The key idea is that instead of relying on a single decision maker, all the designs or their subsets are applied to the decision making by combining their individual beliefs in order to come up with a consensus decision.
This fact motivates many researchers to apply classifier combination methods to different pattern recognition problems. The popular methodologies for classifier combination include: Majority Voting [23,24], Subset-combining and re-ranking approach [25], Statistical model [26], Bayesian Belief Integration [27], Combination based on Dempster-Shafer (DS) theory of evidence [27,28] and Neural Network combinator [29].

To date, however, the classifier combination approach has not been tested much for the script recognition problem, either handwritten or printed, though it has enormous potential. To bridge this research gap, this paper applies different classifier combination techniques in the field of Indic script recognition.
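
As a minimal sketch of the combination idea described above, the following Python snippet shows majority voting over hard labels next to a simple mean rule over per-class scores. The function names (`majority_vote`, `mean_rule`), the script labels and all scores are hypothetical illustrations. The mean rule is a generic score-averaging scheme, not one of the methods listed above, and this page does not specify the combination rules the paper actually uses.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that appears first in `labels`.
    """
    if not labels:
        raise ValueError("need at least one classifier prediction")
    counts = Counter(labels)
    best = max(counts.values())
    # Walk the predictions in their original order so that
    # tie-breaking is deterministic.
    for label in labels:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def mean_rule(beliefs: List[Dict[str, float]]) -> str:
    """Average per-class scores across classifiers and return the top class.

    Assumes every classifier reports a score for the same set of classes.
    """
    if not beliefs:
        raise ValueError("need at least one classifier output")
    classes = list(beliefs[0])
    avg = {c: sum(b[c] for b in beliefs) / len(beliefs) for c in classes}
    return max(avg, key=avg.get)


if __name__ == "__main__":
    # Hypothetical per-class scores from three classifiers for one word image.
    scores = [
        {"Bangla": 0.60, "Devanagari": 0.40},
        {"Bangla": 0.30, "Devanagari": 0.70},
        {"Bangla": 0.55, "Devanagari": 0.45},
    ]
    # Hard decision of each classifier: its highest-scoring class.
    hard_labels = [max(s, key=s.get) for s in scores]
    print("majority vote:", majority_vote(hard_labels))
    print("mean rule:", mean_rule(scores))
```

With these made-up scores the two rules disagree: two of the three classifiers favour Bangla, so the vote picks Bangla, while the averaged scores favour Devanagari because the second classifier is far more confident. This is one reason the choice of combination rule matters.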

Document Image Processing

Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer science