Document Image Processing
Page - 51 -

Text of the Page - 51 -

J. Imaging 2018, 4, 57

threshold th_ideal to be at around a value of 100. We have taken various threshold values from 5 to 115 and found experimentally that the accuracy of classification is maximum at a threshold of about 100. It is to be noted that we have set this hard-coded threshold value after conducting exhaustive experimentation on the images belonging to our dataset. A change in document images might change the threshold value a bit, but we expect that this value would give researchers a clear hint for setting the threshold value for the document images they consider.

3. Method

The input color image is first converted to a grayscale image, and then the connected components (CCs) are extracted for feature computation and classification. The entire process is depicted in Figure 6. For CC extraction, the grayscale image is first binarized and the bounding boxes (BBs) of all of the eight-connected components in the binarized image are calculated. Then, using these estimated bounding boxes, CCs are extracted from the corresponding grayscale image. As we are considering real-world handwritten documents, we need to be very careful about the noise present in these documents, which might affect the binarization and BB estimation process. Thus, for effective binarization, a background estimation and separation procedure is followed, prior to the actual binarization, using Otsu's method as given in [27]. During BB estimation from the binarized image, only the CCs having height and width greater than three pixels are considered, to avoid noise.

After extraction of the CCs from the grayscale image, six different LBP-based features are computed. During feature computation, the radius R has been kept constant at 1 (i.e., the number of neighboring pixels M = 8). In order to compute a feature vector for each CC, we have generated a normalized histogram of those LBP values. The number of bins used depends on the particular LBP variant considered. Here, we should also point out that the LBP operators have been applied to each and every pixel of a CC, without any discrimination.

Figure 6. Flowchart of the entire text/non-text separation process.

4. Experimental Setup

Experimental setup for any pattern classification problem requires an annotated dataset, classifiers and a set of evaluation metrics. In this section, the data preparation procedure is described first,
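
The CC extraction and feature computation steps of Section 3 can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. Several parts are assumptions made only for this sketch: a fixed global threshold stands in for the background separation and Otsu binarization of [27], ink is assumed darker than paper, only the basic 256-bin LBP is shown rather than the six variants used in the paper, and neighbors falling outside a CC patch are treated as 0. The function names are hypothetical.

```python
from collections import deque

# Offsets of the M = 8 neighbors at radius R = 1, in clockwise order.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def connected_components(binary, min_size=3):
    """Return bounding boxes (top, left, bottom, right) of the 8-connected
    foreground components whose height and width both exceed min_size."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over the 8-neighborhood.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in OFFSETS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            height, width = bottom - top + 1, right - left + 1
            if height > min_size and width > min_size:
                boxes.append((top, left, bottom, right))
    return boxes


def lbp_histogram(patch):
    """Normalized 256-bin histogram of basic LBP codes (R = 1, M = 8),
    computed at every pixel of a grayscale patch."""
    rows, cols = len(patch), len(patch[0])
    hist = [0] * 256
    for y in range(rows):
        for x in range(cols):
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < rows and 0 <= nx < cols
                # Assumption: neighbors outside the patch count as 0.
                neighbor = patch[ny][nx] if inside else 0
                if neighbor >= patch[y][x]:
                    code |= 1 << bit
            hist[code] += 1
    total = rows * cols
    return [count / total for count in hist]


def extract_features(gray, threshold=128):
    """Binarize, find CC bounding boxes, and describe each CC's grayscale
    patch by an LBP histogram. The fixed threshold is a stand-in only."""
    binary = [[1 if value < threshold else 0 for value in row] for row in gray]
    features = []
    for top, left, bottom, right in connected_components(binary):
        patch = [row[left:right + 1] for row in gray[top:bottom + 1]]
        features.append(lbp_histogram(patch))
    return features
```

Calling `extract_features` on a grayscale image given as a list of rows returns one normalized histogram per retained CC. Note that the binarized image is used only to locate the bounding boxes, while the LBP codes are computed on the grayscale pixels, as the text describes.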
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science