J. Imaging 2018, 4, 68
and smoother corresponding bleed-through text. As a measure for "quantity of ink" having such
properties, we use the concept of optical density, which is related to the intensity as follows:
$$d(i,j) = D(s(i,j)) = -\log\left(\frac{s(i,j)}{b}\right), \qquad (5)$$
where $s(i,j)$ is the image intensity at pixel $(i,j)$, and $b$ represents the most frequent (or the
average) intensity value of the background area in the image.
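As a minimal sketch of Equation (5), the snippet below converts an intensity image into optical densities with NumPy. Several points are assumptions, not statements from the text: the logarithm is taken as natural, `background_level` estimates $b$ as the most frequent value of the whole image (which presumes that background pixels dominate), and the hypothetical `floor` parameter only keeps the logarithm finite at fully black pixels.

```python
import numpy as np

def background_level(s):
    """Estimate b as the most frequent intensity (assumes background dominates)."""
    values, counts = np.unique(np.asarray(s), return_counts=True)
    return float(values[np.argmax(counts)])

def optical_density(s, b, floor=1e-6):
    """Equation (5): d = -log(s / b), with the natural logarithm (assumption)."""
    s = np.asarray(s, dtype=float)
    # floor is a hypothetical safeguard against log(0) at black pixels
    return -np.log(np.maximum(s, floor) / b)

# Tiny example: a light background (200) with one dark ink pixel (50)
image = np.array([[200, 200, 200],
                  [200,  50, 200]])
b = background_level(image)
d = optical_density(image, b)
```

With this definition, background pixels map to a density of zero and darker pixels to increasingly positive densities, which is what makes the density usable as a measure of ink quantity.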
Thus, based on the physically-motivated assumptions above, we adopt a linear, non-stationary
model in the optical densities, to describe the superposition between background, foreground and
bleed-through in the two observed recto and verso images:
$$
\begin{aligned}
d_r^{obs}(i,j) &= d_r(i,j) + q_v(i,j)\, D\big(h_v(i,j) \otimes s_v(i,j)\big), \\
d_v^{obs}(i,j) &= d_v(i,j) + q_r(i,j)\, D\big(h_r(i,j) \otimes s_r(i,j)\big),
\end{aligned} \qquad (6)
$$
for each pixel $(i,j)$. In Equation (6), $d^{obs}$ is the observed optical density, and $d$ is the ideal
optical density of the interference-free image, with the subscripts $r$ and $v$ indicating the recto and
verso side, respectively. $D$ is the operator that, when applied to the intensity, returns the optical
density according to Equation (5), and $\otimes$ indicates convolution between the ideal intensity $s$ and
a unit-volume Point Spread Function (PSF), $h$, describing the smearing of ink penetrating the paper.
At present, we assume stationary PSFs, empirically chosen as Gaussian functions, although a more
reliable model for the phenomenon of ink spreading should consider non-stationary operators.
Finally, the space-variant quantities $q_r$ and $q_v$ have the physical meaning of attenuation levels of
the density (or ink penetration percentage), from one side to the other.
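The forward model of Equation (6) can be simulated as in the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the two sides are taken to be already aligned pixel by pixel, a single background value `b` and a single Gaussian width `sigma` are shared by both sides, the logarithm is natural, and the helper names (`gaussian_kernel`, `blur`, `observed_densities`) are hypothetical. The Gaussian PSF is normalised to unit sum and applied separably using NumPy only.

```python
import numpy as np

def optical_density(s, b, floor=1e-6):
    # Equation (5) with the natural logarithm; floor avoids log(0) (assumption)
    return -np.log(np.maximum(np.asarray(s, dtype=float), floor) / b)

def gaussian_kernel(sigma):
    # 1-D Gaussian truncated at about 3 sigma and normalised to unit sum
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(s, sigma):
    # Separable convolution h (x) s, with edge replication at the borders
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(s, dtype=float), r, mode="edge")
    conv = lambda v: np.convolve(v, k, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)
    return np.apply_along_axis(conv, 0, rows)

def observed_densities(s_r, s_v, q_r, q_v, b, sigma):
    # Equation (6): ideal density of each side plus the attenuated,
    # blurred density coming from the opposite side
    d_obs_r = optical_density(s_r, b) + q_v * optical_density(blur(s_v, sigma), b)
    d_obs_v = optical_density(s_v, b) + q_r * optical_density(blur(s_r, sigma), b)
    return d_obs_r, d_obs_v

# Example: a short recto stroke seeping into an empty verso
s_r = np.full((15, 15), 200.0)
s_r[7, 5:10] = 50.0
s_v = np.full((15, 15), 200.0)
d_obs_r, d_obs_v = observed_densities(s_r, s_v, q_r=0.3, q_v=0.3, b=200.0, sigma=1.0)
```

An observed intensity can be recovered from a density by inverting Equation (5), that is `b * np.exp(-d_obs)`. Here `q_r` and `q_v` are scalars for brevity; arrays of the image shape work the same way and correspond to the space-variant case.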
The proposed algorithm locates the bleed-through pixels on each side as those whose optical
density is lower than that of the corresponding pixels on the opposite side, i.e., of the foreground
that has generated the bleed-through. Thus, on the basis of Equation (6), at each pixel, we first
compute the following ratios:
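$$
\begin{aligned}
q_r(i,j) &= \frac{d_v^{obs}(i,j)}{D\big(h_r(i,j) \otimes s_r^{obs}(i,j)\big) + \epsilon}, \\
q_v(i,j) &= \frac{d_r^{obs}(i,j)}{D\big(h_v(i,j) \otimes s_v^{obs}(i,j)\big) + \epsilon}.
\end{aligned} \qquad (7)
$$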
Since the equations above are intended to identify bleed-through pixels, they are derived from
the model in Equation (6) assuming that the ideal optical density $d(i,j)$ is zero on the side at hand.
As a consequence of this assumption, the ideal density on the opposite side should correspond to that
of the foreground text, and thus coincide with the density of the blurred observed intensity $s^{obs}$.
Then, for all pixels, we keep the smaller of the two computed attenuation levels and set the other to
zero. This allows us to discriminate correctly between foreground on one side and bleed-through on
the other, so that all pixels where $q_r > 0$ are classified as bleed-through on the verso side, whereas
those where $q_v > 0$ are classified as bleed-through on the recto side.
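The ratio computation and the "keep the smaller one" rule can be sketched as follows. This is a hedged illustration and not the authors' code: `eps` stands for a small positive constant in the denominator (an assumption about the term that follows the plus sign in Equation (7)), densities are clipped at zero so that pixels brighter than the background cannot produce negative ratios (also an assumption), ties are resolved arbitrarily in favour of $q_v$, and all function names are hypothetical.

```python
import numpy as np

def optical_density(s, b, floor=1e-6):
    # Equation (5) with the natural logarithm; floor avoids log(0) (assumption)
    return -np.log(np.maximum(np.asarray(s, dtype=float), floor) / b)

def blur(s, sigma):
    # Separable Gaussian PSF with unit sum, truncated at about 3 sigma
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(np.asarray(s, dtype=float), radius, mode="edge")
    conv = lambda v: np.convolve(v, k, mode="valid")
    return np.apply_along_axis(conv, 0, np.apply_along_axis(conv, 1, padded))

def attenuation_maps(s_obs_r, s_obs_v, b, sigma, eps=1e-3):
    # Non-negative densities (clipping is an assumption of this sketch)
    dens = lambda s: np.maximum(optical_density(s, b), 0.0)
    d_obs_r, d_obs_v = dens(s_obs_r), dens(s_obs_v)
    # Equation (7): density on one side over blurred density of the other
    q_r = d_obs_v / (dens(blur(s_obs_r, sigma)) + eps)
    q_v = d_obs_r / (dens(blur(s_obs_v, sigma)) + eps)
    # Keep the smaller attenuation level at each pixel, zero the other
    keep_r = q_r < q_v
    return np.where(keep_r, q_r, 0.0), np.where(keep_r, 0.0, q_v)

# Example: recto text at block A seeps to the verso, verso text at block B
# seeps to the recto; the background intensity is 200 on both sides
s_r = np.full((16, 16), 200.0)
s_v = np.full((16, 16), 200.0)
s_r[2:7, 2:7], s_v[2:7, 2:7] = 50.0, 150.0      # block A
s_v[9:14, 9:14], s_r[9:14, 9:14] = 50.0, 150.0  # block B
q_r, q_v = attenuation_maps(s_r, s_v, b=200.0, sigma=0.5)
bleed_in_verso = q_r > 0   # per the classification rule in the text
bleed_in_recto = q_v > 0
```

In this toy case the centre of block A is flagged as bleed-through on the verso only, and the centre of block B as bleed-through on the recto only, which matches the classification rule described above.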
However, it is apparent that, with the criterion above, we can obtain wrong positive attenuation
levels on one of the two sides at some background pixels and some occlusion pixels, i.e., where the
two foreground texts superimpose on each other. This happens because, in the
background-background and foreground-foreground cases, the two densities are almost the same,
around zero in the first case and around the maximum density in the other, with small oscillations
that make the value of the ratios unpredictable.
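A single-pixel calculation makes this instability concrete. The sketch below applies the ratio rule to one pair of intensities, ignoring the blur (a reasonable simplification only in flat regions) and using illustrative values that are assumptions of this example: a background level of 200, a natural logarithm, and a small constant `eps` in the denominator. The function name `smaller_ratio` is hypothetical.

```python
import math

def smaller_ratio(s_r, s_v, b=200.0, eps=1e-3):
    """Return which side would be flagged as bleed-through, and the kept ratio."""
    d_r = -math.log(s_r / b)   # recto density, Equation (5)
    d_v = -math.log(s_v / b)   # verso density, Equation (5)
    q_r = d_v / (d_r + eps)    # blur omitted: flat-region approximation
    q_v = d_r / (d_v + eps)
    # q_r > 0 flags the verso pixel, q_v > 0 flags the recto pixel
    return ("verso", q_r) if q_r < q_v else ("recto", q_v)

clean = smaller_ratio(50, 150)              # recto text over verso bleed-through
background_a = smaller_ratio(198, 199)      # both background, slight noise
background_b = smaller_ratio(199, 198)      # same noise, sides swapped
occlusion = smaller_ratio(50, 52)           # text on both sides
```

In the clean case the kept ratio is a plausible attenuation level. In the two background cases a one-level change in intensity is enough to flip the flagged side while the kept ratio stays clearly positive, and in the occlusion case the kept ratio is close to one, so genuine text would be flagged as bleed-through.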
To correct this possible overestimation of the bleed-through pixels, we set the attenuation level
to zero when the densities $d_r^{obs}$ and $d_v^{obs}$ are both low (or both high, respectively) and close to
each other. We experimentally verified that this procedure works well in most cases. On the other
hand, even if some pixels remain misclassified as bleed-through, the sparse inpainting algorithm that
we propose here is able to properly replace them with the original, correct values. As detailed in the
next section,
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science