Page 8 in Document Image Processing
J. Imaging 2018, 4, 68
and smoother corresponding bleed-through text. As a measure for “quantity of ink” having such
properties, we use the concept of optical density, which is related to the intensity as follows:
\[
d(i, j) = D(s(i, j)) = -\log\left(\frac{s(i, j)}{b}\right), \tag{5}
\]
where $s(i, j)$ is the image intensity at pixel $(i, j)$, and $b$ represents the most frequent (or the average)
intensity value of the background area in the image.
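As an illustration of Equation (5), the following minimal sketch computes an optical density map with NumPy. The function name `optical_density`, the `floor` clip that guards against `log(0)`, and the use of the natural logarithm are assumptions made here; the text does not state the base of the logarithm or how zero intensities are handled.

```python
import numpy as np

def optical_density(s, b=None, floor=1e-6):
    """Optical density d = -log(s / b), following Equation (5).

    s     : 2-D array of positive intensities.
    b     : background intensity; if None, the most frequent value of s is used.
    floor : assumed lower clip to avoid log(0); not part of the source text.
    """
    s = np.asarray(s, dtype=float)
    if b is None:
        values, counts = np.unique(s, return_counts=True)
        b = values[np.argmax(counts)]  # most frequent intensity in the image
    return -np.log(np.clip(s, floor, None) / b)

# Toy page: background at 200, two darker "ink" pixels at 100 and 50.
page = np.array([[200.0, 200.0, 50.0],
                 [200.0, 100.0, 200.0]])
d = optical_density(page)
```

With this toy input, background pixels map to a density of 0, the pixel at half the background intensity to log 2 (about 0.693), and the pixel at a quarter of it to log 4 (about 1.386), so darker ink yields a larger density.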
Thus, based on the physically-motivated assumptions above, we adopt a linear, non-stationary
model in the optical densities, to describe the superposition between background, foreground and
bleed-through in the two observed recto and verso images:
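\[
\begin{aligned}
d^{\mathrm{obs}}_r(i, j) &= d_r(i, j) + q_v(i, j)\, D\big(h_v(i, j) \otimes s_v(i, j)\big), \\
d^{\mathrm{obs}}_v(i, j) &= d_v(i, j) + q_r(i, j)\, D\big(h_r(i, j) \otimes s_r(i, j)\big),
\end{aligned} \tag{6}
\]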
for each pixel $(i, j)$. In Equation (6), $d^{\mathrm{obs}}$ is the observed optical density, and $d$ is the ideal optical
density of the interference-free image, with the subscripts $r$ and $v$ indicating the recto and verso side,
respectively. $D$ is the operator that, when applied to the intensity, returns the optical density according
to Equation (5), and $\otimes$ indicates convolution between the ideal intensity $s$ and a unit-volume Point
Spread Function (PSF), $h$, describing the smearing of ink penetrating the paper. At present, we assume
stationary PSFs, empirically chosen as Gaussian functions, although a more reliable model for the
phenomenon of ink spreading should consider non-stationary operators. Finally, the space-variant
quantities $q_r$ and $q_v$ have the physical meaning of attenuation levels of the density (or ink penetration
percentage), from one side to the other.
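The forward model of Equation (6) can be sketched as follows. This is only an illustrative simulation under stated assumptions: the two sides are taken to be already registered (verso mirrored), the PSF is a separable Gaussian truncated at three standard deviations with edge replication at the borders, the logarithm is natural, and the attenuation levels are passed as scalars for brevity (arrays of the image shape also work through broadcasting). The helper names `gaussian_kernel`, `blur`, `density`, and `observe` are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian, truncated at 3 sigma and normalised to unit sum."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur (h convolved with s), edges replicated."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    conv = lambda v: np.convolve(v, k, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)   # blur along each row
    return np.apply_along_axis(conv, 0, rows)     # then along each column

def density(s, b):
    """Operator D of Equation (5), natural logarithm assumed."""
    return -np.log(s / b)

def observe(s_r, s_v, q_r, q_v, b, sigma=1.0):
    """Observed recto/verso densities according to Equation (6)."""
    d_obs_r = density(s_r, b) + q_v * density(blur(s_v, sigma), b)
    d_obs_v = density(s_v, b) + q_r * density(blur(s_r, sigma), b)
    return d_obs_r, d_obs_v

# Toy pages: background 200, one vertical ink stroke per side (value 40).
b = 200.0
s_r = np.full((9, 9), b); s_r[:, 2] = 40.0   # recto stroke in column 2
s_v = np.full((9, 9), b); s_v[:, 6] = 40.0   # verso stroke in column 6
d_obs_r, d_obs_v = observe(s_r, s_v, q_r=0.3, q_v=0.3, b=b)
s_obs_r = b * np.exp(-d_obs_r)               # back to intensities
```

In this toy case the recto shows a faint, smoothed copy of the verso stroke at column 6: its density there is positive but well below that of the stroke on the verso itself, which is the property the detection step relies on.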
The proposed algorithm locates the bleed-through pixels in each side as those whose optical
density is lower than that of the corresponding pixels in the opposite side, i.e., of the foreground that
has generated the bleed-through. Thus, on the basis of Equation (6), at each pixel, we first compute the
following ratios:
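\[
\begin{aligned}
q_r(i, j) &= \frac{d^{\mathrm{obs}}_v(i, j)}{D\big(h_r(i, j) \otimes s^{\mathrm{obs}}_r(i, j)\big) + \epsilon}, \\
q_v(i, j) &= \frac{d^{\mathrm{obs}}_r(i, j)}{D\big(h_v(i, j) \otimes s^{\mathrm{obs}}_v(i, j)\big) + \epsilon}.
\end{aligned} \tag{7}
\]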
Since the equations above are intended to identify bleed-through pixels, they are derived from
the model in Equation (6) assuming that the ideal optical density $d(i, j)$ is zero on the side at hand.
As a consequence of this assumption, the ideal density on the opposite side should correspond to that of the
foreground text, and thus coincide with the density of the blurred observed intensity $s^{\mathrm{obs}}$. Then, for all
pixels, we keep the smaller of the two computed attenuation levels and set the other to zero.
This allows us to correctly discriminate between the two instances of foreground on one side and bleed-through
on the other, so that all pixels where $q_r > 0$ are classified as bleed-through on the verso side, whereas
those where $q_v > 0$ are classified as bleed-through on the recto side.
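The ratio computation and the keep-the-smaller rule described above can be sketched as follows. The function name `attenuation_maps` is hypothetical, and so is the small constant `eps` added to the denominators (assumed here only to avoid division by zero, with an arbitrary value). The inputs `dens_blur_r` and `dens_blur_v` stand for the densities of the blurred observed intensities on each side, which are assumed to be computed beforehand; ties are resolved in favour of `q_r`, an arbitrary choice.

```python
import numpy as np

def attenuation_maps(d_obs_r, d_obs_v, dens_blur_r, dens_blur_v, eps=1e-3):
    """Per-pixel attenuation ratios, keeping only the smaller one.

    d_obs_r, d_obs_v         : observed optical densities (recto, verso).
    dens_blur_r, dens_blur_v : densities of the blurred observed intensities.
    eps                      : assumed guard against division by zero.
    """
    q_r = d_obs_v / (dens_blur_r + eps)   # recto ink seen on the verso
    q_v = d_obs_r / (dens_blur_v + eps)   # verso ink seen on the recto
    keep_r = q_r <= q_v                   # ties go to q_r (arbitrary)
    return np.where(keep_r, q_r, 0.0), np.where(keep_r, 0.0, q_v)

# Pixel 0: recto foreground over verso bleed-through; pixel 1: the reverse.
d_obs_r = np.array([1.6, 0.4])
d_obs_v = np.array([0.4, 1.6])
dens_blur_r = np.array([1.5, 0.4])
dens_blur_v = np.array([0.4, 1.5])

q_r, q_v = attenuation_maps(d_obs_r, d_obs_v, dens_blur_r, dens_blur_v)
bleed_on_verso = q_r > 0   # bleed-through pixels on the verso side
bleed_on_recto = q_v > 0   # bleed-through pixels on the recto side
```

Here pixel 0 is flagged as bleed-through on the verso and pixel 1 as bleed-through on the recto, since in each case the surviving ratio is the one smaller than one.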
However, it is apparent that, with the criterion above, we can obtain spurious positive attenuation
levels on one of the two sides at some background pixels and some occlusion
pixels, i.e., where the two foreground texts overlap. This happens because, in the
background–background and foreground–foreground cases, the two densities are almost the same
(around zero in the first case and around the maximum density in the second), with small oscillations that
make the value of the ratios unpredictable.
To correct this possible overestimation of the bleed-through pixels, we set the attenuation
level to zero when the densities $d^{\mathrm{obs}}_r$ and $d^{\mathrm{obs}}_v$ are both low (or both high) and close to each other.
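A minimal sketch of this correction is given below. The text does not specify how "low", "high", and "close" are quantified, so the thresholds `low`, `high`, and `tol` (and their default values) are purely illustrative assumptions, as is the function name `suppress_ambiguous`.

```python
import numpy as np

def suppress_ambiguous(q_r, q_v, d_obs_r, d_obs_v,
                       low=0.1, high=1.2, tol=0.15):
    """Zero the attenuation where both densities are low (or high) and close.

    low, high, tol are hypothetical thresholds; they would need tuning
    for real data.
    """
    close = np.abs(d_obs_r - d_obs_v) < tol
    both_low = (d_obs_r < low) & (d_obs_v < low)      # background-background
    both_high = (d_obs_r > high) & (d_obs_v > high)   # foreground-foreground
    ambiguous = close & (both_low | both_high)
    return np.where(ambiguous, 0.0, q_r), np.where(ambiguous, 0.0, q_v)

# Pixels: background-background, foreground-foreground, foreground-bleed.
d_obs_r = np.array([0.01, 1.60, 1.60])
d_obs_v = np.array([0.02, 1.55, 0.40])
q_r = np.array([1.8, 0.0, 0.27])   # first value is a spurious detection
q_v = np.array([0.0, 0.9, 0.0])    # second value is a spurious detection

q_r_fixed, q_v_fixed = suppress_ambiguous(q_r, q_v, d_obs_r, d_obs_v)
```

In this example the two spurious attenuation levels are reset to zero, while the genuine bleed-through detection at the third pixel is left untouched.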
We experimentally verified that this procedure works well in most cases. On the other hand, even if
some pixels remain misclassified as bleed-through, the sparse inpainting algorithm that we propose
here is able to properly replace them with the original, correct values. As detailed in the next section,
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Publisher: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Dimensions: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science