Page - 4 - in Document Image Processing
J. Imaging 2018, 4, 68
manuscripts. An example of bleed-through removal is shown in Figure 1. Earlier, physical restoration
methods were applied to deal with bleed-through degradation, but unfortunately those methods were
costly, invasive, and sometimes caused permanent, irreversible damage to the documents.
In recent years, digital preservation of the documentary heritage has been the focus of
intensive digitization and archiving campaigns, aimed at its distribution, accessibility and analysis.
With digitization prevailing, in addition to conservation, the computing technologies applied to the
digital images of these documents have quickly become a powerful and versatile tool to simplify
their study and retrieval, and to facilitate new insights into the document’s contents. Digital image
processing techniques can be applied to these electronic document versions, to perform any alteration
to the document appearance, while preserving the original intact. Specifically, digital image processing
techniques have been attempted for the virtual restoration of documents affected by bleed-through,
with some impressive results. In addition to improving the document readability, the removal of the
bleed-through degradation is also a critical preprocessing step in many tasks such as feature extraction,
optical character recognition, segmentation, and automatic transcription.
Figure 1. An example of bleed-through removal.
Bleed-through removal is a challenging task mainly due to the possible significant overlap
between the original text and the bleed-through pattern, and the wide variation of its extent and
intensity. In the literature, bleed-through removal is addressed as a classification problem, where the
document image is subdivided into three components: background (the paper support), foreground
(the main text), and bleed-through [1]. Broadly speaking, the existing methods in this domain can
be divided into two main categories: blind or single-sided, and non-blind or double-sided. In blind
methods, the image of a single side is used, whereas the non-blind methods require the information
of both the recto and verso sides of the document. Most of the earlier methods rely on the intensity
information of the image and perform restoration based on the grayscale or color (red, green, blue)
intensity distributions. The intensity-based methods involve thresholding [3]; however, intensity
information alone is insufficient as there is often a significant overlap between the foreground and
bleed-through intensity profiles [4]. In addition, thresholding may also destroy other useful document
features, such as stamps, annotations, or paper watermarks. Thus, intensity-based thresholding is
not suitable when the aim is to preserve the original appearance of the document. To overcome these
drawbacks, some methods incorporate spatial information by exploiting the neighbouring structure.
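To make the limitation of pure intensity thresholding concrete, the following is a minimal sketch, not the method of any cited reference. It labels each pixel of a grayscale page as foreground, bleed-through, or background using two global thresholds, then paints the bleed-through pixels with the median background value. The function names, the thresholds `t_fg` and `t_bg`, and the toy image are all assumptions made for illustration.

```python
import numpy as np

FOREGROUND, BLEED_THROUGH, BACKGROUND = 0, 1, 2

def classify_by_intensity(gray, t_fg=80, t_bg=180):
    """Label pixels with two global thresholds (hypothetical values).

    Darker than t_fg -> foreground, t_bg or brighter -> background,
    everything in between -> bleed-through.
    """
    gray = np.asarray(gray)
    labels = np.full(gray.shape, BLEED_THROUGH, dtype=np.uint8)
    labels[gray < t_fg] = FOREGROUND
    labels[gray >= t_bg] = BACKGROUND
    return labels

def paint_over_bleed_through(gray, labels):
    """Replace bleed-through pixels with the median background intensity."""
    gray = np.asarray(gray)
    bg_pixels = gray[labels == BACKGROUND]
    # Fall back to white if the page has no pixel labelled as background.
    fill = int(np.median(bg_pixels)) if bg_pixels.size else 255
    out = gray.copy()
    out[labels == BLEED_THROUGH] = fill
    return out

# Toy 2x3 "page": 30 is a dark stroke, 90 is a faint stroke of the main text,
# 120 and 150 are show-through from the reverse side, 200 and 210 are paper.
page = np.array([[30, 120, 200],
                 [90, 150, 210]], dtype=np.uint8)
labels = classify_by_intensity(page)
restored = paint_over_bleed_through(page, labels)
```

In this toy page the faint main-text stroke (intensity 90) falls in the same intensity band as the show-through and is erased along with it, which is the overlap problem described above. A stamp or watermark with mid-range intensity would be lost in the same way.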
Among the blind methods, in [5], an independent component analysis (ICA) method is proposed
to separate the foreground, background, and bleed-through layers from an RGB image. A dual-layer
Markov random field (MRF) is suggested in [6], whereas, in [7], a conditional random field (CRF)
method is proposed. A multichannel-based blind bleed-through removal is suggested in [8] using color
decorrelation or color space transformations, whereas, in [9], a recursive unsupervised segmentation
approach is applied to the data space first decorrelated by principal component analysis (PCA). In [10],
bleed-through removal is addressed as a blind source separation problem, solved by using a Markov
random field (MRF) based local smoothness model. Similarly, an expectation maximization (EM)-based
approach is suggested in [11].
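The PCA decorrelation step mentioned above can be sketched as follows. This is only an illustration of decorrelating the color channels of an RGB image with PCA, assuming plain NumPy. It does not reproduce the segmentation stage of [9], and the function name `pca_decorrelate` and the synthetic test image are hypothetical.

```python
import numpy as np

def pca_decorrelate(rgb):
    """Project the color channels of an H x W x C image onto their principal axes.

    Returns the decorrelated channels (same shape as the input) and the
    variance of each output channel, sorted in decreasing order.
    """
    h, w, c = rgb.shape
    pixels = rgb.reshape(-1, c).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)      # C x C channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    components = centered @ eigvecs[:, order]
    return components.reshape(h, w, c), eigvals[order]

# Synthetic image whose three channels are strongly correlated.
rng = np.random.default_rng(0)
base = rng.random((8, 8, 1))
demo = np.concatenate(
    [base,
     0.8 * base + 0.1 * rng.random((8, 8, 1)),
     0.5 * base + 0.2 * rng.random((8, 8, 1))],
    axis=2,
)
channels, variances = pca_decorrelate(demo)
```

After the projection the output channels are mutually uncorrelated, so a later segmentation or separation stage can treat them independently. Which channel carries mostly main text and which carries mostly bleed-through depends on the document and has to be decided by inspection or by a further step.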
As for the non-blind methods, a model-based approach using differences in the intensities
of the recto and verso sides is outlined in [12]. The same model is extended in [13] using variational
models with spatial smoothness in the wavelet domain. A non-blind ICA method is outlined in [14].
Other methods of this category are proposed in [15–17]. The performance of the non-blind methods
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science