Page - 4 - in Document Image Processing

J. Imaging 2018, 4, 68

manuscripts. An example of bleed-through removal is shown in Figure 1. Earlier, physical restoration methods were applied to deal with bleed-through degradation, but unfortunately those methods were costly, invasive, and sometimes caused permanent, irreversible damage to the documents. In recent years, digital preservation of the documentary heritage has been the focus of intensive digitization and archiving campaigns, aimed at its distribution, accessibility and analysis. With digitization prevailing, in addition to conservation, the computing technologies applied to the digital images of these documents have quickly become a powerful and versatile tool to simplify their study and retrieval, and to facilitate new insights into the documents' contents. Digital image processing techniques can be applied to these electronic document versions to perform any alteration to the document appearance while preserving the original intact. Specifically, digital image processing techniques have been attempted for the virtual restoration of documents affected by bleed-through, with some impressive results. In addition to improving document readability, the removal of bleed-through degradation is also a critical preprocessing step in many tasks such as feature extraction, optical character recognition, segmentation, and automatic transcription.

Figure 1. An example of bleed-through removal.

Bleed-through removal is a challenging task, mainly due to the possible significant overlap between the original text and the bleed-through pattern, and the wide variation of its extent and intensity. In the literature, bleed-through removal is addressed as a classification problem, where the document image is subdivided into three components: background (the paper support), foreground (the main text), and bleed-through [1]. Broadly speaking, the existing methods in this domain can be divided into two main categories: blind or single-sided, and non-blind or double-sided.
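
The three-class formulation described above can be illustrated with a toy sketch. This is not the method of [1] or of any cited work: it is a naive, purely intensity-based labelling, and the function names, the label constants, and the two thresholds (`t_fg`, `t_bg`) are all hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical integer labels for the three components named in the text.
BACKGROUND, BLEED_THROUGH, FOREGROUND = 0, 1, 2


def label_by_intensity(gray, t_fg=80, t_bg=180):
    """Label each pixel of a grayscale image (0 = black, 255 = white).

    t_fg and t_bg are illustrative thresholds, not values from the source:
    dark pixels are called foreground, bright pixels background, and
    everything in between bleed-through.
    """
    gray = np.asarray(gray)
    labels = np.full(gray.shape, BLEED_THROUGH, dtype=np.uint8)
    labels[gray <= t_fg] = FOREGROUND
    labels[gray >= t_bg] = BACKGROUND
    return labels


def remove_bleed_through(gray, labels):
    """Overwrite bleed-through pixels with an estimate of the paper colour."""
    gray = np.asarray(gray)
    background_pixels = gray[labels == BACKGROUND]
    # Assumption: the median background intensity approximates the paper.
    fill = int(np.median(background_pixels)) if background_pixels.size else 255
    restored = gray.copy()
    restored[labels == BLEED_THROUGH] = fill
    return restored


# Tiny synthetic "page": 200 = paper, 40 = dark ink, 120 and 90 = mid-grey.
page = np.array([[200, 200, 40, 200],
                 [200, 120, 40, 200],
                 [200, 120, 90, 200]], dtype=np.uint8)

labels = label_by_intensity(page)
restored = remove_bleed_through(page, labels)
print(labels)
print(restored)
```

In this toy page the pixel with intensity 90 is labelled as bleed-through and painted over. If that pixel were actually a faint stroke of the main text, it would be erased, which is the kind of overlap between text and bleed-through intensities that makes the task difficult.
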
In blind methods, the image of a single side is used, whereas the non-blind methods require the information of both the recto and verso sides of the document. Most of the earlier methods rely on the intensity information of the image and perform restoration based on the grayscale or color (red, green, blue) intensity distributions. The intensity-based methods involve thresholding [3]; however, intensity information alone is insufficient, as there is often a significant overlap between the foreground and bleed-through intensity profiles [4]. In addition, thresholding may also destroy other useful document features, such as stamps, annotations, or paper watermarks. Thus, intensity-based thresholding is not suitable when the aim is to preserve the original appearance of the document. To overcome these drawbacks, some methods incorporate spatial information by exploiting the neighboring structure.

Among the blind methods, in [5], an independent component analysis (ICA) method is proposed to separate the foreground, background, and bleed-through layers from an RGB image. A dual-layer Markov random field (MRF) is suggested in [6], whereas, in [7], a conditional random field (CRF) method is proposed. A multichannel-based blind bleed-through removal is suggested in [8] using color decorrelation or color space transformations, whereas, in [9], a recursive unsupervised segmentation approach is applied to the data space first decorrelated by principal component analysis (PCA). In [10], bleed-through removal is addressed as a blind source separation problem, solved by using an MRF-based local smoothness model. Similarly, an expectation maximization (EM)-based approach is suggested in [11].

As for the non-blind methods, a model-based approach using differences in the intensities of the recto and verso sides is outlined in [12]. The same model is extended in [13] using variational models with spatial smoothness in the wavelet domain. A non-blind ICA method is outlined in [14]. Other methods of this category are proposed in [15–17]. The performance of the non-blind methods

Document Image Processing

Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science