Document Image Processing
Page - 4 -

Text of the Page - 4 -

J. Imaging 2018, 4, 68

manuscripts. An example of bleed-through removal is shown in Figure 1. Earlier, physical restoration methods were applied to deal with bleed-through degradation, but unfortunately those methods were costly, invasive, and sometimes caused permanent, irreversible damage to the documents. In recent years, digital preservation of the documentary heritage has been the focus of intensive digitization and archiving campaigns, aimed at its distribution, accessibility and analysis. With digitization prevailing, in addition to conservation, the computing technologies applied to the digital images of these documents have quickly become a powerful and versatile tool to simplify their study and retrieval, and to facilitate new insights into the document’s contents. Digital image processing techniques can be applied to these electronic document versions to alter the document appearance in any way while leaving the original intact. Specifically, digital image processing techniques have been attempted for the virtual restoration of documents affected by bleed-through, with some impressive results. In addition to improving document readability, the removal of bleed-through degradation is also a critical preprocessing step in many tasks such as feature extraction, optical character recognition, segmentation, and automatic transcription.

Figure 1. An example of bleed-through removal.

Bleed-through removal is a challenging task mainly due to the possible significant overlap between the original text and the bleed-through pattern, and the wide variation of its extent and intensity. In the literature, bleed-through removal is addressed as a classification problem, where the document image is subdivided into three components: background (the paper support), foreground (the main text), and bleed-through [1]. Broadly speaking, the existing methods in this domain can be divided into two main categories: blind or single-sided, and non-blind or double-sided.
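The three-component formulation above can be illustrated with a deliberately naive per-pixel labelling. This is only a minimal sketch of the label structure, not any of the cited methods: the function names `label_pixels` and `restore`, the two thresholds `t_fg` and `t_bg`, and the assumption that bleed-through is lighter than the main text but darker than the paper are all hypothetical choices made for illustration.

```python
from typing import List

# Class labels for the three components named in the text.
BACKGROUND, BLEED_THROUGH, FOREGROUND = 0, 1, 2

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def label_pixels(gray: Image, t_fg: int = 80, t_bg: int = 180) -> Image:
    """Label each pixel by intensity alone (hypothetical thresholds).

    Dark pixels are taken as foreground, light pixels as background,
    and everything in between as bleed-through.
    """
    labels = []
    for row in gray:
        out = []
        for v in row:
            if v <= t_fg:
                out.append(FOREGROUND)
            elif v >= t_bg:
                out.append(BACKGROUND)
            else:
                out.append(BLEED_THROUGH)
        labels.append(out)
    return labels


def restore(gray: Image, labels: Image) -> Image:
    """Replace bleed-through pixels with the mean background intensity."""
    bg = [v for row, lrow in zip(gray, labels)
          for v, lab in zip(row, lrow) if lab == BACKGROUND]
    # Fall back to white paper if no background pixel was found.
    paper = round(sum(bg) / len(bg)) if bg else 255
    return [[paper if lab == BLEED_THROUGH else v
             for v, lab in zip(row, lrow)]
            for row, lrow in zip(gray, labels)]


page = [[30, 120, 220],
        [210, 140, 40]]
labels = label_pixels(page)
restored = restore(page, labels)
```

Because the labels here depend on intensity only, the sketch works only when the three intensity ranges do not overlap. It is meant to make the background / foreground / bleed-through split concrete, not to serve as a usable restoration method.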
In blind methods, the image of a single side is used, whereas the non-blind methods require the information of both the recto and verso sides of the document. Most of the earlier methods rely on the intensity information of the image and perform restoration based on the grayscale or color (red, green, blue) intensity distributions. The intensity-based methods involve thresholding [3]; however, intensity information alone is insufficient, as there is often a significant overlap between the foreground and bleed-through intensity profiles [4]. In addition, thresholding may also destroy other useful document features, such as stamps, annotations, or paper watermarks. Thus, intensity-based thresholding is not suitable when the aim is to preserve the original appearance of the document. To overcome these drawbacks, some methods incorporate spatial information by exploiting the neighbouring structure.

Among the blind methods, in [5], an independent component analysis (ICA) method is proposed to separate the foreground, background, and bleed-through layers from an RGB image. A dual-layer Markov random field (MRF) is suggested in [6], whereas, in [7], a conditional random field (CRF) method is proposed. A multichannel-based blind bleed-through removal is suggested in [8] using color decorrelation or color space transformations, whereas, in [9], a recursive unsupervised segmentation approach is applied to the data space first decorrelated by principal component analysis (PCA). In [10], bleed-through removal is addressed as a blind source separation problem, solved by using a Markov random field (MRF) based local smoothness model. Similarly, an expectation maximization (EM)-based approach is suggested in [11].

As for the non-blind methods, a model-based approach using differences in the intensities of the recto and verso sides is outlined in [12]. The same model is extended in [13] using variational models with spatial smoothness in the wavelet domain. A non-blind ICA method is outlined in [14]. Other methods of this category are proposed in [15–17]. The performance of the non-blind methods
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science