Journal of Imaging
Article
Non-Local Sparse Image Inpainting for Document Bleed-Through Removal
Muhammad Hanif *, Anna Tonazzini, Pasquale Savino and Emanuele Salerno
Institute of Information Science and Technologies, Italian National Research Council, 56124 Pisa, Italy;
anna.tonazzini@isti.cnr.it (A.T.); pasquale.savino@isti.cnr.it (P.S.); emanuele.salerno@isti.cnr.it (E.S.)
* Correspondence: muhammad.hanif@isti.cnr.it
Received: 14 January 2018; Accepted: 26 April 2018; Published: 9 May 2018
Abstract: Bleed-through is a frequent, pervasive degradation in ancient manuscripts, which is caused
by ink seeped from the opposite side of the sheet. Bleed-through, appearing as an extra interfering
text, hinders document readability and makes it difficult to decipher the information contents.
Digital image restoration techniques have been successfully employed to remove or significantly
reduce this distortion. This paper proposes a two-step restoration method for documents affected
by bleed-through, exploiting information from the recto and verso images. First, the bleed-through
pixels are identified, based on a non-stationary, linear model of the two texts overlapped in the
recto-verso pair. In the second step, a dictionary learning-based sparse image inpainting technique,
with non-local patch grouping, is used to reconstruct the bleed-through-contaminated image
information. An overcomplete sparse dictionary is learned from the bleed-through-free image patches,
which is then used to estimate a befitting fill-in for the identified bleed-through pixels. The non-local
patch similarity is employed in the sparse reconstruction of each patch, to enforce the local similarity.
Thanks to the intrinsic image sparsity and non-local patch similarity, the natural texture of the
background is well reproduced in the bleed-through areas, and even a possible overestimation of
the bleed-through pixels is effectively corrected, so that the original appearance of the document is
preserved. We evaluate the performance of the proposed method on the images of a popular database
of ancient documents, and the results validate the performance of the proposed method compared to
the state of the art.
Keywords: ancient document restoration; image inpainting; bleed-through removal; sparse representation
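The second step summarized in the abstract, filling the flagged pixels from a sparse representation of the surrounding clean pixels, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a fixed overcomplete DCT dictionary stands in for the dictionary learned from bleed-through-free patches, a plain orthogonal matching pursuit restricted to the unflagged pixels stands in for the sparse coding step, the non-local patch grouping is omitted, and the bleed-through mask is taken as given. The names `dct_dictionary`, `masked_omp` and `inpaint_patch` are hypothetical.

```python
import numpy as np

def dct_dictionary(patch_size=8, atoms_per_dim=11):
    """Overcomplete 2-D DCT dictionary, shape (patch_size**2, atoms_per_dim**2).
    Assumption: a fixed dictionary replaces the learned one."""
    n = np.arange(patch_size)
    k = np.arange(atoms_per_dim)
    D1 = np.cos(np.pi * np.outer(n + 0.5, k) / atoms_per_dim)
    D1[:, 1:] -= D1[:, 1:].mean(axis=0)      # keep only atom 0 as the constant (DC) atom
    D1 /= np.linalg.norm(D1, axis=0)         # unit-norm 1-D atoms
    return np.kron(D1, D1)                   # separable 2-D atoms, still unit norm

def masked_omp(y, known, D, n_nonzero=4):
    """Orthogonal matching pursuit that fits only the pixels flagged in `known`."""
    Dk = D[known]                            # dictionary rows at the known pixels
    norms = np.linalg.norm(Dk, axis=0)
    norms[norms == 0] = 1.0                  # guard against atoms vanishing on the known set
    Dk = Dk / norms
    target = y[known]
    residual = target.copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(Dk.T @ residual)))
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(Dk[:, support], target, rcond=None)
        residual = target - Dk[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef / norms[support]   # undo the column rescaling
    return alpha

def inpaint_patch(patch, bleed_mask, D, n_nonzero=4):
    """Replace the pixels flagged in `bleed_mask` with the sparse estimate."""
    y = patch.ravel().astype(float)
    known = ~bleed_mask.ravel()
    alpha = masked_omp(y, known, D, n_nonzero)
    filled = y.copy()
    filled[~known] = (D @ alpha)[~known]     # known pixels are left untouched
    return filled.reshape(patch.shape)

# Toy example: a flat background patch with a darker stain flagged as bleed-through.
D = dct_dictionary()
patch = np.full((8, 8), 200.0)
bleed = np.zeros((8, 8), dtype=bool)
bleed[2:5, 3:6] = True
patch[bleed] = 120.0
restored = inpaint_patch(patch, bleed, D)
```

Because the sparse code is fitted on the unflagged pixels only, the stain values never influence the fill-in. This is also why an overestimated mask is tolerable in such a scheme: wrongly flagged background pixels are simply re-estimated from their clean neighbours.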
1. Introduction
Archival, ancient manuscripts constitute the primary carrier of most authentic information
starting from the medieval era, serving as history's own closet, carrying stories of enigmatic, unknown
places or incredible events that took place in the distant past, many of which are still to be revealed.
These manuscripts are of great interest and importance for historians, and provide insight into the culture,
civilisation, events and lifestyles of our past. With the passage of time, these documents have been
exposed to different types of progressive degradation, such as spots or ink fading, due to the fragile nature
of the writing media and to bad storage or environmental conditions. This degradation process limits the
use of these ancient classics, and some of the deteriorated documents have had a very narrow escape from
total annihilation. Specifically, in manuscripts written on both sides of the sheet, the ink has often
seeped through and appears as an unpleasant degradation pattern on the reverse side. Ink penetration
through the paper is mainly due to aging, humidity, ink chemical properties or paper porosity [1],
and can range from faint to severe. In the literature, this kind of degradation is termed bleed-through,
and it impairs the legibility and interpretation of the document contents [2]. Therefore, it is of great
significance to remove the bleed-through contamination and restore the integrity of the original
J. Imaging 2018, 4, 68; www.mdpi.com/journal/jimaging
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science