Journal of Imaging

Article

A New Binarization Algorithm for Historical Documents

Marcos Almeida 1,*, Rafael Dueire Lins 2,3, Rodrigo Bernardino 4, Darlisson Jesus 4 and Bruno Lima 1

1 Departamento de Eletrônica e Sistemas, Centro de Tecnologia, Universidade Federal de Pernambuco, Recife-PE 50670-901, Brazil; brunocesar182@hotmail.com
2 Centro de Informática, Universidade Federal de Pernambuco, Recife-PE 50740-560, Brazil; rdl.ufpe@gmail.com
3 Departamento de Estatística e Informática, Universidade Federal Rural de Pernambuco, Recife-PE 52171-900, Brazil
4 Programa de Pós-Graduação em Engenharia Elétrica, Universidade Federal de Pernambuco, Recife-PE 50670-901, Brazil; rbbernardino@gmail.com (R.B.); dmj.ufpe@gmail.com (D.J.)
* Correspondence: mmar@ufpe.br; Tel.: +55-81-2126-7129

Received: 31 October 2017; Accepted: 16 January 2018; Published: 23 January 2018
Abstract: Monochromatic documents require far less network bandwidth and storage space than their color or even grayscale equivalents. The binarization of historical documents is far more complex than that of recent ones, as paper aging, color, texture, translucency, stains, back-to-front interference, the kind and color of ink used in handwriting, the printing process, the digitalization process, etc. are some of the factors that affect binarization. This article presents a new binarization algorithm for historical documents. The new global filter proposed is performed in four steps: filtering the image using a bilateral filter, splitting the image into its RGB components, decision-making for each RGB channel based on an adaptive binarization method inspired by Otsu's method with a choice of the threshold level, and classification of the binarized images to decide which of the RGB components best preserved the document information in the foreground. The quantitative and qualitative assessment made with 23 binarization algorithms on three sets of "real world" documents showed very good results.

Keywords: documents; binarization; back-to-front interference; bleeding
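The per-channel thresholding and channel-selection steps summarized in the abstract can be sketched as follows. This is a minimal illustration in plain NumPy, not the paper's implementation: the bilateral pre-filter is omitted, and picking the channel with the largest between-class variance is a naive stand-in for the paper's classification step.

```python
import numpy as np

def otsu_threshold(channel):
    """Otsu's method: choose the gray level that maximizes the
    between-class variance of the resulting two-class split.
    Returns (threshold, maximum between-class variance)."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # P(class 0) up to each level
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean
    mu_t = mu[-1]                            # global mean
    # Between-class variance; guard the empty-class levels against 0/0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    t = int(np.argmax(sigma_b))
    return t, float(sigma_b[t])

def binarize_best_channel(rgb):
    """Threshold each RGB channel independently and keep the channel
    whose Otsu split separates the two classes best (largest
    between-class variance). Foreground (ink) is assumed darker than
    the threshold, so True marks foreground pixels."""
    results = []
    for c in range(3):
        t, score = otsu_threshold(rgb[..., c])
        results.append((score, rgb[..., c] <= t))
    return max(results, key=lambda r: r[0])[1]
```

In the paper's pipeline the input would first pass through a bilateral filter (an edge-preserving smoother), and the final choice among the three binarized channels is made by a trained classifier rather than the variance heuristic used here.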
1. Introduction
Document image binarization plays an important role in the document image analysis, compression, transcription, and recognition pipeline [1]. Binary documents require far less storage space and network bandwidth for transmission than color or grayscale documents. Historical documents drastically increase the degree of difficulty for binarization algorithms. Physical noises [2] such as stains and paper aging affect the performance of binarization algorithms. Besides that, historical documents were often typed, printed, or written on both sides of sheets of paper, and the opacity of the paper is often such as to allow the back printing or writing to be visualized on the front side. This kind of "noise", first called back-to-front interference [3], was later known as bleeding or show-through [4]. Figure 1 presents three examples of documents with such noise, extracted from the three different datasets used in this paper in the assessment of the proposed algorithm. If the document is exhibited either in true color or grayscale, the human brain is able to filter out that sort of noise, keeping its readability. The strength of the interference present varies with the opacity of the paper, its permeability, the kind and degree of fluidity of the ink used, its storage, age, etc. Thus, the difficulty of obtaining a good binarization performance
J. Imaging 2018, 4, 27 www.mdpi.com/journal/jimaging
Document Image Processing

- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Publisher
- MDPI
- Place
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science