Journal of Imaging
Article

A New Binarization Algorithm for Historical Documents

Marcos Almeida 1,*, Rafael Dueire Lins 2,3, Rodrigo Bernardino 4, Darlisson Jesus 4 and Bruno Lima 1

1 Departamento de Eletrônica e Sistemas, Centro de Tecnologia, Universidade Federal de Pernambuco, Recife-PE 50670-901, Brazil; brunocesar182@hotmail.com
2 Centro de Informática, Universidade Federal de Pernambuco, Recife-PE 50740-560, Brazil; rdl.ufpe@gmail.com
3 Departamento de Estatística e Informática, Universidade Federal Rural de Pernambuco, Recife-PE 52171-900, Brazil
4 Programa de Pós-Graduação em Engenharia Elétrica, Universidade Federal de Pernambuco, Recife-PE 50670-901, Brazil; rbbernardino@gmail.com (R.B.); dmj.ufpe@gmail.com (D.J.)
* Correspondence: mmar@ufpe.br; Tel.: +55-81-2126-7129

Received: 31 October 2017; Accepted: 16 January 2018; Published: 23 January 2018

Abstract: Monochromatic documents require far less network bandwidth and storage space than their color or even grayscale equivalents. Binarizing historical documents is far more complex than binarizing recent ones, because paper aging, color, texture, translucency, stains, back-to-front interference, the kind and color of ink used in handwriting, the printing process, and the digitization process, among other factors, all affect binarization. This article presents a new binarization algorithm for historical documents. The new global filter proposed is performed in four steps: filtering the image using a bilateral filter, splitting the image into the RGB components, decision-making for each RGB channel based on an adaptive binarization method inspired by Otsu’s method with a choice of the threshold level, and classification of the binarized images to decide which of the RGB components best preserved the document information in the foreground. The quantitative and qualitative assessment, carried out with 23 binarization algorithms on three sets of “real world” documents, showed very good results.

Keywords: documents; binarization; back-to-front interference; bleeding

1. Introduction

Document image binarization plays an important role in the document image analysis, compression, transcription, and recognition pipeline [1]. Binary documents require far less storage space and network bandwidth than color or grayscale documents. Historical documents drastically increase the degree of difficulty for binarization algorithms. Physical noises [2] such as stains and paper aging affect the performance of binarization algorithms. In addition, historical documents were often typed, printed or written on both sides of the sheet, and the paper is often translucent enough for the printing or writing on the back to be visible on the front. This kind of “noise”, first called back-to-front interference [3], was later known as bleeding or show-through [4]. Figure 1 presents three examples of documents with such noise, extracted from the three different datasets used in this paper to assess the proposed algorithm. When the document is displayed in either true color or grayscale, the human brain is able to filter out this sort of noise, so the document remains readable. The strength of the interference varies with the opacity of the paper, its permeability, the kind and degree of fluidity of the ink used, its storage, age, etc. Thus, the difficulty for obtaining a good binarization performance

J. Imaging 2018, 4, 27; www.mdpi.com/journal/jimaging
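
As a rough illustration of the pipeline outlined in the abstract, the sketch below splits an RGB image into its channels, thresholds each channel with plain Otsu, and keeps one channel. It rests on several assumptions that are not taken from the article: the bilateral filtering step is omitted, the thresholding is the standard Otsu method rather than the adapted variant the authors describe, and the channel is picked by highest between-class variance, which is only a hypothetical stand-in for the classification step. The names `otsu_threshold` and `binarize_rgb` are illustrative.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each component in 0..255


def otsu_threshold(values: Sequence[int]) -> Tuple[int, float]:
    """Return (threshold, between-class variance) for 8-bit values (plain Otsu)."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, 0.0
    w0 = 0    # count of values <= t (dark class)
    sum0 = 0  # sum of values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # bright class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    # Normalise the class weights to probabilities.
    return best_t, best_var / (total * total)


def binarize_rgb(image: List[List[Pixel]]) -> Tuple[int, List[List[int]]]:
    """Threshold each RGB channel and keep one.

    ASSUMPTION: the channel with the highest between-class variance is kept.
    This is a placeholder heuristic, not the article's classification step.
    Returns (channel index, binary image) with 0 = ink and 1 = paper.
    """
    best = None
    for c in range(3):
        channel = [px[c] for row in image for px in row]
        t, score = otsu_threshold(channel)
        if best is None or score > best[0]:
            best = (score, c, t)
    _, c, t = best
    binary = [[1 if px[c] > t else 0 for px in row] for row in image]
    return c, binary
```

For a small yellowish page with bluish ink, where the blue channel offers little contrast, the placeholder heuristic would be expected to favor the red or green channel. A real implementation would smooth the image with an edge-preserving filter first and would likely work on array types rather than nested lists.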
Document Image Processing
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science