Document Image Processing

Page 17 of Document Image Processing


Journal of Imaging
Article

A New Binarization Algorithm for Historical Documents

Marcos Almeida 1,*, Rafael Dueire Lins 2,3, Rodrigo Bernardino 4, Darlisson Jesus 4 and Bruno Lima 1

1 Departamento de Eletrônica e Sistemas, Centro de Tecnologia, Universidade Federal de Pernambuco, Recife-PE 50670-901, Brazil; brunocesar182@hotmail.com
2 Centro de Informática, Universidade Federal de Pernambuco, Recife-PE 50740-560, Brazil; rdl.ufpe@gmail.com
3 Departamento de Estatística e Informática, Universidade Federal Rural de Pernambuco, Recife-PE 52171-900, Brazil
4 Programa de Pós-Graduação em Engenharia Elétrica, Universidade Federal de Pernambuco, Recife-PE 50670-901, Brazil; rbbernardino@gmail.com (R.B.); dmj.ufpe@gmail.com (D.J.)
* Correspondence: mmar@ufpe.br; Tel.: +55-81-2126-7129

Received: 31 October 2017; Accepted: 16 January 2018; Published: 23 January 2018

Abstract: Monochromatic documents require far less network bandwidth and storage space than their color or even grayscale equivalents. The binarization of historical documents is far more complex than that of recent ones, as paper aging, color, texture, translucency, stains, back-to-front interference, the kind and color of ink used in handwriting, the printing process, the digitization process, etc. are some of the factors that affect binarization. This article presents a new binarization algorithm for historical documents. The new global filter proposed is performed in four steps: filtering the image using a bilateral filter, splitting the image into its RGB components, decision-making for each RGB channel based on an adaptive binarization method inspired by Otsu's method with a choice of the threshold level, and classification of the binarized images to decide which of the RGB components best preserved the document information in the foreground. The quantitative and qualitative assessment made with 23 binarization algorithms in three sets of "real world" documents showed very good results.

Keywords: documents; binarization; back-to-front interference; bleeding

1. Introduction

Document image binarization plays an important role in the document image analysis, compression, transcription, and recognition pipeline [1]. Binary documents require far less storage space and network bandwidth than color or grayscale documents. Historical documents drastically increase the degree of difficulty for binarization algorithms. Physical noises [2] such as stains and paper aging affect the performance of binarization algorithms. Besides that, historical documents were often typed, printed or written on both sides of sheets of paper, and the opacity of the paper is often such as to allow the back printing or writing to be visualized on the front side. This kind of "noise", first called back-to-front interference [3], was later known as bleeding or show-through [4]. Figure 1 presents three examples of documents with such noise, extracted from the three different datasets used in this paper in the assessment of the proposed algorithm. If the document is exhibited either in true-color or gray-scale, the human brain is able to filter out that sort of noise, keeping its readability. The strength of the interference present varies with the opacity of the paper, its permeability, the kind and degree of fluidity of the ink used, its storage, age, etc. Thus, the difficulty for obtaining a good binarization performance

J. Imaging 2018, 4, 27 (www.mdpi.com/journal/jimaging)
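The four steps listed in the abstract (bilateral filtering, RGB splitting, per-channel thresholding, and choosing the best channel) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: the thresholding uses plain Otsu rather than the paper's adaptive, Otsu-inspired variant; the channel-selection rule (`class_separation`) is a hypothetical stand-in, because this page does not describe the paper's classifier; and all function names and parameter values are invented for the example.

```python
import math


def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=30.0):
    """Edge-preserving smoothing of an RGB image (2-D list of (r, g, b) tuples)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            centre = image[y][x]
            acc = [0.0, 0.0, 0.0]
            weight_sum = 0.0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    pixel = image[ny][nx]
                    dist2 = (ny - y) ** 2 + (nx - x) ** 2
                    diff2 = sum((p - c) ** 2 for p, c in zip(pixel, centre))
                    # Weight decays with spatial distance and with colour difference.
                    weight = math.exp(-dist2 / (2 * sigma_space ** 2)
                                      - diff2 / (2 * sigma_range ** 2))
                    for i in range(3):
                        acc[i] += weight * pixel[i]
                    weight_sum += weight
            row.append(tuple(int(round(a / weight_sum)) for a in acc))
        out.append(row)
    return out


def otsu_threshold(channel):
    """Plain Otsu: return t maximising between-class variance (dark class is v <= t)."""
    hist = [0] * 256
    for row in channel:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def class_separation(channel, t):
    """HYPOTHETICAL selection score: gap between mean background and mean ink level."""
    dark = [v for row in channel for v in row if v <= t]
    light = [v for row in channel for v in row if v > t]
    if not dark or not light:
        return 0.0
    return sum(light) / len(light) - sum(dark) / len(dark)


def binarize_document(image):
    """Return (channel name, binary image); 1 marks ink, 0 marks paper."""
    smoothed = bilateral_filter(image)                               # step 1
    best = None
    for index, name in enumerate("RGB"):
        channel = [[px[index] for px in row] for row in smoothed]    # step 2
        t = otsu_threshold(channel)                                  # step 3
        score = class_separation(channel, t)                         # step 4 (stand-in)
        if best is None or score > best[0]:
            binary = [[1 if v <= t else 0 for v in row] for row in channel]
            best = (score, name, binary)
    return best[1], best[2]
```

Calling `binarize_document` on a small image given as a list of rows of `(r, g, b)` tuples returns the name of the selected channel and its binarized version. The nested loops are written for readability, not speed; a real page scan would call for an optimized image library.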
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Publisher: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science