Journal of Imaging
Article
Text/Non-Text Separation from Handwritten
Document Images Using LBP Based Features:
An Empirical Study
Sourav Ghosh 1,*, Dibyadwati Lahiri 1,*, Showmik Bhowmik 1, Ergina Kavallieratou 2
and Ram Sarkar 1
1 Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal 700032, India;
showmik.cse@gmail.com (S.B.); raamsarkar@gmail.com (R.S.)
2 Department of Information and Communication Systems Engineering, University of Aegean,
Lesbos 81100, Greece; kavallieratou@aegean.gr
* Correspondence: souravghosh2197@gmail.com (S.G.); dibyadwati.lahiri@gmail.com (D.L.)
Received: 15 December 2017; Accepted: 6 April 2018; Published: 12 April 2018
Abstract: Isolating non-text components from the text components present in handwritten document
images is an important but less explored research area. Addressing this issue, in this paper, we have
presented an empirical study on the applicability of various Local Binary Pattern (LBP) based
texture features for this problem. This paper also proposes a minor modification in one of the
variants of the LBP operator to achieve better performance in the text/non-text classification
problem. The feature descriptors are then evaluated on a database, made up of images from
104 handwritten laboratory copies and class notes of various engineering and science branches,
using five well-known classifiers. Classification results reflect the effectiveness of LBP-based feature
descriptors in text/non-text separation.
Keywords: text/non-text separation; local binary pattern; handwritten document; document image
processing; texture-based features
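
For orientation, the basic LBP operator on which the descriptors named in the abstract build can be sketched as follows. This is a minimal illustration of the classic 3x3, 8-neighbour operator only, not the modified variant proposed in this paper. The function names lbp_8neighbour and lbp_histogram, the clockwise neighbour order and the ">=" thresholding convention are assumptions made for the example.

```python
import numpy as np


def lbp_8neighbour(gray):
    """Return the basic 3x3 LBP code of every interior pixel of a 2-D array."""
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    # The starting point and direction are a convention (assumed here).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes


def lbp_histogram(gray):
    """Normalised 256-bin histogram of LBP codes, usable as a texture feature."""
    codes = lbp_8neighbour(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

A histogram of this kind, computed per connected component or image region, is the sort of feature vector that could then be passed to a classifier to label the region as text or non-text.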
1. Introduction
Documents, in the modern day, are required to be stored in digitized form to increase their
longevity, portability and security. To achieve this purpose, the development of a complete
Document Image Processing System (DIPS) has become essential. Along with the other steps,
any DIPS needs to identify the text present in a document image separately from the non-text
components, such as tables, diagrams and graphic designs, before processing the text through an Optical
Character Recognition (OCR) engine [1–3]. The reason for this is obvious: OCR engines do not
process non-text components. Researchers, to date, have reported many solutions to this problem for
printed documents [4–6]. However, the same is not true for regular handwritten documents; a rather
limited amount of work is available in this area, to the best of our knowledge, among which two
significant ones are [7,8]. In document image processing, researchers mostly use OCR technology in
order to work on word and/or character level to provide a viable solution for information content
exploitation [9].
In general, handwritten documents are unstructured, i.e., in most cases, these documents do not
follow any specific layout, unlike printed documents. Thus, the appearance of text and non-text in
handwritten documents is very chaotic. For example, text components often overlap with the non-text
components. Furthermore, the building blocks (i.e., characters) of the text in handwritten documents
do not follow the standard shape and size usually found in their printed counterparts. One of the key
difficulties in the graphics recognition domain is also to work on complex and composite symbol
J. Imaging 2018, 4, 57 www.mdpi.com/journal/jimaging
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Publisher: MDPI
- Place: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Dimensions: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science