Page 45 of Document Image Processing

Journal of Imaging

Article

Text/Non-Text Separation from Handwritten Document Images Using LBP Based Features: An Empirical Study

Sourav Ghosh 1,*, Dibyadwati Lahiri 1,*, Showmik Bhowmik 1, Ergina Kavallieratou 2 and Ram Sarkar 1

1 Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal 700032, India; showmik.cse@gmail.com (S.B.); raamsarkar@gmail.com (R.S.)
2 Department of Information and Communication Systems Engineering, University of Aegean, Lesbos 81100, Greece; kavallieratou@aegean.gr
* Correspondence: souravghosh2197@gmail.com (S.G.); dibyadwati.lahiri@gmail.com (D.L.)

Received: 15 December 2017; Accepted: 6 April 2018; Published: 12 April 2018

Abstract: Isolating non-text components from the text components present in handwritten document images is an important but less explored research area. Addressing this issue, in this paper, we have presented an empirical study on the applicability of various Local Binary Pattern (LBP) based texture features for this problem. This paper also proposes a minor modification in one of the variants of the LBP operator to achieve better performance in the text/non-text classification problem. The feature descriptors are then evaluated on a database, made up of images from 104 handwritten laboratory copies and class notes of various engineering and science branches, using five well-known classifiers. Classification results reflect the effectiveness of LBP-based feature descriptors in text/non-text separation.

Keywords: text/non-text separation; local binary pattern; handwritten document; document image processing; texture-based features

1. Introduction

Documents, in the modern day, are required to be stored in digitized form to increase their longevity, portability and security. In order to achieve this purpose, the development of a complete Document Image Processing System (DIPS) has become an utmost need. Along with the other steps, any DIPS needs to identify the texts present in a document image separately from the non-text components like tables, diagrams, graphic designs before processing the text through an Optical Character Recognition (OCR) engine [1–3]. The reason for this is very obvious: OCR engines do not process non-text components. Researchers, to date, have reported many solutions to this problem for printed documents [4–6]. However, the same is not true for regular handwritten documents; a rather limited amount of work is available in this area, to the best of our knowledge, among which two significant ones are [7,8]. In document image processing, researchers mostly use OCR technology in order to work on word and/or character level to provide a viable solution for information content exploitation [9].

In general, handwritten documents are unstructured, i.e., in most cases, these documents do not follow any specific layout, unlike the printed documents. Thus, the appearance of text and non-text in handwritten documents is very chaotic. For example, text components often overlap with the non-text components. Furthermore, the building blocks (i.e., characters) of the text in handwritten documents do not follow the standard shape and size usually found in its printed counterpart. One of the key difficulties in the graphics recognition domain is also to work on complex and composite symbol

J. Imaging 2018, 4, 57; www.mdpi.com/journal/jimaging

Document Image Processing

Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science