Page 60 in Document Image Processing


Journal of Imaging
Article

A Holistic Technique for an Arabic OCR System

Farhan M. A. Nashwan 1, Mohsen A. A. Rashwan 1, Hassanin M. Al-Barhamtoshy 2, Sherif M. Abdou 3,* and Abdullah M. Moussa 1

1 Department of Electronics and Electrical Communications, Cairo University, Giza 12613, Egypt; far_nash@hotmail.com (F.M.A.N.); mrashwan@rdi-eg.com (M.A.A.R.); a.m.moussa@ieee.org (A.M.M.)
2 Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia; hassanin@kau.edu.sa
3 Faculty of Computers & Information, Cairo University, Giza 12613, Egypt
* Correspondence: s.abdou@fci-cu.edu.eg; Tel.: +20-10-2661-4479

Received: 30 October 2017; Accepted: 22 December 2017; Published: 27 December 2017

Abstract: Analytical based approaches in Optical Character Recognition (OCR) systems can endure a significant amount of segmentation errors, especially when dealing with cursive languages such as the Arabic language with frequent overlapping between characters. Holistic based approaches that consider whole words as single units were introduced as an effective approach to avoid such segmentation errors. Still the main challenge for these approaches is their computation complexity, especially when dealing with large vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similar shaped words is used to reduce recognition time. Using global word level Discrete Cosine Transform (DCT) based features in combination with local block based features, our proposed approach managed to generalize for new font sizes that were not included in the training data. Evaluation results for the approach using different test sets from modern and historical Arabic books are promising compared with state of art Arabic OCR systems.

Keywords: Arabic OCR systems; holistic OCR approach; holistic OCR features; lexicon reduction

1. Introduction

Cursive scripts recognition has traditionally been handled by two major paradigms: a segmentation-based analytical approach and a word-based holistic approach.

In the analytical approach, the input word is treated as a sequence of units (usually characters). Each unit is then individually recognized [1–4]. This approach has several disadvantages. The segmentation of cursive words is a challenging task and any errors in that process will increase the errors in the following recognition step. Also, many of the used fonts for cursive scripts extensively use ligatures where two or more letters are joined as a single glyph, which complicates the character level segmentation. Figure 1 shows some challenging samples of Arabic words.

Figure 1. Some examples of Arabic words that contain ligatures with manually segmented characters.

Cursively written word cannot be recognized without being segmented and cannot be segmented without being recognized [5]. This phenomenon, known as Sayre’s paradox, pushes the community to

J. Imaging 2018, 4, 6
www.mdpi.com/journal/jimaging
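The abstract mentions global word-level DCT based features. As a rough illustration only, the sketch below computes the low-frequency 2D DCT coefficients of a word image and flattens them into a feature vector. The function names, the `keep` parameter, and the 8 x 8 low-frequency window are assumptions made for this example; the page does not state the authors' actual feature configuration, and the local block based features are not reproduced here.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] = np.sqrt(1.0 / n)  # DC row scaling makes the basis orthonormal
    return basis

def global_dct_features(word_image: np.ndarray, keep: int = 8) -> np.ndarray:
    """Hypothetical helper: flatten the keep x keep low-frequency 2D DCT block.

    `keep` is an illustrative choice, not a value taken from the paper.
    """
    img = np.asarray(word_image, dtype=float)
    rows, cols = img.shape
    # Separable 2D DCT: transform the columns, then the rows.
    coeffs = dct_matrix(rows) @ img @ dct_matrix(cols).T
    return coeffs[:keep, :keep].ravel()

# Toy usage: a synthetic 16 x 32 "word image" (not real OCR data).
toy_word = np.zeros((16, 32))
toy_word[4:12, 8:24] = 1.0
features = global_dct_features(toy_word)
print(features.shape)
```

Keeping only the low-frequency block gives a fixed-length descriptor of the coarse word shape, as long as the image is at least `keep` pixels in each dimension. In practice, word images would need to be normalized to a common size before such vectors are comparable.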
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Publisher
MDPI
Place
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Dimensions
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science