Document Image Processing
Page 68

J. Imaging 2018, 4, 6

From the results in Table 4, we can see that our Arabic holistic OCR system achieved 77.3% WRR for recent books and 47.8% WRR for old books. Considering the top-10 hypotheses, the WRR increased to 87.7% for recent books and to 65.7% for old books. When considering the top-20 hypotheses, the WRR increased to 89% and 69% for recent and old books, respectively.

A data analysis of the recognition errors on the books data sets revealed several reasons that contributed to the reduction of the WRR. We found that these data sets had a high Out-Of-Vocabulary (OOV) rate of around 6% for recent books and 7% for old books. It is known that the effect of OOV words is cumulative, which means that a single OOV word can result in recognition errors for more than one of its neighboring words. Another phenomenon that we noticed in these data sets is the high rate of use of the Kashida character, which was 4% for recent books and 6% for old books. The Kashida character altered the shapes of some characters, which caused some word recognition errors. Also, we noticed that some fonts of the old books, such as the Anglo font, differed greatly from the fonts used in training the system, which resulted in very low WRR for some pages.

When we applied 4-gram language model rescoring to the books data sets using the top-10 hypotheses, we achieved 83% WRR for the recent books set and 53% WRR for the old books set. We got an absolute gain of 6% in WRR for both the recent and old books data sets. This result shows that a high percentage of the system's recognition errors can be corrected using the top-n hypotheses and a language model.

In the fourth evaluation, we compared the performance of the proposed system with three commercial Arabic OCR systems, Sakhr, ABBYY and NovoDynamics, which represent the best performing Arabic OCR packages currently available. Table 5 shows these comparative results.

Table 5. Recognition rate (percent) of recent computerized and uncomputerized books. ED stands for Euclidean distance.
Books Type       NovoDynamics   Sakhr   ABBYY   Holistic (Using Top 15 with LM), Squared ED/Absolute ED
Computerized     88.45          82.17   54.33   82.97/84.76
Uncomputerized   78.15          54.94   29.22   53.21/58.04

The results in Table 5 show that, when using the squared Euclidean distance as the distance measure, our system achieved better performance than two systems, ABBYY and Sakhr, for the computerized books data set, and better performance than the ABBYY system for the uncomputerized books data set. When we used the absolute Euclidean distance, the recognition rate increased from 82.97% to 84.76% for the computerized books set and from 53.21% to 58.04% for the uncomputerized books set, and the proposed system outperformed the Sakhr and ABBYY systems on both data sets, although the NovoDynamics system outperforms the proposed one. Our system is still much faster, as we will see in the next section.

As heavy computation is one of the main drawbacks of the holistic approach, we evaluated the runtime speed of the presented system. Table 6 shows the processing times of the proposed system before and after lexical reduction versus the number of selected word clusters. These experiments were run on a Core i7 2.8 GHz machine with single-thread execution.

Table 6. Processing time of word search and LM vs. word candidates.

Selected Words                    Processing Time (s/word)
No Reduction                      0.545
Lexicon Reduction (1 cluster)     0.0005
Lexicon Reduction (5 clusters)    0.0026
Lexicon Reduction (10 clusters)   0.0051
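The rescoring step reported on this page (picking among top-n word hypotheses with a language model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the English toy data, the scores, and the use of a tiny bigram table in place of the 4-gram model used in the paper. It is not the authors' implementation. It only shows how a language model can promote a lower-ranked hypothesis and how that changes the word recognition rate (WRR).

```python
import math
from itertools import product

def sentence_log_prob(words, bigram_log_prob, unseen=-10.0):
    """Toy bigram language model score (stand-in for the 4-gram model)."""
    padded = ["<s>"] + list(words)
    return sum(bigram_log_prob.get((a, b), unseen)
               for a, b in zip(padded, padded[1:]))

def rescore(candidates, bigram_log_prob, lm_weight=1.0):
    """Pick the best word sequence from per-position top-n hypotheses.

    candidates: one list per word position of (word, ocr_log_score) pairs.
    Every combination is scored exhaustively, which is only practical
    for tiny examples such as this one.
    """
    best_words, best_score = None, -math.inf
    for combo in product(*candidates):
        words = [w for w, _ in combo]
        ocr_score = sum(s for _, s in combo)
        score = ocr_score + lm_weight * sentence_log_prob(words, bigram_log_prob)
        if score > best_score:
            best_words, best_score = words, score
    return best_words

def word_recognition_rate(reference, hypothesis):
    """Position-wise WRR in percent for equal-length word sequences."""
    correct = sum(r == h for r, h in zip(reference, hypothesis))
    return 100.0 * correct / len(reference)

# Hypothetical data: the recognizer slightly prefers "cot" over "cat".
CANDIDATES = [
    [("the", -0.1)],
    [("cot", -0.5), ("cat", -1.0)],
    [("sat", -0.2)],
]
BIGRAMS = {("<s>", "the"): -0.5, ("the", "cat"): -1.0, ("cat", "sat"): -1.0}
REFERENCE = ["the", "cat", "sat"]

# Top-1 output takes the best recognizer score at each position.
top1 = [max(c, key=lambda ws: ws[1])[0] for c in CANDIDATES]
rescored = rescore(CANDIDATES, BIGRAMS)

print("top-1   :", top1, word_recognition_rate(REFERENCE, top1))
print("rescored:", rescored, word_recognition_rate(REFERENCE, rescored))
```

In this toy case the language model overrides the recognizer's first choice because the bigram "the cot" is unseen, which is the same kind of correction the reported WRR gain relies on. A real system would replace the exhaustive search with beam or Viterbi decoding.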
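To make the effect of lexicon reduction in Table 6 easier to read, the per-word times can be turned into speedup factors and throughput. The sketch below only does arithmetic on the figures from Table 6. The dictionary and function names are hypothetical and chosen for illustration.

```python
# Per-word processing times in seconds, copied from Table 6.
TIMES_S_PER_WORD = {
    "No reduction": 0.545,
    "Lexicon reduction (1 cluster)": 0.0005,
    "Lexicon reduction (5 clusters)": 0.0026,
    "Lexicon reduction (10 clusters)": 0.0051,
}

def speedups(times, baseline="No reduction"):
    """Speedup of each configuration relative to the unreduced lexicon."""
    base = times[baseline]
    return {name: base / t for name, t in times.items() if name != baseline}

def words_per_second(times):
    """Throughput implied by each per-word processing time."""
    return {name: 1.0 / t for name, t in times.items()}

if __name__ == "__main__":
    throughput = words_per_second(TIMES_S_PER_WORD)
    for name, factor in speedups(TIMES_S_PER_WORD).items():
        print(f"{name}: {factor:.0f}x faster, {throughput[name]:.0f} words/s")
```

With these figures, searching a single cluster is roughly a thousand times faster than searching the full lexicon, and searching ten clusters is still roughly a hundred times faster.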
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Publisher
MDPI
Location
Basel
Date
2018
Language
English
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Dimensions
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science