Page 102 of Document Image Processing

J. Imaging 2018, 4, 43

IAM Historical Document Database (IAM-HistDB) (http://www.fki.inf.unibe.ch/databases/iam-historical-document-database) [3], which includes handwritten historical manuscript images from the Saint Gall Database from the 9th century in Latin; the Parzival Database from the 13th century in German; the Washington Database from the 18th century in English; the Ancient Lives Project (https://www.ancientlives.org/) [4], which asks volunteers to transcribe Ancient Greek text fragments from the Oxyrhynchus Papyri collection; and many other projects.

To accelerate the process of accessing, preserving, and disseminating the contents of the heritage documents, a DIA system is needed. Besides aiming to preserve the existence of such ancient documents physically, the DIA system is expected to enable open access to the contents of the documents and provide opportunities for a wider audience to access all the important information stored in the document. DIA is the process of using various technologies to extract text, printed or handwritten, and graphics from digitized document files (http://www.cvisiontech.com/library/pdf/pdf-document/document-image-analysis.html) [5]. DIA systems generally have a major role in identifying, analyzing, extracting, structuring, and transferring document contents more quickly, effectively, and efficiently. This system is able to work semi-automatically or even fully automatically without human intervention. The DIA system is expected to save time, cost, and effort at many points in the heritage document preservation process.

However, although DIA research is developing rapidly, it is undeniable that most of the document collections used in the initial step are from developed regions such as America and European countries. The document samples from these countries are mostly written in English or old English with Latin/Roman script. Several important document collections were finally used as standard benchmarks for the evaluation of the latest DIA research results. The next wave of DIA research finally began to deal with documents from non-English-speaking areas with non-Latin scripts, such as Arabic, Chinese, and Japanese documents. During the evolution of DIA research in the last two decades, DIA researchers have proposed and achieved satisfactory solutions for many complex problems of document analysis for these types of documents. However, the DIA research challenge is ongoing. The latest challenge is documents from Asia, with new languages and more complex scripts to explore, such as Devanagari script [6], Gurmukhi script [7–10], Bangla script [11], and Malayalam script [12], and the case of multiple languages and scripts in documents from India. Optical character recognition (OCR) for Indian languages is considered more difficult in general than for European languages because of the large number of vowels, consonants, and conjuncts (combinations of vowels and consonants) [13].

This work was part of exploring DIA research for a palm leaf manuscripts collection from Southeast Asia. This collection offers a new challenge for DIA researchers because palm leaves are used as the writing medium and the language and script have never been analyzed before. In this paper, we did a comprehensive benchmark experimental test of some principal tasks in the DIA system, starting with binarization, text line segmentation, isolated character/glyph recognition, word recognition, and transliteration. To the best of our knowledge, this work is the first comprehensive study of the DIA researchers’ community and the first to perform a complete series of experimental benchmarking analyses of palm leaf manuscripts. The results of this research will be very useful in accelerating, evaluating, and improving the performance of existing DIA systems for a new type of document.

This paper is organized as follows. Section 2 gives a brief description of the palm leaf manuscripts collection from Southeast Asia, especially the Khmer palm leaf manuscript corpus from Cambodia and two palm leaf manuscript corpora, the Balinese and Sundanese manuscripts from Indonesia. The challenges of DIA for this manuscript corpus are also presented in this section. Section 3 describes the DIA tasks that need to be developed for the palm leaf manuscript collections, followed by a description of the methods investigated for those tasks. The datasets and evaluation methods for each DIA task used in the experimental studies for this work are presented in Section 4. Section 5 reports and analyzes the detailed results of the experiments. Finally, conclusions are given in Section 6.
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Publisher
MDPI
Place
Basel
Date
2018
Language
English
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Dimensions
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science