Page 105 of Document Image Processing

J. Imaging 2018, 4, 43

ca, ra, etc.), punctuation, diacritics (such as panghulu, pangwisad, paneuleung, panyuku, etc.), and many special compound characters.

Figure 3. Sundanese palm leaf manuscript.

2.4. Challenges of Document Image Analysis for Palm Leaf Manuscripts

There are two main technical challenges to assessing palm leaf manuscripts in a DIA system. The first challenge is the physical condition of the palm leaf manuscript, which will strongly influence the quality of the document images captured. For the image capturing process for DIA research, data in a paper document are usually captured by optical scanning, but when the document is on a different medium such as microfilm, palm leaves, or fabric, photographic methods are often used to capture the images [13]. Nowadays, due to the specific characteristics of the physical support of the manuscripts, the development of DIA methods for palm leaf manuscripts in order to extract relevant information is considered a new research problem in handwritten document analysis. Ancient palm leaf manuscripts contain artifacts due to aging, foxing, yellowing, strain, local shading effects, low intensity variations or poor contrast, random noises, discolored parts, fading, and other types of degradation.

The second challenge is the complexity of the script. The Southeast Asian manuscripts with different scripts and languages provide real challenges for document analysis methods, not only because of the different forms of characters in the script, but also because the writing style of each script (e.g., how to join or separate a character in a text line) differs. It ranges widely from a binarization process [23–25], text line segmentation [26,27], and character and text recognition tasks [25,28,29], to the word spotting methods [30].

In the domain of DIA, handwritten character and text recognition has been the subject of intensive research during the last three decades. Some methods have already reached a satisfactory performance, especially for Latin, Chinese, and Japanese scripts. However, the development of handwritten character and text recognition methods for other various Asian scripts presents many issues.
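To make the binarization task mentioned above concrete, the following is a minimal sketch of global Otsu thresholding in plain Python. It is not one of the methods evaluated in this work; it is a textbook baseline, shown only to illustrate what "binarization" produces. The function names `otsu_threshold` and `binarize`, the 8-bit grayscale input, and the assumption that ink is darker than the leaf surface are all illustrative choices, not taken from the source.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    `pixels` is a flat list of 8-bit grayscale values (an assumption of this sketch).
    Pixels <= t form the dark class, pixels > t the bright class.
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels in the dark class so far
    sum_dark = 0    # sum of gray levels in the dark class so far
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet, nothing to split
        w_bright = total - w_dark
        if w_bright == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Binarize a grayscale image given as a list of rows.

    Returns (mask, t), where mask marks assumed ink pixels (value <= t) with 1.
    """
    flat = [v for row in image for v in row]
    t = otsu_threshold(flat)
    mask = [[1 if v <= t else 0 for v in row] for row in image]
    return mask, t


if __name__ == "__main__":
    # Tiny synthetic "stroke": a dark column on a bright background.
    img = [[200, 40, 200],
           [200, 40, 200],
           [210, 50, 190]]
    mask, t = binarize(img)
    print(t, mask)
```

A single global threshold like this assumes a roughly uniform background. The degradations listed above (local shading, low contrast, discoloration) violate that assumption, which is one reason binarization of palm leaf manuscripts is treated as a research problem rather than a solved preprocessing step.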
In the OCR task and development for palm leaf manuscripts from Southeast Asia, several deformations in the character shapes are visible due to merges and fractures and the use of nonstandard fonts. The similarities of distinct character shapes, overlaps, and interconnection of the neighboring characters further complicate the OCR system [31]. One of the main problems faced when dealing with segmented handwritten character recognition is the ambiguity and illegibility of the characters [32]. These characteristics provide suitable conditions to test and evaluate the robustness of feature extraction methods that were proposed for character recognition.

3. Document Image Analysis Tasks and Investigated Methods

Heritage document preservation is not just about converting physical documents into document images. With many physical documents being digitized and stored in large document databases, and then sent and received via digital machines, the interest and demand grew to require more functionalities than simply viewing and printing the images [33]. Further treatment is required before the collection of document images can be explored more extensively. For example, a more specific research field needed to be developed to add machine capabilities for extracting information from these images, reading text on a document page, finding sentences, and locating paragraphs, lines, words, and symbols on a diagram [33].

In this work, the methods for each DIA task were investigated for palm leaf manuscripts. The binarization task is evaluated using the latest methods from binarization competitions. The seam
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer science