Page 105 in Document Image Processing
J. Imaging 2018, 4, 43
ca, ra, etc.), punctuation, diacritics (such as panghulu, pangwisad, paneuleung, panyuku, etc.), and many
special compound characters.
Figure 3. Sundanese palm leaf manuscript.
2.4. Challenges of Document Image Analysis for Palm Leaf Manuscripts
There are two main technical challenges to assessing palm leaf manuscripts in a DIA system.
The first challenge is the physical condition of the palm leaf manuscript, which strongly influences
the quality of the captured document images. In DIA research, data
in a paper document are usually captured by optical scanning, but when the document is on a different
medium such as microfilm, palm leaves, or fabric, photographic methods are often used to capture the
images [13]. Because of the specific characteristics of the physical support of the manuscripts,
the development of DIA methods that extract relevant information from palm leaf manuscripts is
considered a new research problem in handwritten document analysis. Ancient palm leaf manuscripts
contain artifacts due to aging, foxing, yellowing, strain, local shading effects, low intensity variations
or poor contrast, random noise, discolored parts, fading, and other types of degradation.
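To make the effect of local shading concrete, the following is a minimal sketch, not one of the methods evaluated in this work. It assumes a tiny synthetic grayscale image whose background darkens from left to right, and compares a single global threshold (Otsu's method) with a simple local mean threshold. The image, the window radius, the offset, and all function names are illustrative assumptions.

```python
def otsu_threshold(pixels):
    """Return the grey level t in 0..255 that maximizes between-class variance.

    Pixels with value <= t form the dark class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class
    sum0 = 0  # sum of grey levels in the dark class
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def local_mean_binarize(img, radius, offset):
    """Mark a pixel as ink (1) if it is darker than its local window mean minus offset."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(window) / len(window)
            row.append(1 if img[y][x] < mean - offset else 0)
        out.append(row)
    return out


# Synthetic 5 x 8 image (assumption): the background fades from 220 to 115
# across the columns, and row 2 holds an ink stroke that is 80 levels darker
# than the background in the same column.
WIDTH, HEIGHT, INK_ROW = 8, 5, 2
background = [220 - 15 * x for x in range(WIDTH)]
image = [[background[x] - 80 if y == INK_ROW else background[x]
          for x in range(WIDTH)] for y in range(HEIGHT)]
truth = [[1 if y == INK_ROW else 0 for _ in range(WIDTH)] for y in range(HEIGHT)]

t = otsu_threshold([p for row in image for p in row])
global_result = [[1 if p <= t else 0 for p in row] for row in image]
local_result = local_mean_binarize(image, radius=1, offset=20)

print("global threshold recovers the stroke:", global_result == truth)
print("local threshold recovers the stroke: ", local_result == truth)
```

In this synthetic case the brightest ink pixel (140) is lighter than the darkest background pixel (115), so no single global threshold can separate ink from background, whereas the local rule can. Real manuscripts combine shading with the other degradations listed above, so this sketch only illustrates one of them.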
The second challenge is the complexity of the script. Southeast Asian manuscripts with
different scripts and languages pose real challenges for document analysis methods, not only
because of the different forms of characters in the script, but also because the writing style of each
script (e.g., how to join or separate a character in a text line) differs. The affected tasks range widely,
from binarization [23–25], text line segmentation [26,27], and character and text recognition [25,28,29] to
word spotting [30].
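As an illustration of why joined or overlapping characters complicate text line segmentation, here is a minimal sketch of a horizontal projection profile baseline. It is a textbook technique and not one of the methods cited in [26,27]; the toy binary pages and the function name `segment_text_lines` are illustrative assumptions.

```python
def segment_text_lines(binary, min_ink=1):
    """Split a binary page (1 = ink) into text lines using row ink counts.

    Returns (start_row, end_row) pairs with end_row exclusive.
    """
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                  # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))   # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines


# Two text lines separated by blank rows (toy data).
clean_page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Same page, but one long stroke in column 2 bridges the gap between the lines.
touching_page = [row[:] for row in clean_page]
touching_page[3][2] = 1
touching_page[4][2] = 1

clean_lines = segment_text_lines(clean_page)
touching_lines = segment_text_lines(touching_page)
print(clean_lines)     # two separate row ranges
print(touching_lines)  # one merged row range
```

A single stroke that crosses the gap between two lines is enough to merge them in this baseline, which hints at why scripts with touching or overlapping components call for more elaborate segmentation methods.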
In the domain of DIA, handwritten character and text recognition has been the subject of intensive
research during the last three decades. Some methods have already reached a satisfactory performance,
especially for Latin, Chinese, and Japanese scripts. However, the development of handwritten character
and text recognition methods for various other Asian scripts presents many issues. In the OCR task
and development for palm leaf manuscripts from Southeast Asia, several deformations in the character
shapes are visible, owing to merges and fractures and to the use of nonstandard fonts. The similarities
of distinct character shapes, overlaps, and interconnection of the neighboring characters further
complicate the OCR system [31]. One of the main problems faced when dealing with segmented
handwritten character recognition is the ambiguity and illegibility of the characters [32]. These
characteristics provide suitable conditions to test and evaluate the robustness of feature extraction
methods that were proposed for character recognition.
3. Document Image Analysis Tasks and Investigated Methods
Heritage document preservation is not just about converting physical documents into document
images. With many physical documents being digitized and stored in large document databases,
and then sent and received via digital machines, interest and demand have grown for more
functionality than simply viewing and printing the images [33]. Further treatment is required before
the collection of document images can be explored more extensively. For example, a more specific
research field needed to be developed to add machine capabilities for extracting information from
these images, reading text on a document page, finding sentences, and locating paragraphs, lines,
words, and symbols on a diagram [33].
In this work, the methods for each DIA task were investigated for palm leaf manuscripts. The
binarization task is evaluated using the latest methods from binarization competitions. The seam
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Publisher
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science