Page 123 in Document Image Processing
J. Imaging 2018, 4, 43
Figure 28. Error rate for Sundanese word recognition and transliteration test set.
6. Conclusions and Future Work
A comprehensive experimental test of the principal tasks in a DIA system, starting with
binarization, text line segmentation, and isolated character/glyph recognition, and continuing on to
word recognition and transliteration for a new collection of palm leaf manuscripts from Southeast
Asia, is presented. The results from all experiments provide the latest findings and a quantitative
benchmark of palm leaf manuscript analysis for researchers in the DIA community. Binarizing the
palm leaf manuscript images seems very challenging. Still, with many broken and unrecognizable
characters/glyphs and noise detected in the images, binarization should be reconsidered as the first
step in the DIA process for palm leaf manuscripts. On the other hand, although there are already
training-based DIA methods that do not require this binarization process, they usually require adequate
training data. The problem of inadequate training data also influences glyph recognition and word
transliteration. The unbalanced number of image samples for each character class means the CNN
methods did not perform optimally in glyph recognition. The recognition rates of the CNN methods
do not differ significantly from those of the handcrafted feature combinations. For future
work, more synthetic training data for palm leaf manuscript images should be generated in order
to support the training process. Especially for the word transliteration task, more synthetic training
data with more frequent words should be generated in order to improve the training process.
Many examples of glyph-to-syllable association should be synthetically generated to transliterate
syllabic scripts from Southeast Asia. The special characteristics and challenges posed by the palm
leaf manuscript collections will require a thorough adaptation of the DIA system. Some specific
adjustments need to be applied to the DIA methods for other types of documents. The adaptation of a
DIA system for palm leaf manuscripts is not unique and is not universal for all types of problems from
different collections. However, among the DIA system's non-unique solutions, one specific solution can
still be designed to deliver the best DIA system performance while still taking into account the
conditions of that collection.
Acknowledgments: The authors would like to thank Museum Gedong Kertya, Museum Bali, Undang Ahmad
Darsa, the philologists from Sundanese Centre Studies of Universitas Padjadjaran, the Situs Kabuyutan Ciburuy
Garut, all families in Bali, Indonesia, the EFEO team, the Buddhist Institute, and the National Library in Cambodia
for providing us with samples of palm leaf manuscripts. We also thank the students from the Department
of Informatics Education and the Department of Balinese Literature, University of Pendidikan Ganesha, the
Institute of Technology of Cambodia, and the National Institute of Post, Telecommunication and ICT for helping
us with the ground truthing process for this research project. This work is supported by the DIKTI BPPLN
Indonesian Scholarship Program, the STIC Asia Program implemented by the French Ministry of Foreign Affairs
and International Development (MAEDI), and ARES-CCD (program AI 2014-2019) under the funding of Belgian
university cooperation, and DRPMI Universitas Padjadjaran, DIKTI International Collaboration and Publication
grant 2017.
Author Contributions: The Balinese dataset was prepared by Made Windu Antara Kesiman. The Khmer
dataset was prepared by Dona Valy and Sophea Chhun. The Sundanese dataset was prepared by Erick Paulus,
Mira Suryani, and Setiawan Hadi. Jean-Christophe Burie, Michel Verleysen, and Jean-Marc Ogier contributed to
designing a ground truth validation protocol. Made Windu Antara Kesiman and Dona Valy conceived, designed,
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Publisher
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science