Page - 123 - in Document Image Processing


J. Imaging 2018, 4, 43

Figure 28. Error rate for Sundanese word recognition and transliteration test set.

6. Conclusions and Future Work

A comprehensive experimental test of the principal tasks in a DIA system, starting with binarization, text line segmentation, and isolated character/glyph recognition, and continuing on to word recognition and transliteration for a new collection of palm leaf manuscripts from Southeast Asia, is presented. The results from all experiments provide the latest findings and a quantitative benchmark of palm leaf manuscript analysis for researchers in the DIA community. Binarizing the palm leaf manuscript images seems very challenging. Still, with many broken and unrecognizable characters/glyphs and noise detected in the images, binarization should be reconsidered as the first step in the DIA process for palm leaf manuscripts. On the other hand, although there are already training-based DIA methods that do not require this binarization process, they usually require adequate training data. The problem of inadequate training data also influences glyph recognition and word transliteration. The unbalanced number of image samples for each character class means the CNN methods did not perform optimally in glyph recognition. The recognition rates of the CNN methods do not differ significantly from those of the handcrafted feature combinations.

For future work, more synthetic training data for palm leaf manuscript images should be generated in order to support the training process. Especially for the word transliteration task, more synthetic training data containing more frequent words should be generated in order to improve the training process. Many examples of glyph-to-syllable association should be synthetically generated to transliterate syllabic scripts from Southeast Asia.

The special characteristics and challenges posed by the palm leaf manuscript collections will require a thorough adaptation of the DIA system. Some specific adjustments need to be applied to the DIA methods for other types of documents.
The adaptation of a DIA system for palm leaf manuscripts is not unique and is not universal for all types of problems from different collections. However, among the DIA system's non-unique solutions, one specific solution can still be designed to deliver the best DIA system performance while still taking into account the conditions of that collection.

Acknowledgments: The authors would like to thank Museum Gedong Kertya, Museum Bali, Undang Ahmad Darsa, the philologists from Sundanese Centre Studies of Universitas Padjadjaran, the Situs Kabuyutan Ciburuy Garut, all families in Bali, Indonesia, the EFEO team, the Buddhist Institute, and the National Library in Cambodia for providing us with samples of palm leaf manuscripts. We also thank the students from the Department of Informatics Education and the Department of Balinese Literature, University of Pendidikan Ganesha, the Institute of Technology of Cambodia, and the National Institute of Post, Telecommunication and ICT for helping us with the ground truthing process for this research project. This work is supported by the DIKTI BPPLN Indonesian Scholarship Program, the STIC Asia Program implemented by the French Ministry of Foreign Affairs and International Development (MAEDI), and ARES-CCD (program AI 2014-2019) under the funding of Belgian university cooperation, and DRPMI Universitas Padjadjaran, DIKTI International Collaboration and Publication grant 2017.

Author Contributions: The Balinese dataset was prepared by Made Windu Antara Kesiman. The Khmer dataset was prepared by Dona Valy and Sophea Chhun. The Sundanese dataset was prepared by Erick Paulus, Mira Suryani, and Setiawan Hadi. Jean-Christophe Burie, Michel Verleysen, and Jean-Marc Ogier contributed to designing a ground truth validation protocol. Made Windu Antara Kesiman and Dona Valy conceived, designed,



Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Publisher: MDPI
Location: Basel
Date: 2018
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science