Page - 101 - in Document Image Processing
Journal of Imaging
Article
Benchmarking of Document Image Analysis Tasks for
Palm Leaf Manuscripts from Southeast Asia
Made Windu Antara Kesiman 1,2,*, Dona Valy 3,4, Jean-Christophe Burie 1, Erick Paulus 5,
Mira Suryani 5, Setiawan Hadi 5, Michel Verleysen 2, Sophea Chhun 4 and Jean-Marc Ogier 1
1 Laboratoire Informatique Image Interaction (L3i), Université de La Rochelle, 17042 La Rochelle, France;
jean-christophe.burie@univ-lr.fr (J.-C.B.); jean-marc.ogier@univ-lr.fr (J.-M.O.)
2 Laboratory of Cultural Informatics (LCI), Universitas Pendidikan Ganesha, Singaraja, Bali 81116, Indonesia;
michel.verleysen@uclouvain.be
3 Institute of Information and Communication Technologies, Electronic, and Applied Mathematics (ICTEAM),
Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium; dona.valy@student.uclouvain.be
4 Department of Information and Communication Engineering, Institute of Technology of Cambodia,
Phnom Penh, Cambodia; sophea.chhun@itc.edu.kh
5 Department of Computer Science, Universitas Padjadjaran, Bandung 45363, Indonesia;
erick_paulus@yahoo.com (E.P.); mira.suryani@unpad.ac.id (M.S.); setiawanhadi@unpad.ac.id (S.H.)
* Correspondence: made_windu_antara.kesiman@univ-lr.fr
Received: 15 December 2017; Accepted: 18 February 2018; Published: 22 February 2018
Abstract: This paper presents a comprehensive test of the principal tasks in document image analysis
(DIA), starting with binarization, text line segmentation, and isolated character/glyph recognition,
and continuing on to word recognition and transliteration for a new and challenging collection of
palm leaf manuscripts from Southeast Asia. This research presents and is performed on a complete
dataset collection of Southeast Asian palm leaf manuscripts. It contains three different scripts: Khmer
script from Cambodia, and Balinese script and Sundanese script from Indonesia. The binarization
task is evaluated on many methods up to the latest in some binarization competitions. The seam
carving method is evaluated for the text line segmentation task, compared to a recent text line
segmentation method for palm leaf manuscripts. For the isolated character/glyph recognition task,
the evaluation is reported from the handcrafted feature extraction method, the neural network with
unsupervised learning feature, and the Convolutional Neural Network (CNN) based method. Finally,
the Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM) based method is used to
analyze the word recognition and transliteration task for the palm leaf manuscripts. The results from
all experiments provide the latest findings and a quantitative benchmark for palm leaf manuscript
analysis for researchers in the DIA community.
Keywords: document image analysis; binarization; character recognition; text line segmentation;
word recognition; transliteration; palm leaf manuscript; dataset; benchmark; experimental test
1. Introduction
Since the world entered the digital age in the early 20th century, the need for a document
image analysis (DIA) system is increasing. This is due to the dramatic increase in efforts to digitize
the various types of document collections available, especially the ancient documents of historical
relics found in various parts of the world. Some very interesting projects on a wide variety of
heritage document collections can be mentioned here: for example, the tranScriptorium project
(http://transcriptorium.eu/) [1]; the READ (Recognition and Enrichment of Archival Documents)
project (https://read.transkribus.eu/) [2], which works on documents from the Middle Ages to
today, and also focuses on different languages ranging from Ancient Greek to modern English; the
J. Imaging 2018, 4, 43; www.mdpi.com/journal/jimaging
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science