J. Imaging 2018, 4, 43
The network consists of three sets of convolution and max pooling pairs. All convolutional layers use
a stride of one and are zero padded so that the output is the same size as the input. The output of
each convolutional layer is activated using the ReLU function and followed by a max pooling of 2 × 2
blocks. The numbers of feature maps (of size 5 × 5) used in the three consecutive convolutional layers
are 8, 16, and 32, respectively. The output of the last layer is flattened, and a fully-connected layer
with 1024 neurons (also activated with ReLU) is added, followed by the last output layer (softmax
activation) consisting of N_class neurons, where N_class is the number of character classes. Dropout with
probability p = 0.5 is applied before the output layer to prevent overfitting. We trained the network
using an Adam optimizer with a batch size of 100 and a learning rate of 0.0001.
Figure 10. Architecture of the CNN.
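The layer sizes implied by this description can be traced with a short sketch. It is only an illustration under stated assumptions: the 32 × 32 single-channel input and the value N_class = 10 are hypothetical (this section gives neither), "of size 5 × 5" is read as the convolution kernel size, and the 2 × 2 max pooling is assumed to be non-overlapping. The function name `cnn_layer_summary` is ours, not the authors'.

```python
def cnn_layer_summary(height, width, channels=1, n_class=10,
                      feature_maps=(8, 16, 32), kernel=5, pool=2,
                      fc_units=1024):
    """Trace output shapes and trainable-parameter counts layer by layer.

    height, width, channels and n_class are hypothetical inputs; the
    remaining defaults follow the textual description of the network.
    """
    layers = []
    h, w, c = height, width, channels
    for n_out in feature_maps:
        # Stride-1, zero-padded convolution keeps h and w unchanged.
        params = kernel * kernel * c * n_out + n_out  # weights + biases
        c = n_out
        layers.append(("conv+relu", (h, w, c), params))
        # Assumed non-overlapping pool x pool max pooling (floor division).
        h, w = h // pool, w // pool
        layers.append(("maxpool", (h, w, c), 0))
    flat = h * w * c
    layers.append(("flatten", (flat,), 0))
    layers.append(("fc+relu", (fc_units,), flat * fc_units + fc_units))
    # Dropout has no trainable parameters and does not change the shape.
    layers.append(("dropout p=0.5", (fc_units,), 0))
    layers.append(("softmax", (n_class,), fc_units * n_class + n_class))
    total = sum(params for _, _, params in layers)
    return layers, total


if __name__ == "__main__":
    layers, total = cnn_layer_summary(32, 32, channels=1, n_class=10)
    for name, shape, params in layers:
        print(f"{name:14s} {str(shape):14s} {params:8d}")
    print("total trainable parameters:", total)
```

Under these assumptions, the three pooling steps reduce a 32 × 32 input to 4 × 4 × 32 = 512 flattened values, and the first fully-connected layer holds most of the trainable parameters.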
3.4. Word Recognition and Transliteration
In order to make the palm leaf manuscripts more accessible, readable, and understandable to a
wider audience, an optical character recognition (OCR) system should be developed. In many DIA
systems, word or text recognition is the final task in the processing pipeline. However, in
Southeast Asian scripts, the speech sound of a syllable normally changes according to certain phonological
rules. In this case, an OCR system is not enough. Therefore, a transliteration system should also be
developed to help transliterate the ancient scripts on these manuscripts. Transliteration
is defined as the process of obtaining the phonetic translation of names across languages [54].
Transliteration involves rendering a language from one writing system to another. In [54], the problem
is stated formally as a sequence labeling problem from one language alphabet to another. Transliteration will help us
to index and to quickly and efficiently access the content of the manuscripts. In our previous work [29],
a complete scheme for segmentation-based glyph recognition and transliteration specific to Balinese
palm leaf manuscripts was proposed. In this work, a segmentation-free method will be evaluated to
recognize and transliterate the words from palm leaf manuscripts in three different scripts.
RNN/LSTM-Based Methods
Over the last decade, sequence-analysis-based methods using a Recurrent Neural Network-Long
Short-Term Memory (RNN-LSTM) type of learning network have been very popular among researchers
in text recognition. An RNN-LSTM-based method together with Connectionist Temporal Classification
(CTC) works as a segmentation-free learning-based method to recognize the sequence of characters
in a word or text without any handcrafted feature extraction method. The raw image pixels can
be sent directly as the input to the learning network and there is no requirement to segment the
training data sequence. An RNN is basically an extended version of the basic feedforward neural network.
In an RNN, the neurons in the hidden layer are connected to each other. An RNN offers very good
context-aware processing to recognize patterns in a sequence or time series. One drawback of the RNN is
the vanishing gradient problem. To deal with this problem, the LSTM architecture was introduced.
The LSTM network adds multiplicative gates and additive feedback. Bidirectional LSTM is an LSTM
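Two of the ideas above can be illustrated with a minimal sketch. The first function shows one time step of a scalar LSTM cell in the commonly used forget-gate formulation, where the gates act multiplicatively and the cell state is updated additively; the source does not state which LSTM variant was used, so this choice is an assumption. The second function shows greedy (best-path) CTC decoding, which is one simple way to read a character sequence from per-frame outputs without segmenting the input; it is not necessarily the decoder used by the authors. All names (`lstm_step`, `ctc_greedy_decode`, the weight keys, the blank index 0) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell (forget-gate variant, assumed).

    w is a dict of hypothetical scalar weights and biases.
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    # Additive feedback: the previous cell state is carried forward by a
    # sum, scaled by multiplicative gates, which eases gradient flow.
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding: take the top label per frame, merge
    consecutive repeats, then drop the blank label."""
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded, prev = [], None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

In the decoder, a blank between two identical labels keeps them distinct, which is how a CTC output can represent doubled characters even though consecutive repeats are merged.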
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science