Page - 94 - in Document Image Processing
J. Imaging 2018, 4, 41
Table 1. Various network architectures of the deep convolutional neural network used.
Network  Model Architecture
NA-1 64IN64-64C2-Relu-4P2-500FC-47OU
NA-2 64IN64-64C2-Relu-4P2-1000FC-47OU
NA-3 64IN64-32C2-Relu-4P2-32C2-Relu-4P2-1000FC-47OU
NA-4 64IN64-64C2-Relu-4P2-64C2-Relu-4P2-1000FC-47OU
NA-5 64IN64-32C2-Relu-4P2-32C2-Relu-4P2-32C2-Relu-4P2-1000FC-47OU
NA-6 64IN64-64C2-Relu-4P2-64C2-Relu-4P2-64C2-Relu-4P2-1000FC-47OU
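The architecture strings in Table 1 can be split into per-layer tokens mechanically. The following is a minimal sketch, not the authors' code. The token meanings are assumptions, since this section does not define them: `aINb` is read as an a x b input image, `nCk` as a convolutional layer with n filters and a size parameter k, `wPs` as a pooling layer with window w and stride s, `nFC` as a fully connected layer with n units, and `nOU` as an output layer with n classes. The function name `parse_architecture` is hypothetical.

```python
import re

# Assumed token meanings (not defined in this section of the paper):
#   aINb -> input image of a x b pixels
#   nCk  -> convolutional layer, n filters, size parameter k
#   wPs  -> pooling layer, window w, stride s
#   nFC  -> fully connected layer with n units
#   nOU  -> output layer with n classes
TOKEN_PATTERNS = [
    (re.compile(r"^(\d+)IN(\d+)$"), "input"),
    (re.compile(r"^(\d+)C(\d+)$"), "conv"),
    (re.compile(r"^(\d+)P(\d+)$"), "pool"),
    (re.compile(r"^(\d+)FC$"), "dense"),
    (re.compile(r"^(\d+)OU$"), "output"),
]


def parse_architecture(spec):
    """Split a Table 1 style string into (kind, numbers) tuples."""
    layers = []
    for token in spec.split("-"):
        if token.lower() == "relu":
            layers.append(("relu", ()))
            continue
        for pattern, kind in TOKEN_PATTERNS:
            match = pattern.match(token)
            if match:
                layers.append((kind, tuple(int(g) for g in match.groups())))
                break
        else:
            # No pattern matched this token.
            raise ValueError(f"unrecognized token: {token!r}")
    return layers


# NA-6 exactly as listed in Table 1.
NA_6 = "64IN64-64C2-Relu-4P2-64C2-Relu-4P2-64C2-Relu-4P2-1000FC-47OU"
layers = parse_architecture(NA_6)
conv_count = sum(1 for kind, _ in layers if kind == "conv")
print(conv_count)
```

Counting the `conv` entries this way makes the depth difference between, for example, NA-1 and NA-6 explicit without relying on any particular deep learning library.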
All experiments were executed on the ParamShavak supercomputer system, which has two multicore
CPUs with 12 cores each, two accelerator cards, and 64 GB of RAM, and runs the CentOS 6.5
operating system. The deep neural network model was coded in Python using Keras, a high-level
neural network API that uses the Theano Python library. The basic pre-processing tasks, such as
background elimination, gray normalization, and image resizing, were done in Matlab.
ISIDCHAR and V2DMDCHAR databases.
The ISIDCHAR database [26] was prepared by researchers of the Indian Statistical Institute, Kolkata.
They collected samples from persons of different age groups to capture the maximum variation in
written characters. In addition, samples were collected from filled-in job forms and postcards,
which makes this database realistic. The database consists of 36,172 grayscale images of 47
different Devanagari characters. Because the samples come from many writers, the database offers
a variety of samples in each class, and the background of the samples is also highly uninformed.
The V2DMDCHAR database [31] was prepared by Vikas J. Dongre and Vijay H. Mankar in 2012. It has
20,305 samples of handwritten Devanagari characters.
4.1. Experimental Setup
The experiments were performed to investigate the effects of different network architectures,
optimizers, and layer-wise training. The first phase of experiments was performed to find the best
network architecture for the database, and then the best-observed network architecture was tested
with six different optimizers to find the best optimizer. A total of 12 (6 + 6) different
experiments were performed on the database. The second phase of experiments aimed to observe the
effect of layer-wise training. Layer-wise training was performed only with the best network
architecture and the best optimizer selected in the first phase.
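The first-phase selection described above (six architectures with one fixed optimizer, then the best architecture with six optimizers) can be sketched as follows. This is only an illustration under stated assumptions: `two_phase_search` and `fake_evaluate` are hypothetical names, the optimizer labels `opt-1` to `opt-6` are placeholders because this section does not list the six optimizers, and the scores are made-up numbers, not reported results.

```python
def two_phase_search(architectures, optimizers, baseline_optimizer, evaluate):
    """Pick the best architecture first, then the best optimizer for it.

    `evaluate(architecture, optimizer)` is a caller-supplied function that
    returns a score (higher is better), e.g. a recognition accuracy.
    """
    # Step 1: every architecture is scored with the same fixed optimizer.
    arch_scores = {a: evaluate(a, baseline_optimizer) for a in architectures}
    best_arch = max(arch_scores, key=arch_scores.get)

    # Step 2: only the best architecture is scored with every optimizer.
    opt_scores = {o: evaluate(best_arch, o) for o in optimizers}
    best_opt = max(opt_scores, key=opt_scores.get)

    runs = len(arch_scores) + len(opt_scores)
    return best_arch, best_opt, runs


# Placeholder inputs: architecture names follow Table 1, optimizer names
# are hypothetical labels.
architectures = [f"NA-{i}" for i in range(1, 7)]
optimizers = [f"opt-{i}" for i in range(1, 7)]


def fake_evaluate(architecture, optimizer):
    # Hypothetical deterministic score used only to exercise the loop.
    return int(architecture[-1]) + 0.1 * int(optimizer[-1])


best_arch, best_opt, runs = two_phase_search(
    architectures, optimizers, "opt-1", fake_evaluate
)
print(best_arch, best_opt, runs)
```

With six architectures and six optimizers the loop performs 6 + 6 = 12 evaluations, which matches the experiment count stated in the text, instead of the 36 a full grid would need.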
Each optimizer had its own set of parameters. In our experiments, the optimizer parameters were
kept at their default values or as suggested by the respective authors. The rectified linear
activation function was used in all experiments to mitigate the vanishing gradient problem. The
sum of squares of the difference between target and observed values was calculated to estimate the
loss of the deep network. Each network was trained for 100 epochs using mini-batches of size 200.
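The quantities named in this paragraph can be written out in a few lines. The sketch below is an illustration rather than the authors' implementation: the function names are hypothetical, the loss is taken literally as a plain sum of squared differences (some formulations add a factor of 1/2 or average over the outputs, which the text does not specify), and the example vectors are made up.

```python
import math


def relu(x):
    """Rectified linear activation: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def sum_squared_error(targets, outputs):
    """Sum of squared differences between target and observed values."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


def batches_per_epoch(num_samples, batch_size=200):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


# Made-up one-hot target and network output for a 3-class toy case.
loss = sum_squared_error([0.0, 1.0, 0.0], [0.1, 0.7, 0.2])
print(round(loss, 2))
print(batches_per_epoch(1001))
```

The default `batch_size=200` mirrors the mini-batch size stated above. The number of training samples is left as a parameter because the training split is not given in this section.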
4.2. Results
The first phase of experiments was performed on ISIDCHAR to determine the best deep network
architecture. We recorded the recognition accuracy of the different network architectures using
the Adam optimizer during each of the 50 epochs. The results in terms of the maximum, minimum,
mean, and standard deviation values of recognition accuracy are reported in Table 2.
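The four summary statistics reported per architecture can be computed from a list of per-epoch accuracies as sketched below. This is a minimal illustration: `summarize_accuracy` is a hypothetical name, the input values are made up and are not taken from Table 2, and the population standard deviation is an assumption because the text does not say which variant was used.

```python
import statistics


def summarize_accuracy(per_epoch_accuracy):
    """Return max, min, mean, and standard deviation of per-epoch accuracy."""
    return {
        "max": max(per_epoch_accuracy),
        "min": min(per_epoch_accuracy),
        "mean": statistics.fmean(per_epoch_accuracy),
        # Population standard deviation (assumption; use statistics.stdev
        # for the sample version).
        "std": statistics.pstdev(per_epoch_accuracy),
    }


# Made-up accuracies (in percent) for three epochs, for illustration only.
summary = summarize_accuracy([80.0, 85.0, 90.0])
print(summary)
```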
The best recognition accuracy was obtained with the network architecture NA-6, and the lowest
recognition accuracy was obtained with the network architecture NA-1. Figure 3 shows the
recognition accuracy obtained at each epoch. The network NA-1 produced 85% recognition accuracy
because it has only one convolutional layer. The networks NA-3 and NA-5 produced higher recognition
accuracies of 91.53% and 93.24%, respectively, because these networks have more convolutional
layers. This enhancement signifies that increasing the number of convolutional layers in a deep
convolutional neural network produced better results. In our experiments, we observed the
enhancement in the recognition
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Editor: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Size: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science