found by various functions. The sum of squares function used to calculate the loss or error can be expressed as
j(w) = \sum_{n=1}^{N} (y_n - \hat{y}_n)^2 + \lambda \sum_{l=1}^{L} W_l^2 \quad (3)
An L2 regularization with coefficient λ was applied during the computation of the loss to prevent the parameters from growing too large during the minimization process.
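As a concrete illustration, the following minimal Python sketch (not from the paper; the λ value and array shapes are assumptions chosen only for the example) computes the loss of Equation (3) as the sum of squared errors plus an L2 penalty over the layer weight matrices.

import numpy as np

def sum_of_squares_loss(y_true, y_pred, layer_weights, lam=1e-4):
    # Data term: sum over samples of (y_n - y_hat_n)^2
    data_term = np.sum((y_true - y_pred) ** 2)
    # Regularization term: lambda * sum over layers of ||W_l||^2
    l2_term = lam * sum(np.sum(W ** 2) for W in layer_weights)
    return data_term + l2_term

# Illustrative usage with made-up values.
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
weights = [np.random.randn(4, 3), np.random.randn(3, 1)]
print(sum_of_squares_loss(y_true, y_pred, weights))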
The entire DCNN network involves multiple layers: convolutional, pooling, ReLU, fully connected, and Softmax. These layers have different specifications to express them in a particular network. In this paper, we used a special convention to express the DCNN network, illustrated by the code sketch after the following list.
• xINy: An input layer, where x represents the width and height of the image and y represents the number of channels.
• xCy: A convolutional layer, where x represents the number of kernels and y represents the kernel size y × y.
• xPy: A pooling layer, where x represents the pooling size x × x and y represents the pooling stride.
• Relu: Represents a rectified linear unit.
• xDrop: A dropout layer, where x represents the dropout probability.
• xFC: A fully connected or dense layer, where x represents the number of neurons.
• xOU: An output layer, where x represents the number of classes or labels.
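As an illustration of this convention, the sketch below (an assumption for this text, not the authors' implementation) translates the hypothetical specification 32IN1-64C5-2P2-Relu-0.5Drop-128FC-10OU into a PyTorch model; the concrete layer sizes are chosen only for the example.

import torch
import torch.nn as nn

model = nn.Sequential(
    # 32IN1: 32x32 input image with 1 channel (the input size is implicit in PyTorch)
    nn.Conv2d(in_channels=1, out_channels=64, kernel_size=5, padding=2),  # 64C5: 64 kernels of size 5x5
    nn.MaxPool2d(kernel_size=2, stride=2),                                # 2P2: 2x2 pooling, stride 2
    nn.ReLU(),                                                            # Relu: rectified linear unit
    nn.Dropout(p=0.5),                                                    # 0.5Drop: dropout probability 0.5
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 128),                                         # 128FC: fully connected, 128 neurons
    nn.Linear(128, 10),                                                   # 10OU: output layer, 10 classes
    nn.Softmax(dim=1),
)

x = torch.randn(8, 1, 32, 32)   # a batch of 8 single-channel 32x32 images
print(model(x).shape)           # torch.Size([8, 10])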
3.2. Different Adaptive Gradient Methods
Basically, the neural network training updates the weights in each iteration, and the final goal of training is to find the weights that give the minimum loss or error. One of the important parameters of a deep neural network is the learning rate, which decides the change in the weights. The selection of a value for the learning rate is a very challenging task: if the learning rate is set too low, then the optimization can be very slow and the network will take a long time to reach the minimum loss or error. On the other hand, if the learning rate is set too high, then the optimization can deviate and the network will not reach the minimum loss or error. This problem can be solved by adaptive gradient methods, which help in faster training and better convergence. The Adagrad [27] (adaptive gradient) algorithm was introduced by Duchi in 2011. It automatically incorporates low and high updates for frequently and infrequently occurring features, respectively. This method gives an improvement in convergence performance compared to standard stochastic gradient descent for sparse data. It can be expressed as
W_{t+1} = W_t - \frac{\alpha}{\sqrt{\sum_{t} Av_t^2 + \epsilon}} \, g_t \quad (4)
where Av_t is the previous adjustment gradient and ε is used to avoid divide-by-zero problems.
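A minimal sketch of the update in Equation (4) (an assumed illustration, not the paper's code; the learning rate and ε values are only examples) shows how Adagrad scales the step for each weight by its accumulated squared gradients.

import numpy as np

def adagrad_step(w, grad, accum, lr=0.01, eps=1e-8):
    # Accumulate the squared gradients seen so far for each weight.
    accum = accum + grad ** 2
    # Frequently updated weights get smaller steps; rarely updated ones get larger steps.
    w = w - lr / np.sqrt(accum + eps) * grad
    return w, accum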
The Adagrad method divides the learning rate by the sum of the squared gradients, which produces a small learning rate. This problem is solved by the Adadelta method [28], which accumulates only a few past gradients instead of the entire past gradients. The equation of the Adadelta method can be expressed as
W_{t+1} = W_t - \frac{\alpha}{\sqrt{E[Av]^2 + \epsilon}} \, g_t \quad (5)
where E[Av]^2 represents the entire past gradients. It depends on the current gradient and the previous average of the gradient.
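The idea behind this running-average denominator (and the moving average kept by RMSProp, discussed next) can be sketched as follows; this is an assumed illustration, not the authors' code, and the decay factor ρ is an assumed hyperparameter.

import numpy as np

def running_avg_step(w, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-8):
    # Exponentially decaying average of the squared gradients.
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    # Divide the step by the root of the recent average instead of the full sum,
    # so the effective learning rate does not shrink toward zero.
    w = w - lr / np.sqrt(avg_sq + eps) * grad
    return w, avg_sq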
The problem of Adagrad is solved by Hinton [29] with the technique called RMSProp, which was designed for stochastic gradient descent. RMSProp is an updated version of Rprop, which did not work with mini-batches. Rprop is equivalent to using the gradient, but also dividing by the size of the gradient. RMSProp keeps a moving average of the squared gradient for each weight and, further,