found by various functions. The sum-of-squares function used to calculate the loss or error can be expressed as
j(w) = \sum_{n=1}^{N} (y_n - \hat{y}_n)^2 + \lambda \sum_{l=1}^{L} W_l^2    (3)
An L2 regularization term λ was applied during the computation of the loss to avoid large growth of the parameters during the minimization process.
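As a reading aid, the loss in Equation (3) can be sketched in a few lines of NumPy; the names targets, predictions, weights and lam are illustrative and not taken from the paper.

```python
import numpy as np

def regularized_sse(targets, predictions, weights, lam=1e-3):
    """Sum-of-squares loss with an L2 penalty, in the spirit of Eq. (3).
    `weights` is a list of per-layer weight arrays; `lam` plays the role of lambda."""
    data_term = np.sum((targets - predictions) ** 2)        # sum over the N samples
    reg_term = lam * sum(np.sum(w ** 2) for w in weights)   # sum over the L layers
    return data_term + reg_term
```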
The entire network of a DCNN involves multiple layers: convolutional, pooling, ReLU, fully connected and Softmax. Each of these layers has its own specification that must be stated for a particular network. In this paper, we used a special convention to express a DCNN network; an illustrative translation of this convention into code follows the list below.
• xINy: An input layer, where x represents the width and height of the image and y represents the number of channels.
• xCy: A convolutional layer, where x represents the number of kernels and y represents the kernel size y*y.
• xPy: A pooling layer, where x represents the pooling size x*x and y represents the pooling stride.
• Relu: Represents a rectified linear unit.
• xDrop: A dropout layer, where x represents the dropout probability.
• xFC: A fully connected or dense layer, where x represents the number of neurons.
• xOU: An output layer, where x represents the number of classes or labels.
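To make the convention concrete, a hypothetical network written as 32IN1 - 6C5 - 2P2 - Relu - 0.5Drop - 120FC - 10OU could be read as the Keras layers below. The network and its sizes are illustrative only and are not taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical network "32IN1 - 6C5 - 2P2 - Relu - 0.5Drop - 120FC - 10OU"
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),            # 32IN1:   32x32 image, 1 channel
    layers.Conv2D(6, kernel_size=5),              # 6C5:     6 kernels of size 5x5
    layers.MaxPooling2D(pool_size=2, strides=2),  # 2P2:     2x2 pooling, stride 2
    layers.ReLU(),                                # Relu:    rectified linear unit
    layers.Dropout(0.5),                          # 0.5Drop: dropout probability 0.5
    layers.Flatten(),                             # flatten before the dense layer
    layers.Dense(120),                            # 120FC:   fully connected, 120 neurons
    layers.Dense(10, activation="softmax"),       # 10OU:    output layer with 10 classes
])
```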
3.2. Different Adaptive Gradient Methods
Basically, neural network training updates the weights in each iteration, and the final goal of training is to find the weights that give the minimum loss or error. One of the important parameters of a deep neural network is the learning rate, which decides the change in the weights. The selection of a value for the learning rate is a very challenging task: if the learning rate is selected too low, then the optimization can be very slow and the network will take a long time to reach the minimum loss or error. On the other hand, if the learning rate is selected too high, then the optimization can deviate and the network will not reach the minimum loss or error.
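A minimal sketch of this behavior (not from the paper) is plain gradient descent on the one-dimensional function f(w) = w², whose gradient is 2w; the learning-rate values are illustrative.

```python
def descend(lr, steps=20, w=1.0):
    """Plain gradient descent on f(w) = w**2 with a fixed learning rate."""
    for _ in range(steps):
        w -= lr * 2 * w   # gradient of w**2 is 2*w
    return w

print(descend(lr=0.001))  # too small: w creeps toward the minimum at 0 very slowly
print(descend(lr=0.1))    # moderate:  w converges close to 0
print(descend(lr=1.5))    # too large: each step overshoots and |w| grows without bound
```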
This problem can be solved by adaptive gradient methods, which help in faster training and better convergence. The Adagrad [27] (adaptive gradient) algorithm was introduced by Duchi in 2011. It automatically incorporates low and high updates for frequently and infrequently occurring features, respectively. This method gives an improvement in convergence performance compared to standard stochastic gradient descent for sparse data. It can be expressed as
W_{t+1} = W_t - \frac{\alpha}{\sqrt{\sum_t Av_t^2 + \epsilon}} \, g_t    (4)
where Av_t is the previous adjustment gradient and ε is used to avoid divide-by-zero problems.
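A minimal NumPy sketch of the update in Equation (4), assuming element-wise operations over a weight vector; the names accum, lr and eps are illustrative.

```python
import numpy as np

def adagrad_step(w, g, accum, lr=0.01, eps=1e-8):
    """One Adagrad update as in Eq. (4): the accumulated sum of past squared
    gradients (accum) shrinks the effective learning rate per parameter."""
    accum = accum + g ** 2                 # accumulate squared gradients
    w = w - lr * g / np.sqrt(accum + eps)  # per-parameter adaptive step
    return w, accum
```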
The Adagrad method divides the learning rate by the sum of the squared gradients, which produces a very small learning rate over time. This problem is solved by the Adadelta method [28], which accumulates only a few past gradients instead of the entire history of past gradients. The equation of the Adadelta method can be expressed as
W_{t+1} = W_t - \frac{\alpha}{\sqrt{E[Av]^2 + \epsilon}} \, g_t    (5)
where E[Av]² represents the running average over the past gradients; it depends on the current gradient and the previous average of the gradients.
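A sketch, under the same assumptions as above, of the running-average form of Equation (5) as written here; rho controls how quickly old gradients are forgotten, and the RMSProp method discussed next uses the same form of update.

```python
import numpy as np

def running_average_step(w, g, avg_sq, lr=0.01, rho=0.9, eps=1e-8):
    """One update in the spirit of Eq. (5): a decaying average of squared
    gradients (avg_sq, standing in for E[Av]^2) replaces Adagrad's ever-growing sum."""
    avg_sq = rho * avg_sq + (1 - rho) * g ** 2  # exponential moving average
    w = w - lr * g / np.sqrt(avg_sq + eps)      # adaptive step
    return w, avg_sq
```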
The problem of Adagrad is also solved by Hinton [29] with a technique called RMSProp, which was designed for stochastic gradient descent. RMSProp is an updated version of Rprop, which did not work with mini-batches. Rprop is equivalent to using the gradient while also dividing by the size of the gradient. RMSProp keeps a moving average of the squared gradient for each weight and, further,