Page 91 in Document Image Processing

found by various functions. The sum-of-squares function used to calculate the loss or error can be expressed as

J(w) = \sum_{n=1}^{N} (y_n - \hat{y}_n)^2 + \lambda \sum_{l=1}^{L} W_l^2        (3)

An L2 regularization term weighted by λ was applied during the computation of the loss to keep the parameters from growing too large during the minimization process.

The complete DCNN is built from multiple convolutional, pooling, ReLU, fully connected and Softmax layers, and each layer needs its own specification when it is written down as part of a particular network. In this paper, we used a special convention to express a DCNN (sketched in code below):

• xINy: An input layer, where x represents the width and height of the image and y represents the number of channels.
• xCy: A convolutional layer, where x represents the number of kernels and y represents the kernel size y × y.
• xPy: A pooling layer, where x represents the pooling size x × x and y represents the pooling stride.
• Relu: Represents a rectified linear unit.
• xDrop: A dropout layer, where x represents the dropout probability.
• xFC: A fully connected or dense layer, where x represents the number of neurons.
• xOU: An output layer, where x represents the number of classes or labels.

3.2. Different Adaptive Gradient Methods

Neural network training updates the weights in each iteration, and the final goal of training is to find the weights that give the minimum loss or error. One of the important parameters of a deep neural network is the learning rate, which decides how much the weights change. Selecting a value for the learning rate is a very challenging task: if the learning rate is set too low, the optimization is very slow and the network takes a long time to reach the minimum loss or error; if, on the other hand, it is set too high, the optimization can diverge and the network will never reach the minimum loss or error. This problem can be addressed by adaptive gradient methods, which allow faster training and better convergence.

The Adagrad [27] (adaptive gradient) algorithm was introduced by Duchi in 2011. It automatically applies low updates to frequently occurring features and high updates to infrequently occurring ones. This method improves convergence compared to standard stochastic gradient descent on sparse data. It can be expressed as

W_{t+1} = W_t - \frac{\alpha}{\sqrt{\sum_t Av_t^2 + \epsilon}} g_t        (4)

where Av_t is the previous adjustment gradient and ε is used to avoid divide-by-zero problems.

The Adagrad method divides the learning rate by the accumulated sum of squared gradients, which eventually produces a vanishingly small learning rate. This problem is solved by the Adadelta method [28], which accumulates only a window of recent gradients instead of the entire history of past gradients. The equation of the Adadelta method can be expressed as

W_{t+1} = W_t - \frac{\alpha}{\sqrt{E[Av]^2 + \epsilon}} g_t        (5)

where E[Av]^2 represents the average over past gradients; it depends on the current gradient and the previous average of the gradients. The problem of Adagrad is also addressed by Hinton [29] with a technique called RMSProp, which was designed for stochastic gradient descent. RMSProp is an updated version of Rprop, which did not work with mini-batches: Rprop is equivalent to using the gradient while also dividing by the size of the gradient. RMSProp keeps a moving average of the squared gradient for each weight and, further,
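As a concrete reading of Eq. (3), the following is a minimal NumPy sketch; the function name and argument shapes are illustrative assumptions, not taken from the paper.

import numpy as np

def regularized_sse_loss(y, y_hat, weights, lam):
    """Sum-of-squares loss with an L2 penalty, per Eq. (3).

    y, y_hat : length-N arrays of targets and network outputs
    weights  : list of the L per-layer weight arrays W_l
    lam      : regularization strength (lambda in Eq. (3))
    """
    data_term = np.sum((y - y_hat) ** 2)                   # sum over the N samples
    reg_term = lam * sum(np.sum(W ** 2) for W in weights)  # L2 penalty over all layers
    return data_term + reg_term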
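To make the layer-naming convention concrete, here is a sketch of how a hypothetical network written as 32IN1-6C5-2P2-Relu-0.5Drop-128FC-10OU could be translated into Keras; the architecture string is invented purely to illustrate the notation and is not a network from the paper.

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical network "32IN1-6C5-2P2-Relu-0.5Drop-128FC-10OU" in the paper's notation.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),                 # 32IN1: 32x32 image, 1 channel
    layers.Conv2D(6, (5, 5)),                          # 6C5: 6 kernels of size 5x5
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # 2P2: 2x2 pooling, stride 2
    layers.Activation("relu"),                         # Relu: rectified linear unit
    layers.Dropout(0.5),                               # 0.5Drop: dropout probability 0.5
    layers.Flatten(),                                  # flatten before the dense layers
    layers.Dense(128),                                 # 128FC: 128 neurons
    layers.Dense(10, activation="softmax"),            # 10OU: 10 output classes (Softmax)
])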
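The Adagrad update in Eq. (4) can be sketched in a few lines of NumPy; the step size alpha and the small constant eps are common defaults assumed here, not values from the paper.

import numpy as np

def adagrad_step(W, g, accum, alpha=0.01, eps=1e-8):
    # Eq. (4): accumulate squared gradients over all steps, so weights that
    # receive frequent large gradients get small updates and rarely updated
    # (sparse) weights get comparatively large ones.
    accum = accum + g ** 2
    W = W - alpha / np.sqrt(accum + eps) * g   # eps avoids divide-by-zero
    return W, accum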
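Likewise, the moving-average idea behind Adadelta/RMSProp in Eq. (5) can be sketched as follows; the decay factor rho = 0.9 is a common default, assumed here rather than taken from the paper.

import numpy as np

def rmsprop_step(W, g, avg_sq, alpha=0.001, rho=0.9, eps=1e-8):
    # Eq. (5): an exponential moving average of squared gradients replaces
    # Adagrad's ever-growing sum, so the effective learning rate no longer
    # shrinks toward zero as training proceeds.
    avg_sq = rho * avg_sq + (1 - rho) * g ** 2
    W = W - alpha / np.sqrt(avg_sq + eps) * g  # eps avoids divide-by-zero
    return W, avg_sq

In both sketches, W is a weight array, g is the current gradient, and the returned state (accum or avg_sq) is carried into the next step.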
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Informatik