J. Imaging 2018, 4, 41
kernel values for the model. The alternating convolutional and max-pooling layers do this job perfectly. Another part of a DCNN is the fully connected layers, which contain multiple neurons in each layer, like a simple neural network; they take the high-level features from the previous convolutional-pooling layers and compute the weights to classify the object properly.
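As a concrete illustration of this alternating structure, the sketch below traces how a hypothetical input image shrinks through convolution and pooling stages before reaching the fully connected classifier. The layer sizes and kernel counts are illustrative assumptions, not values from the paper:

```python
# Trace feature-map sizes through an illustrative DCNN:
# valid (no-padding) convolutions followed by 2x2 pooling,
# ending in a fully connected classifier. All sizes are assumptions.

def conv_out(size, kernel):
    """Valid convolution: an m x m map and n x n kernel give (m - n + 1)."""
    return size - kernel + 1

def pool_out(size, window):
    """Non-overlapping pooling over window x window regions."""
    return size // window

size, channels = 32, 3          # normalized m x m x c input
stages = [(5, 16), (5, 32)]     # (kernel size n, number of kernels k) per stage

for n, k in stages:
    size = conv_out(size, n)    # convolution + ReLU keeps k maps of this size
    size = pool_out(size, 2)    # pooling halves the height and width
    channels = k
    print(f"conv {n}x{n} + pool 2x2 -> {size}x{size}x{channels}")

flat = size * size * channels   # flattened input to the fully connected layer
print(f"fully connected input: {flat} features")
```

Running this shows the spatial size shrinking (32 → 14 → 5) while the channel count grows, which is the usual trade-off in such architectures.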
Figure 1. The schematic diagram of deep convolutional neural network (DCNN) architecture.
3.1. DCNN Notation
The deep convolutional neural network is a neural network specially designed for image processing work. Most color images are represented in three dimensions h × w × c, where h represents the height, w the width of the image, and c the number of channels of the image. However, the DCNN can only take an image whose height and width are the same. So, before feeding the image into the DCNN, a normalization process has to be followed to convert the image from h × w × c size to m × m × c size, where m represents both the height and width of an image. The DCNN directly takes the three-dimensional normalized image/matrix X as an input and supplies it to a convolutional layer which has k kernels of size n × n × p, where n < m and p ≤ c. The convolutional layer performs the multiplication between the neighbors of a particular element of X and the weights provided by the kernel to generate the k different feature maps of size l = (m − n + 1). The convolutional layer is often followed by activation functions; the rectified linear unit (ReLU) was selected as the activation function:
Y_k^l = f( ∑_{i=1}^{n} X_i ∗ W_{ki}^l + B_k^l )    (1)

where k denotes the feature map layer, Y is a map of size l × l, W_{ki}^l is a kernel weight of size n × n, B_k^l represents the bias value, and ∗ represents the 2D convolution.
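A minimal NumPy sketch of Equation (1): one feature map is computed by sliding an n × n × c kernel over the input, multiplying the neighborhood of each element by the kernel weights, adding a bias, and applying ReLU. The input, kernel values, and sizes below are illustrative assumptions:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def conv_feature_map(X, W, b):
    """Valid 2D convolution of an m x m x c input X with an n x n x c
    kernel W plus bias b, followed by ReLU, as in Equation (1).
    Returns an (m - n + 1) x (m - n + 1) feature map."""
    m = X.shape[0]
    n = W.shape[0]
    out = m - n + 1
    Y = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            # multiply the n x n x c neighborhood of X by the kernel weights
            Y[i, j] = np.sum(X[i:i + n, j:j + n, :] * W) + b
    return relu(Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8, 3))   # illustrative 8x8 RGB patch
W = rng.standard_normal((3, 3, 3))   # one 3x3 kernel spanning all 3 channels
Y = conv_feature_map(X, W, b=0.1)
print(Y.shape)                       # (8 - 3 + 1) x (8 - 3 + 1) = (6, 6)
```

A convolutional layer with k kernels repeats this k times to produce k feature maps; as in most CNN literature, the kernel is applied without flipping (strictly a cross-correlation).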
The next pooling layer works to reduce the feature maps by applying a mean, max, or min operation over a p_l × p_l local region of the feature map, where p_l can generally vary from 2 to 5. DCNNs have multiple consecutive convolutional layers followed by pooling layers, and each convolutional layer introduces a large number of unknown weights. The back-propagation algorithm, one of the well-known techniques used in the simple neural network to find weights automatically, has been used to find the unknown weights during the training phase. Back-propagation updates the weights to minimize a loss j(w), or error, with an iterative process of gradient descent that can be expressed as
W_{t+1} = W_t − α ∇E[j(W_t)] + μν_t    (2)
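Update rule (2) can be sketched as gradient descent with momentum, where the velocity accumulates past gradient steps. The loss below, a 1-D quadratic, is an illustrative stand-in for the network's actual loss j(w), and the rule is written in the equivalent velocity-first form:

```python
# Gradient descent with momentum on an illustrative loss j(w) = (w - 3)^2:
#   v_{t+1} = mu * v_t - alpha * grad j(w_t)
#   w_{t+1} = w_t + v_{t+1}

def grad(w):
    return 2.0 * (w - 3.0)         # d/dw of (w - 3)^2

w, v = 0.0, 0.0                    # initial weight and velocity
alpha, mu = 0.1, 0.9               # learning rate and momentum coefficient

for _ in range(200):
    v = mu * v - alpha * grad(w)   # velocity accumulates past gradients
    w = w + v                      # weight moves along the velocity
print(round(w, 4))                 # converges toward the minimum at w = 3
```

With μ = 0 this reduces to plain gradient descent; the momentum term keeps the weight moving through shallow local dips in the loss surface.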
The back-propagation algorithm helps to follow a direction towards where the cost function gives the minimum loss or error by updating the weights. The value α, called the learning rate, helps to determine the step size, or change in the previous weight. Back-propagation can sometimes get stuck at a local minimum, which can be overcome by the momentum μ, which accumulates a velocity vector ν in the direction of continuous reduction of the loss function. The error or loss of a network can be