J. Imaging 2018, 4, 41
kernel values for the model. The alternating convolutional and max-pooling layers do this job perfectly. Another part of the DCNN is the fully connected layers, which contain multiple neurons in each layer like a simple neural network; they take the high-level features from the previous convolutional-pooling layers and compute the weights to classify the object properly.
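As a rough illustration (not from the paper), the fully connected classification stage described above can be sketched in NumPy; the feature size and number of classes here are invented for the example:

```python
import numpy as np

def fully_connected(features, weights, bias):
    """Compute raw class scores from a flattened high-level feature vector."""
    return weights @ features + bias

def softmax(scores):
    """Turn raw class scores into class probabilities."""
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.standard_normal(64)       # flattened conv-pool output (size assumed)
weights = rng.standard_normal((10, 64))  # 10 hypothetical object classes
bias = np.zeros(10)

probs = softmax(fully_connected(features, weights, bias))
print(probs.shape)  # (10,) -- one probability per class
```

In a trained network the weights and bias would come from back-propagation rather than random initialization.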
Figure 1. The schematic diagram of the deep convolutional neural network (DCNN) architecture.
3.1. DCNN Notation
The deep convolutional neural network is a neural network specially designed for image processing work. Most color images are represented in three dimensions h × w × c, where h represents the height, w represents the width of the image and c represents the number of channels of the image. However, the DCNN can only take an image whose height and width are equal. So, before feeding the image into the DCNN, a normalization process has to be followed to convert the image from size h × w × c to size m × m × c, where m represents both the height and width of the image. The DCNN directly takes the three-dimensional normalized image/matrix X as an input and supplies it to a convolutional layer which has k kernels of size n × n × p, where n < m and p ≤ c. The convolutional layer performs the multiplication between the neighbors of a particular element of X and the weights provided by the kernel to generate k different feature maps of size l × l, where l = m − n + 1. The convolutional layer is often followed by an activation function; the rectified linear unit (ReLU) was selected as the activation function:
Y_{kl} = f( ∑_{i=1}^{n} X_i ∗ W_{kil} + B_{kl} )    (1)
where k denotes the feature map layer, Y is a map of size l × l, W_{kil} is a kernel weight of size n × n, B_{kl} represents the bias value and ∗ represents the 2D convolution.
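A minimal NumPy sketch of Equation (1) follows; the sizes m, n, p and k are invented for illustration, and, as in most CNN implementations, the kernel is applied in its cross-correlation form (without flipping):

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' 2D convolution of one channel x (m x m) with kernel w (n x n)."""
    m, n = x.shape[0], w.shape[0]
    l = m - n + 1                      # output feature map is l x l
    out = np.zeros((l, l))
    for i in range(l):
        for j in range(l):
            out[i, j] = np.sum(x[i:i+n, j:j+n] * w)
    return out

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
m, n, p, k = 8, 3, 3, 4                # illustrative image, kernel, channel, kernel counts
X = rng.standard_normal((m, m, p))     # normalized input image/matrix
W = rng.standard_normal((k, n, n, p))  # k kernels of size n x n x p
B = np.zeros(k)                        # one bias per feature map

# Equation (1): sum channel-wise convolutions, add bias, apply ReLU
Y = np.stack([
    relu(sum(conv2d_valid(X[:, :, c], W[kk, :, :, c]) for c in range(p)) + B[kk])
    for kk in range(k)
])
print(Y.shape)  # (4, 6, 6): k feature maps of size (m - n + 1)
```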
The next pooling layer works to reduce the feature maps by applying a mean, max or min operation over a p_l × p_l local region of the feature map, where p_l can generally vary from 2 to 5. DCNNs have multiple consecutive convolutional layers followed by pooling layers, and each convolutional layer introduces a lot of unknown weights. The back-propagation algorithm, one of the well-known techniques used in the simple neural network to find weights automatically, has been used to find the unknown weights during the training phase. Back-propagation updates the weights to minimize a loss j(w), or error, with an iterative process of gradient descent that can be expressed as
W_{t+1} = W_t − α ∇j(W_t) + μ ν_t    (2)
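As a rough sketch of the update in Equation (2), assuming a toy quadratic loss j(w) = w² (the learning rate and momentum values are illustrative, not from the paper):

```python
import numpy as np

def grad_j(w):
    """Gradient of the toy loss j(w) = w**2."""
    return 2.0 * w

alpha, mu = 0.1, 0.9   # learning rate and momentum (illustrative values)
w = np.array([5.0])    # initial weight
v = np.zeros_like(w)   # velocity vector nu

for t in range(100):
    # the velocity accumulates past descent directions, scaled by mu
    v = mu * v - alpha * grad_j(w)
    w = w + v           # W_{t+1} = W_t - alpha * grad + mu * nu_t

print(float(w[0]))  # approaches the minimum at w = 0
```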
The back-propagation algorithm helps to follow a direction towards where the cost function gives the minimum loss or error by updating the weights. The value α, called the learning rate, helps to determine the step size, or change in the previous weight. Back-propagation can sometimes get stuck at a local minimum, which can be overcome by the momentum μ, which accumulates a velocity vector ν in the direction of continuous reduction of the loss function. The error or loss of a network can be
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Publisher
- MDPI
- Place
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science