Page 88 in Document Image Processing
J. Imaging 2018, 4, 41
gradients) [5], SIFT (scale-invariant feature transform) [6,7], LBP (local binary pattern) [8] and SURF
(speeded up robust features) [9]. These are prominent feature extraction methods, which have been
applied to many problems such as image recognition, character recognition and face detection. The
corresponding models are called shallow learning models, and they are still popular for pattern
recognition. Feature extraction [10] is a type of dimensionality reduction technique that represents
the important parts of a large image as a feature vector. These features are handcrafted and explicitly
designed by the research community. The robustness and performance of these features depend on
the skill and knowledge of each researcher. There are cases where some vital features may be
overlooked by the researchers while extracting the features from the image, and this may result in a high
classification error.
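
To make the idea of a handcrafted feature vector concrete, the following is a minimal sketch of a basic 3x3 local binary pattern histogram in plain Python. It is an illustration only, not the implementation from [8]: the function name `lbp_histogram`, the clockwise neighbour ordering and the `>=` comparison are assumptions chosen here, and the image is assumed to be a list of rows of grayscale values.

```python
def lbp_histogram(image):
    """Return a 256-bin histogram of basic 3x3 LBP codes (illustrative sketch)."""
    # Neighbour offsets, clockwise from the top-left pixel.
    # This ordering is a convention assumed here, not taken from the source.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(image), len(image[0])
    hist = [0] * 256
    # Border pixels are skipped because they lack a full 3x3 neighbourhood.
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist


# Usage: a toy 4x5 grayscale image is reduced to a fixed-length vector.
toy_image = [
    [10, 10, 10, 10, 10],
    [10, 50, 50, 50, 10],
    [10, 50, 90, 50, 10],
    [10, 10, 10, 10, 10],
]
features = lbp_histogram(toy_image)
print(len(features), sum(features))
```

The vector always has 256 entries regardless of the image size, which is what lets a shallow classifier consume images of different dimensions. Every design decision in the sketch (neighbourhood size, comparison rule, binning) is made by hand, which is the dependence on researcher skill that the paragraph above describes.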
Deep learning turns the process of handcrafting and designing features for a particular problem
into an automatic process that computes the best features for that problem. A deep convolutional
neural network has multiple convolutional layers to extract the features automatically. The features
are extracted only once in most of the shallow learning models, but in the case of deep learning
models, multiple convolutional layers have been adopted to extract discriminating features multiple
times. This is one of the reasons that deep learning models are generally successful. LeNet [4]
is an example of a deep convolutional neural network for character recognition. More recently, many other
deep learning models have appeared, such as AlexNet [3], ZFNet [11], VGGNet [12] and
spatial transformer networks [13]. These models have been successfully applied to image classification
and character recognition. Owing to their great success, many leading companies have also introduced
deep models. Google has built GoogLeNet, which has 22 layers of alternating convolutional and
pooling layers. Apart from this model, Google has also developed an open source software
library named TensorFlow to conduct deep learning research. Microsoft also introduced its own deep
convolutional neural network architecture named ResNet in 2015. ResNet has a 152-layer network
architecture, which set a new record in detection, localization, and classification. This model
introduced the new idea of residual learning, which makes the optimization and the back-propagation
process easier than in the basic DCNN model.
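
The residual learning idea mentioned above can be illustrated with a small sketch. It is not the ResNet architecture itself: real residual blocks use convolutions and normalization, whereas this toy version assumes two small fully connected layers and hypothetical helper names (`relu`, `matvec`, `residual_block`). What it keeps is the defining step, namely that the block outputs its input plus a learned correction, y = x + F(x).

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def residual_block(x, w1, w2):
    """Toy residual block: y = x + w2 * relu(w1 * x)."""
    residual = matvec(w2, relu(matvec(w1, x)))
    # The identity shortcut adds the unchanged input back to the learned residual.
    return [xi + ri for xi, ri in zip(x, residual)]


x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With all-zero weights the block reduces to the identity mapping.
print(residual_block(x, zeros, zeros))
# With identity weights the residual branch contributes relu(x).
print(residual_block(x, identity, identity))
```

Because the shortcut passes the input through untouched, a block whose weights are near zero behaves like the identity, so stacking many such blocks does not by itself make the signal harder to propagate. This is one common way to explain why very deep residual networks are easier to optimize than plain stacks of layers.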
Character recognition is a field of image processing where the image is recognized and converted
into a machine-readable format. As discussed above, the deep learning approach, and especially
deep convolutional neural networks, have been used for image detection and recognition. They have
also been successfully applied to Roman (MNIST) [4], Chinese [14], Bangla [15] and Arabic [16]
languages. In this work, a deep convolutional neural network is applied to handwritten Devanagari
character recognition.
The main contributions of our work can be summarized in the following points:
1. This work is the first to apply the deep learning approach to the database created by ISI, Kolkata.
The main contribution is a rigorous evaluation of various DCNN models.
2. Deep learning is a rapidly developing field, which is bringing new techniques that can
significantly improve the performance of DCNNs. Since these techniques have been published
in the last few years, their cross-domain utility is still being established.
We explored the role of adaptive gradient methods in deep convolutional neural network models,
and we showed the variation in recognition accuracy.
3. The proposed handwritten Devanagari character recognition system achieves a high classification
accuracy, surpassing existing approaches in the literature, mainly in terms of recognition accuracy.
4. A layer-wise DCNN technique is proposed to achieve the highest recognition
accuracy and also a faster convergence rate.
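
As a rough illustration of what an adaptive gradient method does, the sketch below compares plain gradient descent with AdaGrad on a toy two-parameter loss. This page does not say which adaptive methods the authors evaluated, so AdaGrad is used here only as a representative example; the loss function, learning rates, step count and function names are all assumptions made for the illustration.

```python
import math


def grad(theta):
    """Gradient of the toy loss f(x, y) = 0.5 * (100 * x**2 + y**2)."""
    x, y = theta
    return [100.0 * x, y]


def sgd(theta, lr, steps):
    """Plain gradient descent with one global learning rate."""
    theta = list(theta)
    for _ in range(steps):
        g = grad(theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


def adagrad(theta, lr, steps, eps=1e-8):
    """AdaGrad: each parameter's step is scaled by its own gradient history."""
    theta = list(theta)
    accum = [0.0] * len(theta)  # running sum of squared gradients
    for _ in range(steps):
        g = grad(theta)
        accum = [a + gi * gi for a, gi in zip(accum, g)]
        theta = [t - lr * gi / (math.sqrt(a) + eps)
                 for t, gi, a in zip(theta, g, accum)]
    return theta


start = [1.0, 1.0]
# The global rate for plain descent must stay small because of the steep x direction.
print("sgd    :", sgd(start, lr=0.01, steps=100))
print("adagrad:", adagrad(start, lr=0.5, steps=100))
```

In this toy setting the single learning rate of plain descent is limited by the steep direction, so the shallow direction converges slowly, while the per-parameter scaling of AdaGrad lets both directions shrink at a similar pace. Results on a real DCNN depend on the data and architecture and cannot be inferred from this example.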
The remainder of this paper is organized as follows. Section 2 discusses previous work in handwritten
Devanagari character recognition, Section 3 introduces deep convolutional neural
networks and adaptive gradient methods, Section 4 outlines the experiments and discussions and, finally,
Section 5 concludes the paper.
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Publisher
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Dimensions
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science