Document Image Processing
Page 135

J. Imaging 2018, 4, 15

partition. The Weighted Finite State Transducers (WFST) decoding (see Section 3.5) can be designed to output word, sub-word or character sequences. For each output type, the lexicon and language model have to be modified accordingly, and no additional modification is necessary in the system.

[Figure 5: block diagram with the stages "Preprocessing and feature extraction", "Recurrent Neural Network" (BLSTM layer, inputs x1 ... x60, outputs o1 ... o106) and "WFST decoding" with "Word Lexicon and Language Model".]

Figure 5. Bi-directional Long Short-Term Memory (BLSTM) system architecture. The BLSTM RNN outputs posterior distributions o at each time step. The decoding is performed with Weighted Finite State Transducers (WFST) using a lexicon and a language model at word level.

3.4.3. Deep Models Based on Convolutional Recurrent Neural Networks

The Convolutional Recurrent Neural Network (CRNN) [32] is inspired by the VGG16 architecture [33] that was developed for image recognition. We use a stack of 13 convolutional (3 × 3 filters, 1 × 1 stride) layers followed by three bi-directional LSTM layers with 256 units per layer (see Figure 6). Each LSTM unit has one cell with enabled peephole connections. Spatial pooling (max) is employed after some convolutional layers. To introduce non-linearity, the Rectified Linear Unit (ReLU) activation function was used after each convolution. It has the advantage of being resistant to the vanishing gradient problem while being simple in terms of computation and was shown to work better than sigmoid and hyperbolic tangent activation functions [34].

A square-shaped sliding window is used to scan the text-line image in the direction of the writing. The height of the window is equal to the height of the text-line image, which has been normalized to 64 pixels. The window overlap is equal to two pixels to allow continuous transition of the convolution filters. For each analysis window of 64 × 64 pixels in size, 16 feature vectors are extracted from the feature maps produced by the last convolutional layer and fed into the observation sequence.
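
The window geometry described above can be made concrete with a small sketch. It assumes that a two-pixel overlap means consecutive windows share two pixel columns (stride of 62 pixels) and that a trailing partial window is simply dropped; the text states neither, and the function names are hypothetical.

```python
def window_starts(line_width, window=64, overlap=2):
    """Return the left x-coordinates of square analysis windows on a text line.

    Assumption: consecutive windows share `overlap` pixel columns, so the
    stride is window - overlap. A trailing partial window is dropped.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if line_width < window:
        return []
    stride = window - overlap
    return list(range(0, line_width - window + 1, stride))


def frames_per_line(line_width, window=64, overlap=2, vectors_per_window=16):
    """Number of feature vectors contributed to the observation sequence."""
    return len(window_starts(line_width, window, overlap)) * vectors_per_window
```

Under these assumptions, a text line 126 pixels wide yields windows starting at x = 0 and x = 62, and therefore 32 feature vectors.
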
For each of the 16 columns of the last 512 feature maps, the columns of a height of two pixels are concatenated into a feature vector of size 1024 (512 × 2). Thanks to the CTC transcription layer [35], the system is end-to-end trainable. The convolutional filters and the LSTM unit weights are thus jointly learned using the back-propagation procedure. We combined the forward and backward outputs at the end of the BLSTM stack [36] rather than after each BLSTM layer, in order to decrease the number of parameters. We also chose not to add additional fully-connected layers since, by adding such layers, the network had more parameters, converged more slowly and performed worse. Hyperparameters such as the number of convolution layers and the number of BLSTM layers were set using a validation set.

The LSTM unit weights were initialized as per the method of [37], which proved to work well and helps the network to converge faster. This allows the network to maintain a constant variance across the network layers, which keeps the signal from exploding to a high value or vanishing to zero. The weight matrices were initialized with a uniform distribution.
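
The construction of the 1024-dimensional feature vectors (512 feature maps, each two pixels high and 16 columns wide per analysis window) can be sketched as follows. The nested-list layout `feature_maps[c][y][x]` and the map-major ordering inside each vector are assumptions made for illustration; the text does not specify the ordering.

```python
def columns_to_feature_vectors(feature_maps):
    """Build one feature vector per column of a stack of feature maps.

    feature_maps[c][y][x] is the value of map c at row y, column x.
    Assumed layout per analysis window: 512 maps, 2 rows, 16 columns.
    Each returned vector has length n_maps * n_rows (512 * 2 = 1024 here).
    Assumed ordering inside a vector: map-major, then row.
    """
    n_maps = len(feature_maps)
    n_rows = len(feature_maps[0])
    n_cols = len(feature_maps[0][0])
    vectors = []
    for x in range(n_cols):
        # Stack column x of every map on top of each other.
        vec = [feature_maps[c][y][x] for c in range(n_maps) for y in range(n_rows)]
        vectors.append(vec)
    return vectors


# Toy input: every entry records its own (map, row, column) position.
maps = [[[(c, y, x) for x in range(16)] for y in range(2)] for c in range(512)]
vectors = columns_to_feature_vectors(maps)
print(len(vectors), len(vectors[0]))
```

With the toy input, the result is 16 vectors of length 1024, one per column, which matches the sizes given in the text.
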
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Publisher: MDPI
Place: Basel
Date: 2018
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science