Web-Books in the Austria-Forum
Document Image Processing
Page - 135 -

Text of the Page - 135 -

partition. The Weighted Finite State Transducers (WFST) decoding (see Section 3.5) can be designed to output word, sub-word or character sequences. For each output type, the lexicon and language model have to be modified accordingly, and no additional modification is necessary in the system.

[Figure 5 shows the system pipeline: preprocessing and feature extraction, a BLSTM recurrent neural network, and WFST decoding with a word lexicon and language model.] Figure 5. Bi-directional Long-Short Term Memory (BLSTM) system architecture. The BLSTM RNN outputs posterior distributions o at each time step. The decoding is performed with Weighted Finite State Transducers (WFST) using a lexicon and a language model at word level.

3.4.3. Deep Models Based on Convolutional Recurrent Neural Networks

The Convolutional Recurrent Neural Network (CRNN) [32] is inspired by the VGG16 architecture [33] that was developed for image recognition. We use a stack of 13 convolutional layers (3×3 filters, 1×1 stride) followed by three bi-directional LSTM layers with 256 units per layer (see Figure 6). Each LSTM unit has one cell with enabled peephole connections. Spatial pooling (max) is employed after some convolutional layers. To introduce non-linearity, the Rectified Linear Unit (ReLU) activation function was used after each convolution. It has the advantage of being resistant to the vanishing gradient problem while being simple in terms of computation, and was shown to work better than sigmoid and hyperbolic tangent activation functions [34]. A square-shaped sliding window is used to scan the text-line image in the direction of the writing. The height of the window is equal to the height of the text-line image, which has been normalized to 64 pixels. The window overlap is equal to two pixels to allow a continuous transition of the convolution filters.

For each analysis window of 64×64 pixels in size, 16 feature vectors are extracted from the feature maps produced by the last convolutional layer and fed into the observation sequence. For each of the 16 columns of the last 512 feature maps, the columns of a height of two pixels are concatenated into a feature vector of size 1024 (512×2). Thanks to the CTC transcription layer [35], the system is end-to-end trainable. The convolutional filters and the LSTM unit weights are thus jointly learned using the back-propagation procedure. We combined the forward and backward outputs at the end of the BLSTM stack [36] rather than after each BLSTM layer, in order to decrease the number of parameters. We also chose not to add additional fully-connected layers since, by adding such layers, the network had more parameters, converged more slowly and performed worse. Hyperparameters such as the number of convolution layers and the number of BLSTM layers were set upon a validation set. The LSTM unit weights were initialized as per the method of [37], which proved to work well and helps the network to converge faster. This allows the network to maintain a constant variance across the network layers, which keeps the signal from exploding to a high value or vanishing to zero. The weight matrices were initialized with a uniform distribution.
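The dimensions given above can be made concrete with a short sketch. The following PyTorch snippet is illustrative only and not the authors' implementation: it mirrors the 64×64 analysis window, the 512 final feature maps of height two concatenated column-wise into 1024-dimensional vectors, the 16 frames per window, and the stacked bidirectional LSTMs feeding a per-frame output layer suitable for CTC training. It shows only five of the thirteen convolutional layers, the pooling positions are guessed, and torch.nn.LSTM has no peephole connections and concatenates the two directions between layers rather than only after the last one.

# Minimal sketch (PyTorch, hypothetical) of the CRNN pipeline described above.
import torch
import torch.nn as nn


def conv_block(c_in, c_out, pool=None):
    """3x3 convolution + ReLU, optionally followed by max pooling."""
    layers = [nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
              nn.ReLU(inplace=True)]
    if pool is not None:
        layers.append(nn.MaxPool2d(kernel_size=pool))
    return layers


class CRNNSketch(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # VGG-like stack (illustrative subset of the 13 convolutional layers).
        self.cnn = nn.Sequential(
            *conv_block(1, 64, pool=(2, 2)),     # 64x64 window -> 32x32
            *conv_block(64, 128, pool=(2, 2)),   # -> 16x16
            *conv_block(128, 256, pool=(2, 1)),  # pool height only -> 8x16
            *conv_block(256, 512, pool=(2, 1)),  # -> 4x16
            *conv_block(512, 512, pool=(2, 1)),  # -> 2x16: 512 maps, 16 columns
        )
        # Three bidirectional LSTM layers, 256 units per direction
        # (no peepholes; directions combined per layer, unlike the paper).
        self.rnn = nn.LSTM(input_size=512 * 2, hidden_size=256, num_layers=3,
                           bidirectional=True, batch_first=True)
        # Per-frame class scores (characters + CTC blank).
        self.fc = nn.Linear(2 * 256, num_classes)

    def forward(self, window):                    # window: (batch, 1, 64, 64)
        fmap = self.cnn(window)                   # (batch, 512, 2, 16)
        b, c, h, w = fmap.shape
        # Concatenate the two rows of each column into a 1024-dim feature vector.
        seq = fmap.permute(0, 3, 1, 2).reshape(b, w, c * h)   # (batch, 16, 1024)
        out, _ = self.rnn(seq)                    # (batch, 16, 512)
        return self.fc(out).log_softmax(dim=-1)   # per-frame log-posteriors


# Shape check: one 64x64 analysis window yields 16 output frames.
model = CRNNSketch(num_classes=80)                # vocabulary size is an assumption
frames = model(torch.randn(1, 1, 64, 64))
print(frames.shape)                               # torch.Size([1, 16, 80])
# For CTC training one would transpose to (frames, batch, classes) and apply nn.CTCLoss.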
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
English
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Informatik