Page - 200 - in Document Image Processing

J. Imaging 2018, 4, 32

text line formation method (see [46] for more details). The second stage uses CAE to produce features automatically instead of hard-coding them. These features are learned in an unsupervised way from the text line candidates obtained in the first stage. Then, to discriminate text objects from non-text ones, an SVM classifier with an RBF kernel is trained on the patches extracted from the text line candidates, using the generated CAE features. Note that the whole algorithm is performed twice for each image to handle both dark-on-light and light-on-dark text: once along the gradient direction and once along the inverse direction. The results of the two passes are combined to make the final decisions.

Figure 12. Pipeline of the text detection algorithm. Two passes are performed, one for each text polarity (dark text on light background or light text on dark background).

5.2. SIDOCR

The SIDOCR system [51] relies on a Multi-Dimensional Long Short-Term Memory (MDLSTM) network with a Connectionist Temporal Classification (CTC) output layer. The proposed network is composed of three levels: an input layer, five hidden layers and an output layer. Three of the hidden layers are MDLSTM layers with 2, 10, and 50 cells, respectively; they are separated by two feedforward layers with 6 and 20 cells. In effect, we have created a hierarchical structure by repeatedly composing MDLSTM layers with feedforward layers.

First, the image is divided into small patches using a pixel window called the "input block"; each patch is presented to the first MDLSTM layer as a feature vector of pixel intensities. These vectors are then scanned by four MDLSTM layers in different directions (i.e., up, down, left and right). After that, the cell activations of the MDLSTM layers are fed sequentially to the first and second feedforward layers through subsampling windows called "hidden blocks". This can be seen as a subsampling step with trainable weights, in which the activations are summed and squashed by the hyperbolic tangent (tanh) function. This step aims to greatly reduce the number of weight connections between hidden layers. The final level is the CTC output layer, which labels the sequences of text lines.
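The hidden-block subsampling step described above (a weighted sum over a window of activations, squashed by tanh) can be illustrated with a minimal sketch. The function name `subsample_block`, the data layout (nested lists indexed as row, column, channel), the bias term and the toy sizes are assumptions made for illustration; they are not taken from the SIDOCR implementation.

```python
import math


def subsample_block(acts, block_h, block_w, weights, bias):
    """Subsample a grid of activations with trainable weights and tanh.

    acts    : H x W x C nested lists of activations (hypothetical layout)
    weights : one row per output unit, each of length block_h * block_w * C
    bias    : one value per output unit (an assumption, not stated in the text)
    """
    height, width = len(acts), len(acts[0])
    out_rows = []
    # Slide a non-overlapping window (the "hidden block") over the grid.
    for top in range(0, height - block_h + 1, block_h):
        row = []
        for left in range(0, width - block_w + 1, block_w):
            # Flatten every activation inside the current block into one vector.
            flat = [v
                    for y in range(top, top + block_h)
                    for x in range(left, left + block_w)
                    for v in acts[y][x]]
            # Weighted sum per output unit, squashed by tanh.
            row.append([math.tanh(sum(w * a for w, a in zip(w_row, flat)) + b)
                        for w_row, b in zip(weights, bias)])
        out_rows.append(row)
    return out_rows


# Toy example: a 4 x 4 grid with 2 channels, reduced by a 2 x 2 block
# to a 2 x 2 grid with 3 output units per position.
acts = [[[0.5, 0.5] for _ in range(4)] for _ in range(4)]
weights = [[0.1] * 8 for _ in range(3)]
bias = [0.0] * 3
pooled = subsample_block(acts, 2, 2, weights, bias)
```

In this toy case each block of 2 x 2 x 2 = 8 activations is mapped to 3 values, so the next layer sees a quarter as many positions, which is the reduction in connections the text refers to.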
The CTC output layer has n cells, where n is the number of classes, in our case 165 (164 characters and one cell for the 'blank' output). The output activations are normalized at each time step with the softmax activation function. Using such a layer allows the network to work on unsegmented input sequences, which is not the case for standard RNN objective functions.

A separate network has been trained for each TV channel of the reference protocol. All input images have been scaled to a common height (70 pixels) and converted to gray-scale. Training is carried out with the back-propagation through time (BPTT) algorithm, and a steepest descent optimizer has been used with a learning rate of 10^-4 and a momentum value of 0.9. We performed several experiments to find the optimal sizes of the MDLSTM layers, feedforward layers, input block and hidden block. Table 6 summarizes the best obtained values of the network parameters.
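To make the role of the softmax normalization and the 'blank' class concrete, here is a minimal sketch of per-time-step softmax followed by best-path decoding (take the most probable class per frame, merge repeats, drop blanks). Best-path decoding is a common way to read out a CTC layer, but the text does not say which decoder SIDOCR uses; the blank index, the function names and the 4-class toy alphabet are likewise assumptions.

```python
import math

BLANK = 0  # assumption: index 0 is the 'blank' class; the source does not specify it


def softmax(scores):
    """Normalize one time step of raw output activations to probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_path_decode(score_rows):
    """Greedy CTC read-out: argmax per frame, merge repeats, remove blanks."""
    probs = [softmax(row) for row in score_rows]
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
    labels = []
    previous = None
    for index in best:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank class.
        if index != previous and index != BLANK:
            labels.append(index)
        previous = index
    return labels


def frame(winner, n_classes=4):
    """Hypothetical helper: raw scores in which class `winner` dominates."""
    return [5.0 if i == winner else 0.0 for i in range(n_classes)]


# Seven frames whose per-frame winners are 1, 1, blank, 1, 2, 2, blank.
decoded = best_path_decode([frame(k) for k in [1, 1, 0, 1, 2, 2, 0]])
```

The blank between the second and third occurrence of class 1 is what lets the decoder output the same label twice in a row, so no prior segmentation of the text line into characters is needed. In the system described above the layer would have 165 classes rather than the 4 used here.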
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science (Informatik)