Document Image Processing
J. Imaging 2018, 4, 32

text line formation method (see [46] for more details). The second stage uses a CAE to automatically produce features, instead of hard-coding them. These features have been learned in an unsupervised way from the text line candidates obtained in the first stage. Then, to discriminate text objects from non-text ones, an SVM classifier with an RBF kernel is trained on the patches extracted from the text line candidates by using the generated CAE features. Note that the whole algorithm is performed twice (for each image) to handle both dark-on-light and light-on-dark texts, once along the gradient direction and once along the inverse direction. The results of the two passes are combined to make final decisions.

Figure 12. Pipeline of the text detection algorithm. Two passes are performed, one for each text polarity (dark text on light background or light text on dark background).

5.2. SIDOCR

The SIDOCR system [51] relies specifically on a Multi-Dimensional Long Short-Term Memory (MDLSTM) with a CTC output layer. The proposed network is composed of three levels: an input layer, five hidden layers and an output layer. The hidden layers are MDLSTM layers that respectively have 2, 10, and 50 cells, separated by feedforward layers with 6 and 20 cells. In fact, we have created a hierarchical structure by repeatedly composing MDLSTM layers with feedforward layers. Firstly, the image is divided into small patches using a pixel window called the “input block”, each of which is presented to the first MDLSTM layer as a feature vector of pixel intensities. These vectors are then scanned by four MDLSTM layers in different directions (i.e., up, down, left and right). After that, the cell activations of the MDLSTM layers are sequentially fed to the first and second feedforward layers through sub-sample windows, namely “hidden blocks”. This can be seen as a subsampling step with trainable weights, in which the activations are summed and squashed by the hyperbolic tangent (tanh) function. This step aims to greatly reduce the number of weight connections between hidden layers. The final level is the CTC output layer, which labels the sequences of text lines.
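The hidden-block subsampling step can be illustrated with a minimal sketch in plain Python. It assumes a single 2D activation map (the four scan directions are not modeled here), and the function name `subsample_blocks`, the block size, and the weight layout are hypothetical choices for illustration, not the SIDOCR implementation.

```python
import math

def subsample_blocks(acts, weights, bias, block_h, block_w):
    """Weighted sum over each hidden block, squashed by tanh.

    acts[y][x]    : list of C input activations at position (y, x)
    weights[k][j] : weight from flattened block element k to output cell j
    bias[j]       : bias of output cell j
    Rows/columns that do not fill a complete block are ignored (assumption).
    """
    rows = len(acts) // block_h
    cols = len(acts[0]) // block_w
    out = []
    for by in range(rows):
        out_row = []
        for bx in range(cols):
            # Flatten one hidden block into a single input vector.
            block = [v
                     for y in range(by * block_h, (by + 1) * block_h)
                     for x in range(bx * block_w, (bx + 1) * block_w)
                     for v in acts[y][x]]
            # One tanh unit per feedforward cell.
            out_row.append([
                math.tanh(bias[j] + sum(b * weights[k][j]
                                        for k, b in enumerate(block)))
                for j in range(len(bias))
            ])
        out.append(out_row)
    return out

# Toy example: a 4 x 4 map with one activation per position,
# 2 x 2 hidden blocks (hypothetical size), and 2 feedforward cells.
acts = [[[0.5] for _ in range(4)] for _ in range(4)]
weights = [[1.0, 1.0] for _ in range(2 * 2 * 1)]
out = subsample_blocks(acts, weights, [0.0, -2.0], 2, 2)
```

Each 2 x 2 block collapses into one output position, so the 4 x 4 map shrinks to 2 x 2. This is how such a step reduces the number of connections that the next layer has to learn.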
The CTC output layer has n cells, where n is the number of classes, in our case 165 (164 characters and one cell for the ‘blank’ output). The output activations are normalized at each time step with the softmax activation function. The use of such a layer allows working on unsegmented input sequences, which is not the case for standard RNN objective functions. A separate network has been trained for each TV channel of the reference protocol. All input images have been scaled to a common height (70 pixels) and converted to gray-scale. The training is carried out with the back-propagation through time (BPTT) algorithm, and the steepest descent optimizer has been used with a learning rate of 10^-4 and a momentum value of 0.9. We performed several experiments to find the optimal sizes of the MDLSTM layers, feedforward layers, input block and hidden block. Table 6 summarizes the best obtained values of the network parameters.
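The per-time-step softmax normalization of the CTC output layer can be sketched as follows. The class count of 165 comes from the text. The blank index and the best-path (greedy) decoding rule are assumptions added for illustration, since the text does not state how label sequences are read out.

```python
import math

NUM_CLASSES = 165  # 164 characters + 1 blank, as stated in the text
BLANK = 0          # assumption: which cell is the blank is not given

def softmax_per_timestep(activations):
    """Normalize each row (one time step) so that it sums to 1."""
    probs = []
    for row in activations:
        peak = max(row)                      # subtract max for stability
        exps = [math.exp(a - peak) for a in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

def best_path_decode(probs, blank=BLANK):
    """Greedy CTC readout: take the most likely class per time step,
    merge consecutive repeats, then drop blanks."""
    labels, prev = [], None
    for row in probs:
        k = max(range(len(row)), key=row.__getitem__)
        if k != prev and k != blank:
            labels.append(k)
        prev = k
    return labels

# Toy example: one strongly activated cell per time step.
path = [0, 5, 5, 0, 5, 7, 7, 0]
acts = [[10.0 if c == k else 0.0 for c in range(NUM_CLASSES)] for k in path]
probs = softmax_per_timestep(acts)
decoded = best_path_decode(probs)
```

The blank between the two runs of class 5 keeps them as separate labels. This is how a CTC layer can emit repeated characters without needing a pre-segmented input.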
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Publisher: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Dimensions: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science