Page - 190 - in Document Image Processing

J. Imaging 2018, 4, 32

2. Literature Review

Recently, several approaches have been proposed to detect and recognize text in videos and natural scene images [1,2,15,16]. All of the work mentioned so far is dedicated to Latin or Chinese text detection and recognition.

Much of the progress made in this field of research is attributed to the availability of standard datasets. The most popular of these is the dataset of the ICDAR 2003 Robust Reading Competitions (RRC) [17], prepared for scene text localization, character segmentation (removing background pixels) and word recognition. This dataset includes 509 text images of real environments captured with hand-held devices. Of these, 258 images are used for training and the remaining 251 images constitute the test set. Some examples are depicted in Figure 2a. This dataset was also used in the ICDAR 2005 Text Locating Competition [18]. Figure 3 shows the evolution of Latin text detection research between 2003 and 2013 [18–20], taking the ICDAR 2003 dataset as a benchmark. As can be observed, the method of Huang et al. [19] outperforms other approaches by a large margin. This method enhances the Stroke Width Transform (SWT) algorithm using color information and introduces Text Covariance Descriptors (TCDs). For the word-recognition task, the best accuracy of 93.1% was achieved by Jaderberg et al. [21] using their proposed Convolutional Neural Network (CNN) model.

The dataset in the ICDAR 2011 RRC [22] was inherited from the benchmark used in the previous ICDAR competitions (i.e., 2003 and 2005) but has undergone extension and modification, since some ground truth information was missing and some word bounding boxes were imprecise. The final datasets consisted of 485 full images and 1564 cropped word images for the localization and word-recognition tasks, respectively. On this dataset, the text detection method of Liao et al. [23] obtains state-of-the-art performance with an F-score of 82%. This algorithm is based on a fully convolutional network (FCN) followed by a standard non-maximum suppression process.
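
The non-maximum suppression step mentioned above can be illustrated with a minimal Python sketch. It is not the implementation of Liao et al. [23]: the axis-aligned box format (x1, y1, x2, y2), the greedy strategy, the function names and the overlap threshold of 0.5 are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # assumed value, not taken from the source
) -> List[int]:
    """Greedy NMS: return indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidate_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    candidate_scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(candidate_boxes, candidate_scores))
```

In this example the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap either of the others and is kept.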

Figure 2. Typical samples from the ICDAR 2003 (a), MSRA-TD500 (b), NEOCR (c) and KAIST (d) datasets.

In the 2013 edition of the ICDAR RRC [24], a new database was proposed for video text detection, tracking and recognition. It contains 28 short video sequences. An updated version of this dataset was provided in ICDAR 2015 [25], including a training set of 25 videos and a test set of 24 videos.

The MSRA-TD500 dataset [26] targets multi-oriented scene text detection. This dataset includes 500 images (300 for training and 200 for testing) with horizontal and slanted/skewed text in complex natural scenes (see Figure 2b for examples). The method of Liu et al. [27] achieves state-of-the-art performance on this database with an F-score of 75%. This method uses the Maximally Stable Extremal Regions (MSER) technique as a text candidate extractor, followed by a set of heuristic rules and an AdaBoost classifier as a two-stage filtering process.

The Street View Text (SVT) dataset [28] is used for scene text detection, segmentation and recognition in outdoor images. It includes 350 full images with 904 word-level annotated bounding boxes. The method of Shi et al. [29] shows superiority over existing techniques with a recognition accuracy of 80.8%. This method is based on a Convolutional Recurrent Neural Network (CRNN), which integrates the advantages of both CNNs and Recurrent Neural Networks (RNNs). For the

Document Image Processing

Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science