Page - 190 - in Document Image Processing

J. Imaging 2018, 4, 32

2. Literature Review

Recently, several approaches have been proposed to detect and recognize texts in videos and natural scene images [1,2,15,16]. All of the work mentioned so far is dedicated to Latin or Chinese text detection and recognition. Much of the progress that has been made in this field of research is attributed to the availability of standard datasets. The most popular of these is the dataset of the ICDAR 2003 Robust Reading Competitions (RRC) [17], prepared for scene text localization, character segmentation (removing background pixels) and word recognition. This dataset includes 509 text images in real environments captured with hand-held devices. Of these, 258 images are used for training and the remaining 251 images constitute the test set. Some examples are depicted in Figure 2a. This dataset was also used in the ICDAR 2005 Text Locating Competition [18].

Figure 3 shows the evolution of Latin text detection research between 2003 and 2013 [18–20], taking the ICDAR 2003 dataset as a benchmark. As can be observed, the method of Huang et al. [19] outperforms other approaches by a large margin. This method enhances the Stroke Width Transform (SWT) algorithm using color information and introduces Text Covariance Descriptors (TCDs). For the word-recognition task, the best accuracy, 93.1%, was achieved by Jaderberg et al. [21] using their proposed Convolutional Neural Network (CNN) model.

The dataset in ICDAR 2011 RRC [22] was inherited from the benchmark used in the previous ICDAR competitions (i.e., 2003 and 2005) but underwent extension and modification, since some ground-truth information was missing and some word bounding boxes were imprecise. The final datasets consisted of 485 full images and 1564 cropped word images for the localization and word-recognition tasks, respectively. On this dataset, the text detection method of Liao et al. [23] obtains state-of-the-art performance with an F-score of 82%. This algorithm is based on a fully convolutional network (FCN) followed by a standard non-maximum suppression process.
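To make the non-maximum suppression step concrete, the following is a minimal sketch of the usual greedy procedure. It is not the implementation of Liao et al. [23]. The axis-aligned box format `(x_min, y_min, x_max, y_max)`, the function names `iou` and `non_max_suppression`, and the overlap threshold of 0.5 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative value, not taken from the paper
) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, best score first."""
    # Visit candidates from the highest to the lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(non_max_suppression(candidates, confidences))
```

In this toy input, the second box overlaps the first heavily and has a lower score, so it is suppressed, while the third box does not overlap either of them and is kept. Detectors for multi-oriented text typically need a variant that computes overlap on rotated boxes or polygons.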
Figure 2. Typical samples from the ICDAR 2003 (a), MSRA-TD500 (b), NEOCR (c) and KAIST (d) datasets.

In the 2013 edition of ICDAR RRC [24], a new database was proposed for video text detection, tracking and recognition. It contains 28 short video sequences. An updated version of this dataset was provided in ICDAR 2015 [25], including a training set of 25 videos and a test set of 24 videos.

The MSRA-TD500 dataset [26] targets multi-oriented scene text detection. This dataset includes 500 images (300 for training and 200 for testing) with horizontal and slanted/skewed texts in complex natural scenes (see Figure 2b for examples). The method of Liu et al. [27] achieves state-of-the-art performance on this database with an F-score of 75%. This method uses the Maximally Stable Extremal Regions (MSER) technique to extract text candidates, followed by a two-stage filtering process consisting of a set of heuristic rules and an AdaBoost classifier.

The Street View Text (SVT) dataset [28] is used for scene text detection, segmentation and recognition in outdoor images. It includes 350 full images with 904 word-level annotated bounding boxes. The method of Shi et al. [29] outperforms existing techniques with a recognition accuracy of 80.8%. This method is based on a Convolutional Recurrent Neural Network (CRNN), which integrates the advantages of both CNNs and Recurrent Neural Networks (RNNs). For the
Document Image Processing
Title: Document Image Processing
Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science