Page - 192 - in Document Image Processing

J. Imaging 2018, 4, 32

training/test samples and best achieved result. As depicted by this table, publicly available datasets for Arabic Video OCR systems are limited to one work for the recognition task and are even non-existent for detection and tracking problems. Yousfi et al. [44] put forward a dataset for superimposed text recognition, called Alif. The dataset was composed of 6532 static cropped text images extracted from diverse Arabic TV channels and with about 12% extracted from web sources. This dataset offered only one image resolution.

Table 1. Most important existing datasets for text processing in videos and scene images. “D”, “S” and “R” respectively denote “Detection”, “Segmentation” and “Recognition”.

| Dataset (Year) | Category | Source | Task | # of Images (Train/Test) | # of Text (Train/Test) | Script | Best Scores |
|---|---|---|---|---|---|---|---|
| ICDAR’03 [18] (2003) | Scene text | Camera | D/R | 509 (258/251) | 2276 (1110/1156) | English | 93.1% (R) |
| KAIST [31] (2010) | Scene text | Camera, mobile phone | D/S | 3000 | >5000 | English, Korean | 88% (S) |
| SVT [28] (2010) | Scene text | Google Street View | D/S/R | 350 (100/250) | 904 (257/647) | English | 80.8% (R), 90% (S) |
| NEOCR [33] (2011) | Scene text | Camera | D/R | 659 | 5238 | Eight languages | |
| ICDAR’11 [22] (2011) | Scene text | Camera | D/R | 485 | 1564 | English | 82% (D) |
| MSRA-TD500 [26] (2012) | Scene text | Camera | D | 500 (300/200) | _ | English, Chinese | 75% |
| ICDAR’13 [24] (2013) | Scene text | Camera | D/S/R | 229/233 | 848/1095 | Spanish, French, English | |
| | Artificial text | Web | D/S/R | 410/141 | 3564/1439 | | |
| | Video scene | Camera | D/T/R | 28 videos | _ | | |
| ALIF [44] (2015) | Artificial text | Video frames | R | 6532 (4152/2199) | | Arabic | 55.03% |
| COCO-Text [34] (2016) | Scene text | MS COCO dataset | D/R | 63,686 (43.6k/10k) | 173,000 | English | 67.16% (D) |
| Total-Text [36] (2017) | Curved scene text | Web | D/R | 1555 (1255/300) | 9330 (words) | English | |

3. Proposed Datasets

In this section, we describe the AcTiV 2.0 dataset in terms of characteristics, statistics and annotation guidelines.

3.1. Data Characteristics and Statistics

As mentioned in the introduction, AcTiV 1.0 (http://tc11.cvc.uab.es/datasets/AcTiV_1) was presented at the ICDAR’15 conference [14] as the first publicly accessible annotated dataset designed to assess the performance of different Arabic Video OCR systems. This database is currently used by several research groups around the world. It was partially used as a benchmark in the first edition of the “AcTiVComp” contest, held in conjunction with the ICPR’16 conference [45]. The two main challenges addressed by this dataset are text pattern variability and the presence of complex backgrounds with various text-like objects. AcTiV 1.0 consists of 80 video clips recorded from four different Arabic news channels: TunisiaNat1, France24, Russia Today and Aljazeera HD. AcTiV 1.0 is composed of video clips and their corresponding XML files (detailed in Section 3.2). From these video clips, we selected 1843 frames dedicated to the detection task. In [14,46], the first results using AcTiV 1.0 were presented.

Based on the results obtained under different evaluation protocols, and considering the feedback of AcTiV 1.0 users, it was necessary to extend the content in terms of video clips and resolutions, offering more training samples, especially for deep learning-based methods.

Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: English
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science