Page - 192 - in Document Image Processing
J. Imaging 2018, 4, 32
training/test samples and best achieved result. As this table shows, publicly available datasets for Arabic Video OCR systems are limited to a single work for the recognition task and do not exist at all for the detection and tracking problems. Yousfi et al. [44] put forward a dataset for superimposed text recognition, called ALIF. The dataset was composed of 6532 static cropped text images, most of them extracted from diverse Arabic TV channels and about 12% from web sources. This dataset offered only one image resolution.
Table 1. Most important existing datasets for text processing in videos and scene images. "D", "S" and "R" respectively denote "Detection", "Segmentation" and "Recognition".
| Dataset (Year) | Category | Source | Task | # of Images (Train/Test) | # of Text (Train/Test) | Script | Best Scores |
|---|---|---|---|---|---|---|---|
| ICDAR'03 [18] (2003) | Scene text | Camera | D/R | 509 (258/251) | 2276 (1110/1156) | English | 93.1% (R) |
| KAIST [31] (2010) | Scene text | Camera, mobile phone | D/S | 3000 | >5000 | English, Korean | 88% (S) |
| SVT [28] (2010) | Scene text | Google Street View | D/S/R | 350 (100/250) | 904 (257/647) | English | 80.8% (R), 90% (S) |
| NEOCR [33] (2011) | Scene text | Camera | D/R | 659 | 5238 | Eight languages | |
| ICDAR'11 [22] (2011) | Scene text | Camera | D/R | 485 | 1564 | English | 82% (D) |
| MSRA-TD500 [26] (2012) | Scene text | Camera | D | 500 (300/200) | _ | English, Chinese | 75% |
| ICDAR'13 [24] (2013) | Scene text; Artificial text; Video scene | Camera; Web; Camera | D/S/R; D/S/R; D/T/R | 229/233; 410/141; 28 videos | 848/1095; 3564/1439; _ | Spanish, French, English | |
| ALIF [44] (2015) | Artificial text | Video frames | R | 6532 (4152/2199) | | Arabic | 55.03% |
| COCO-Text [34] (2016) | Scene text | MSCOCO dataset | D/R | 63,686 (43.6k/10k) | 173,000 | English | 67.16% (D) |
| Total-Text [36] (2017) | Curved scene text | Web | D/R | 1555 (1255/300) | 9330 (words) | English | |
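Table 1 is small enough to encode directly. Doing so makes the observation about Arabic Video OCR easy to check: there is one recognition dataset and none for detection or tracking. The Python sketch below is illustrative only. The names `DatasetEntry`, `TABLE_1` and `datasets_for` are hypothetical, and only the columns needed for the query are transcribed. Reading the "T" code as "Tracking" is an assumption, because the caption defines only D, S and R.

```python
from dataclasses import dataclass

# Task codes from Table 1. The caption defines D, S and R; reading "T"
# as "Tracking" is an assumption based on the surrounding text.
TASK_NAMES = {"D": "Detection", "S": "Segmentation", "R": "Recognition", "T": "Tracking"}


@dataclass(frozen=True)
class DatasetEntry:
    """One Table 1 row; only the columns needed for the query are kept."""
    name: str
    year: int
    tasks: str      # task codes as printed in Table 1, e.g. "D/S/R"
    scripts: tuple  # scripts as printed in Table 1

    def task_set(self) -> set:
        """Expand a code string such as "D/S/R" into full task names."""
        return {TASK_NAMES[code] for code in self.tasks.split("/")}


# Rows transcribed from Table 1. The three ICDAR'13 subsets are merged into
# one entry (union of their task codes). NEOCR's eight languages are not
# named in the table, so a placeholder string is stored instead.
TABLE_1 = [
    DatasetEntry("ICDAR'03", 2003, "D/R", ("English",)),
    DatasetEntry("KAIST", 2010, "D/S", ("English", "Korean")),
    DatasetEntry("SVT", 2010, "D/S/R", ("English",)),
    DatasetEntry("NEOCR", 2011, "D/R", ("Eight languages",)),
    DatasetEntry("ICDAR'11", 2011, "D/R", ("English",)),
    DatasetEntry("MSRA-TD500", 2012, "D", ("English", "Chinese")),
    DatasetEntry("ICDAR'13", 2013, "D/S/R/T", ("Spanish", "French", "English")),
    DatasetEntry("ALIF", 2015, "R", ("Arabic",)),
    DatasetEntry("COCO-Text", 2016, "D/R", ("English",)),
    DatasetEntry("Total-Text", 2017, "D/R", ("English",)),
]


def datasets_for(script: str, task: str) -> list:
    """Return names of Table 1 datasets that cover both the script and the task."""
    return [d.name for d in TABLE_1 if script in d.scripts and task in d.task_set()]


for task in ("Detection", "Recognition", "Tracking"):
    print(task, datasets_for("Arabic", task))
```

Running the script prints an empty list for detection and tracking and `['ALIF']` for recognition, which matches the reading of the table given in the text. The sketch keeps task codes as the strings printed in the table and expands them on demand. With this design, a new row can be added by copying the table entry verbatim.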
3. Proposed Datasets
In this section, we describe the AcTiV 2.0 dataset in terms of characteristics, statistics and annotation guidelines.
3.1. Data Characteristics and Statistics
As mentioned in the introduction, AcTiV 1.0 (http://tc11.cvc.uab.es/datasets/AcTiV_1) was presented at the ICDAR'15 conference [14] as the first publicly accessible annotated dataset designed to assess the performance of different Arabic Video OCR systems. This database is currently used by several research groups around the world. It was partially used as a benchmark in the first edition of the "AcTiVComp" contest, held in conjunction with the ICPR'16 conference [45]. The two main challenges addressed by this dataset are text pattern variability and the presence of complex backgrounds with various text-like objects. AcTiV 1.0 consists of 80 video clips recorded from four different Arabic news channels: TunisiaNat1, France24, Russia Today and Aljazeera HD. Each video clip comes with a corresponding XML file (detailed in Section 3.2). From these video clips, we selected 1843 frames dedicated to the detection task. The first results using AcTiV 1.0 were presented in [14,46].
Based on the results obtained under different evaluation protocols and on feedback from AcTiV 1.0 users, it was necessary to extend the content in terms of video clips and resolutions. This extension offers more training samples, which matters especially for deep learning-based methods.
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Editor: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Size: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science