Page - 193 - in Document Image Processing
J. Imaging 2018, 4, 32
The new dataset AcTiV 2.0 includes 189 video sequences, 4063 key frames, 10,415 text images
and three video-stream resolutions, the newly added one being SD (480 × 360). A brief comparison in
terms of content between the initial and new versions of the proposed dataset is presented in Table 2.
The architecture of the new dataset is completely different from the old one. In addition to the
videos and their annotation XML files, AcTiV 2.0 includes two appropriate datasets for detection and
recognition tasks (see Figure 4).
Table 2. Statistics of AcTiV 1.0 and AcTiV 2.0.
            #Resolutions  #Videos  #Frames  #Cropped Images
AcTiV 1.0   2             80       1843     -
AcTiV 2.0   3             189      4063     10,415
[Figure 4 diagram. Resolutions shown: 1920 x 1080, 720 x 576, 460 x 380. Per-channel values from the figure panels:]
Detection dataset (D)
TV Channel     #Frames
AlJazeeraHD    909
France24       874
RussiaToday    882
TunisiaNat1    1099
TunisiaNat1+   299
Recognition dataset (R)
TV Channel     #Lines  #Words  #Characters
AlJazeeraHD    2367    9958    57189
France24       2276    7084    40520
RussiaToday    2633    16543   96990
TunisiaNat1    2411    10998   64493
TunisiaNat1+   631     2635    15371
Figure 4. Architecture of AcTiV 2.0 and statistics of the detection (D) and recognition (R) datasets.
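The per-channel frame counts in the detection (D) panel of Figure 4 can be checked against the overall total of 4063 frames given in the text and in Table 2. The short sketch below does that arithmetic; pairing each count with its channel is an assumption based on the order in which the values appear in the figure.

```python
# Frame counts per TV channel for the detection dataset (AcTiV-D),
# as read from the Figure 4 panel. The channel/count pairing is an
# assumption based on the order of the values in the figure.
frames_per_channel = {
    "AlJazeeraHD": 909,
    "France24": 874,
    "RussiaToday": 882,
    "TunisiaNat1": 1099,
    "TunisiaNat1+": 299,
}

# Total number of frames stated in the text and in Table 2.
REPORTED_TOTAL_FRAMES = 4063

total_frames = sum(frames_per_channel.values())
print(total_frames, total_frames == REPORTED_TOTAL_FRAMES)
```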
• AcTiV-D represents a dataset of non-redundant frames used to build and evaluate methods for
detecting text regions in HD/SD frames. A total of 4063 frames have been hand-selected with
particular attention to achieving a high diversity in depicted text regions. Figure 5 provides
examples from AcTiV-D for typical problems in video text detection. To test the systems' ability to
locate texts under different situations, the proposed dataset includes some frames which contain
the same text region but with different backgrounds and some others without any text component.
• AcTiV-R is a dataset of textline images that can be utilized to build and evaluate Arabic text
recognition systems. Different fonts (more than 6), sizes, backgrounds, colors, contrasts and
occlusions are represented in the dataset. Figure 6 illustrates typical examples from AcTiV-R.
The collected text images cover a broad range of characteristics that distinguish video frames
from scanned documents. AcTiV-R consists of 10,415 textline images, 44,583 words and
259,192 characters. To have an easily accessible representation of Arabic text, it is transformed
into a set of Latin labels with a suffix that refers to the letter's position in the word: _B: Begin, _M:
Middle, _E: End, and _I: Isolate. An example is shown in Figure 1. During the annotation process,
we have considered 164 Arabic character forms:
– 125 letters, i.e., taking into account this "positioning" variability;
– 15 additional characters, i.e., combined with the diacritic sign "Chadda";
– 10 digits; and
– 14 punctuation marks including the white space.
The different character labels can be observed in Table 3. The same table gives for each character
its frequency in the dataset.
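The labeling scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label names used here (`Ba_B`, `Space`) are hypothetical, since the actual label inventory is the one listed in Table 3, and `parse_label` is a helper invented for this sketch. Only the four position suffixes and the four category counts come from the text.

```python
from typing import NamedTuple, Optional

# Position suffixes as described in the text.
POSITION_SUFFIXES = {"B": "Begin", "M": "Middle", "E": "End", "I": "Isolate"}


class CharLabel(NamedTuple):
    base: str                # Latin name of the character (hypothetical here)
    position: Optional[str]  # None for labels without a position suffix


def parse_label(label: str) -> CharLabel:
    """Split a label such as 'Ba_B' into its base name and position."""
    base, sep, suffix = label.rpartition("_")
    if sep and suffix in POSITION_SUFFIXES:
        return CharLabel(base, POSITION_SUFFIXES[suffix])
    # No recognized suffix: digits, punctuation marks, white space, etc.
    return CharLabel(label, None)


# Character-form counts given in the text.
FORM_COUNTS = {
    "letters": 125,
    "letters_with_chadda": 15,
    "digits": 10,
    "punctuation": 14,
}
total_forms = sum(FORM_COUNTS.values())

print(parse_label("Ba_B"))   # hypothetical label name
print(parse_label("Space"))  # hypothetical label name
print(total_forms)
```

The four category counts add up to the 164 character forms stated in the text.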
More details about the statistics of the detection and recognition datasets are in Figure 4.
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science