J. Imaging 2018, 4, 32
identification [6]. A video OCR system is generally composed of four stages: detection, tracking,
extraction and recognition. The first two steps consist in locating text regions in video frames and
generating the bounding boxes of text lines as an output. Text extraction aims at extracting text pixels
and removing background ones. The recognition task converts image regions into text strings. In this
work, we focus especially on the detection and recognition steps.
Figure 1. Example of an Arabic video frame including scene and artificial texts (a). Decomposition of
an Arabic word into characters (b).
Compared to scanned documents, text detection and recognition in video frames is more
challenging. The major challenges are:
• Text patterns variability: unknown font-size and font-family, different colors and alignment
(even in the same TV channel).
• Background complexity: text-like objects in video frames, such as fences, bricks and signs, can be
confused with text characters.
• Video quality: acquisition conditions, compression artifacts and low resolution.
All these challenges may give rise to failures in video text detection. The present study focuses
on the Arabic video OCR problem. This introduces many additional challenges related to Arabic
script [7]. Compared to Latin, Arabic text has special characteristics such as the presence of diacritics,
non-uniform inter/intra-word distance and cursiveness of the script, i.e., characters may have up to
four shapes depending on their position in the word (for examples, see Figure 1b).
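The positional variation described above can be illustrated with the Unicode Arabic Presentation Forms-B block, which encodes the contextual forms of a letter as separate code points. The following is a minimal Python sketch and is not part of the AcTiV tooling; the helper name `positional_forms` is hypothetical, and the letter Beh (U+0628) is chosen only as an example of a letter with four forms.

```python
import unicodedata

def positional_forms(start, count=4):
    """Map each positional tag (isolated, final, ...) to (form, base letter).

    `start` is the first code point of a run of presentation forms for one
    letter; the run layout is an assumption that holds for Beh (U+FE8F..U+FE92).
    """
    forms = {}
    for cp in range(start, start + count):
        # Decomposition strings look like "<isolated> 0628".
        tag, base = unicodedata.decomposition(chr(cp)).split()
        forms[tag.strip("<>")] = (chr(cp), chr(int(base, 16)))
    return forms

beh = positional_forms(0xFE8F)
for position, (form, base) in beh.items():
    print(position, unicodedata.name(form), "| base:", unicodedata.name(base))
```

All four presentation forms decompose to the same base letter, which is why a recognizer working on rendered text has to map several visually distinct shapes to a single character label.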
Several techniques have been proposed in the conventional field of Arabic OCR in scanned
documents [7–10]. However, few attempts have been made on the development of detection and
recognition systems for overlaid text in Arabic news video [11–13]. These systems were tested on
private datasets with different evaluation protocols and metrics that make direct comparison and
objective benchmarking rather impractical. For instance, in [11], the proposed text detector was
evaluated on a private set of 150 video images. In [13], Yousfi et al. evaluated their text detection
system on two private test sets of 164 and 201 video frames. Therefore, the availability of an annotated
and public dataset is of key importance for the Arabic video text analysis community.
In this paper, we present AcTiV 2.0 as an open Arabic-Text-in-Video dataset dedicated to
benchmarking and comparison of systems for Arabic text detection, tracking and recognition. AcTiV 2.0
is an important extension of the one published in ICDAR 2015 [14]. It includes 189 video clips with
an average length of 10 min per sequence for a global duration of about 31 h. These video sequences
have been collected from four different Arabic news channels during the period between October
2013 and March 2016. In the present work, three video resolutions were chosen: HD (High Definition,
1920×1080), SD (Standard Definition, 720×576) and SD (480×360). The latter resolution concerns
video clips that have been downloaded from the official YouTube channel of TunisiaNat1 TV.
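As a quick consistency check on the figures above, 189 clips at an average of roughly 10 min each give about 31.5 h, in line with the reported global duration of about 31 h. The short Python sketch below redoes this arithmetic and compares the pixel counts of the three resolutions. The variable names are illustrative, and the label "SD-low" is an assumption used here only to tell the two SD resolutions apart.

```python
clips = 189
avg_minutes = 10  # approximate average length per clip, as stated in the text

# Total duration in hours implied by the clip count and average length.
total_hours = clips * avg_minutes / 60
print(f"approximate total duration: {total_hours:.1f} h")

# Frame sizes from the text; "SD-low" is a hypothetical label for 480x360.
resolutions = {"HD": (1920, 1080), "SD": (720, 576), "SD-low": (480, 360)}
pixels = {name: width * height for name, (width, height) in resolutions.items()}

for name, count in pixels.items():
    # Ratio shows how many times more pixels an HD frame carries.
    print(name, count, f"HD/{name} = {pixels['HD'] / count:.1f}")
```

The gap in pixel count between HD and the 480×360 clips gives a rough sense of how much less detail is available per text line at the lowest resolution.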
The paper is organized as follows: In Section 2, we present related work on datasets for text
detection/recognition problems. Then, we present in Section 3 the AcTiV 2.0 dataset in terms of
features, statistics and annotations. We detail the evaluation protocols in Section 4 and present the
experimental results in Section 5. In Section 6, we draw the conclusions and discuss future work.
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science