Page - 188 - in Document Image Processing
Journal of
Imaging
Article
Open Datasets and Tools for Arabic Text Detection
and Recognition in News Video Frames
Oussama Zayene 1,2,*, Sameh Masmoudi Touj 1, Jean Hennebert 3, Rolf Ingold 2
and Najoua Essoukri Ben Amara 1
1 LATIS Lab, National Engineering School of Sousse (Eniso), University of Sousse, Sousse 4054, Tunisia;
samehmasmouditouj@yahoo.fr (S.M.T.); najoua.benamara@eniso.rnu.tn (N.E.B.A.)
2 DIVA Group, Department of Informatics, University of Fribourg (Unifr), Fribourg 1700, Switzerland;
rolf.ingold@unifr.ch
3 ICoSys Institute, HES-SO, University of Applied Sciences, Fribourg 1705, Switzerland;
jean.hennebert@hefr.ch
* Correspondence: oussama.zayene@unifr.ch
Received: 26 November 2017; Accepted: 26 January 2018; Published: 31 January 2018
Abstract: Recognizing texts in video is more complex than in other environments such as scanned
documents. Video texts appear in various colors, unknown fonts and sizes, often affected by
compression artifacts and low quality. In contrast to Latin texts, there are no publicly available
datasets which cover all aspects of the Arabic Video OCR domain. This paper describes a new
well-defined and annotated Arabic-Text-in-Video dataset called AcTiV 2.0. The dataset is dedicated
especially to building and evaluating Arabic video text detection and recognition systems. AcTiV 2.0
contains 189 video clips serving as a raw material for creating 4063 key frames for the detection
task and 10,415 cropped text images for the recognition task. AcTiV 2.0 is also distributed with
its annotation and evaluation tools that are made open-source for standardization and validation
purposes. This paper also reports on the evaluation of several systems tested under the proposed
detection and recognition protocols.
Keywords: video text detection; video text recognition; AcTiV dataset; Arabic Video OCR
1. Introduction
Broadcast news and public-affairs programs are a prominent source of information that provides
daily updates on national and world news. Nowadays, TV newscasters archive a tremendous number
of news video clips thanks to the rapid progress in mass storage technology. As the archive size grows
rapidly, the manual annotation of all video clips becomes impractical.
Since the 1980s, research in OCR techniques has been an attractive field in the document analysis and
recognition community. Prior work has addressed specific research problems that have bordered on
printed and handwritten texts in scanned documents. Recently, embedded text in videos has received
increasing attention as it often gives crucial information about the media content [1–3]. News videos
generally contain two types of texts [2]: scene text and artificial text (Figure 1). The first type is
naturally recorded as part of the scene during video capturing, such as traffic and shop signs. The second
type of text is artificially superimposed on the video during the editing process. Compared with scene
text, the artificial one usually provides a brief and direct description of video content, which is important
for automatic broadcast annotation. Typically, artificial text in news video indicates the speaker’s name,
location, event information, scores of a match, etc. Therefore, in this context, we particularly focus on
this category of text.
Recognizing text in videos, often called Video OCR [4], is an essential task in many applications
such as news indexing and retrieval [5], video categorization, large archive managing and speaker
J. Imaging 2018, 4, 32 188 www.mdpi.com/journal/jimaging
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Computer Science