Page - 193 - in Document Image Processing


J. Imaging 2018, 4, 32

The new dataset AcTiV 2.0 includes 189 video sequences, 4063 key frames, 10,415 text images and three video-stream resolutions, the newly added one being SD (480 × 360). A brief comparison in terms of content between the initial and new versions of the proposed dataset is presented in Table 2. The architecture of the new dataset is completely different from the old one. In addition to the videos and their annotation XML files, AcTiV 2.0 includes two dedicated datasets for the detection and recognition tasks (see Figure 4).

Table 2. Statistics of AcTiV 1.0 and AcTiV 2.0.

             #Resolutions   #Videos   #Frames   #Cropped Images
AcTiV 1.0    2              80        1843      -
AcTiV 2.0    3              189       4063      10,415

[Figure 4. Architecture of AcTiV 2.0 and statistics of the detection (D) and recognition (R) datasets. Resolutions listed in the figure: 1920 x 1080, 720 x 576, 460 x 380. Per-channel values recovered from the figure follow.]

Detection dataset (D):

TV Channel      #Frames
AlJazeeraHD     909
France24        874
RussiaToday     882
TunisiaNat1     1099
TunisiaNat1+    299

Recognition dataset (R):

TV Channel      #Lines   #Words   #Characters
AlJazeeraHD     2367     9958     57189
France24        2276     7084     40520
RussiaToday     2633     16543    96990
TunisiaNat1     2411     10998    64493
TunisiaNat1+    631      2635     15371

• AcTiV-D represents a dataset of non-redundant frames used to build and evaluate methods for detecting text regions in HD/SD frames. A total of 4063 frames have been hand-selected with particular attention to achieving a high diversity in depicted text regions. Figure 5 provides examples from AcTiV-D for typical problems in video text detection. To test the systems' ability to locate texts under different situations, the proposed dataset includes some frames which contain the same text region but with different backgrounds and some others without any text component.
• AcTiV-R is a dataset of textline images that can be utilized to build and evaluate Arabic text recognition systems. Different fonts (more than 6), sizes, backgrounds, colors, contrasts and occlusions are represented in the dataset. Figure 6 illustrates typical examples from AcTiV-R. The collected text images cover a broad range of characteristics that distinguish video frames from scanned documents. AcTiV-R consists of 10,415 textline images, 44,583 words and 259,192 characters. To have an easily accessible representation of Arabic text, it is transformed into a set of Latin labels with a suffix that refers to the letter's position in the word: _B: Begin; _M: Middle; _E: End; and _I: Isolate. An example is shown in Figure 1. During the annotation process, we have considered 164 Arabic character forms:
– 125 letters, i.e., taking into account this "positioning" variability;
– 15 additional characters, i.e., combined with the diacritic sign "Chadda";
– 10 digits; and
– 14 punctuation marks including the white space.
The different character labels can be observed in Table 3. The same table gives for each character its frequency in the dataset.

More details about the statistics of the detection and recognition datasets are in Figure 4.
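The positional suffixes (_B, _M, _E, _I) described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the dataset itself: the Latin letter names (`Kaaf`, `Taa`, ...) are hypothetical placeholders rather than the actual AcTiV labels of Table 3, the function name `positional_labels` is invented for illustration, and the suffix is assumed to follow the contextual glyph form. The only script knowledge used is the standard rule that six Arabic letters (alif, dal, thal, ra, zay, waw) never connect to the letter that follows them; hamza variants and ligatures are ignored for brevity.

```python
# Hypothetical Latin names for the six letters that never join to the
# following letter. These names are illustrative, NOT the AcTiV label set.
NON_JOINING_AFTER = {"Alif", "Daal", "Thaal", "Raa", "Zaay", "Waaw"}


def positional_labels(letters):
    """Attach a _B/_M/_E/_I suffix to each letter name of a single word.

    `letters` lists the letter names in reading order (first letter first).
    """
    labels = []
    joined_to_prev = False  # is the current letter connected to the previous one?
    for i, name in enumerate(letters):
        is_last = i == len(letters) - 1
        # A letter connects forward only if another letter follows and the
        # letter itself is allowed to join to its successor.
        joins_next = (not is_last) and name not in NON_JOINING_AFTER
        if joined_to_prev and joins_next:
            suffix = "_M"  # connected on both sides
        elif joined_to_prev:
            suffix = "_E"  # connected to the previous letter only
        elif joins_next:
            suffix = "_B"  # connected to the next letter only
        else:
            suffix = "_I"  # connected to neither neighbour
        labels.append(name + suffix)
        joined_to_prev = joins_next
    return labels


if __name__ == "__main__":
    # The word "kitab" (book): kaf, ta, alif, ba.
    print(positional_labels(["Kaaf", "Taa", "Alif", "Baa"]))
```

Under these assumptions, the final letter of the example word receives the isolated label because the preceding alif does not connect forward. This is why a letter's label cannot be derived from its index in the word alone.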


Title: Document Image Processing
Authors: Ergina Kavallieratou; Laurence Likforman-Sulem
Editor: MDPI
Location: Basel
Date: 2018
Language: German
License: CC BY-NC-ND 4.0
ISBN: 978-3-03897-106-1
Size: 17.0 x 24.4 cm
Pages: 216
Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category: Computer Science