Page - 173 - in Document Image Processing

J. Imaging 2017, 3, 62

original document images, the resulting images being called semi-synthetic images in the rest of the paper. If there is no ground truth associated to the real images, DocCreator can create, with a given text, synthetic images that look like the real ones and their associated ground truth.

Depending on the needs and expertise of the user, DocCreator can be used in a fully automatic mode, or in a semi-automatic mode where the user can interact with the system and tune its parameters. Visual feedback of the results is returned by the system.

Degradations available in DocCreator can be applied on any type of document images. The DocCreator ability to create synthetic documents that mimic real ones is effective for typewritten and handwritten characters (as long as the characters are apart from one another).

Images created with DocCreator have already been used in many DIAR contexts: text/background/image pixel classification [36]; staff removal [13,37,38]; and handwritten character recognition [39]. In this article we present how DocCreator can be useful to enhance a binarization algorithm and for OCR performance prediction. DocCreator could also be used, for example, for camera-based document image analysis and word spotting.

Figure 1. According to the needs of the DIAR researcher, it is possible to generate synthetic document images (and their ground truth) in different ways. First possibility: if a researcher has real document images but without any ground truth, DocCreator can generate synthetic images that look like the real ones, and of course, with the associated ground truth. Second possibility: a researcher has a ground-truthed database but it is too small or not heterogeneous enough. DocCreator provides several degradation algorithms to augment the dataset. By degrading text ink, paper shape or background colours it is possible to create a representative document image database where many defects are present. This complete database is finally useful for very precise performance evaluation or to provide multiple cases for retraining processes (in algorithms embedding a learning step).

DocCreator features compared to existing software are highlighted in Table 1. First of all, DocCreator is the only one that can create synthetic documents that mimic real ones. Besides, as it includes several degradation models, it provides an integrated solution to carry out data augmentation. DocCreator thus makes quickly available ground-truthed databases. It makes DocCreator a unique software that can be seen as a complementary tool to those mentioned in Table 1.

This paper is organized as follows. In Section 2, we present the methods used to extract document characteristics and to generate synthetic documents, while in Section 3 document degradation models are discussed. Section 4 highlights the advantages of DocCreator on various DIAR tasks, both for benchmarking and for retraining DIAR tools using data augmentation.

2. How to Create a Synthetic Document (with Ground Truth) That Looks Like a Real One?

The left part of Figure 1 illustrates the pipeline used in order to generate synthetic documents that look realistic. Given an original image, we extract the three main required components: (1) the font; (2) the background; and (3) the layout of the document. The system can then write any text with
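
As a rough illustration of the second possibility described in the Figure 1 caption (augmenting a small ground-truthed database by degrading its images), the following Python sketch may help. It is not DocCreator code and does not reproduce any of the paper's degradation models: the `fade_ink` and `augment` functions, the pixel convention and the toy "ink fading" noise are all assumptions made for this example. The point it shows is that a degraded copy can reuse the ground truth of its source image unchanged.

```python
import random

INK, PAPER = 0, 255  # assumed grayscale convention: dark ink on light paper


def fade_ink(image, fade_prob, rng):
    """Return a copy of `image` with each ink pixel lightened with probability `fade_prob`.

    Hypothetical toy degradation, not one of the DocCreator models.
    """
    degraded = []
    for row in image:
        new_row = []
        for pixel in row:
            if pixel == INK and rng.random() < fade_prob:
                # Push the ink pixel towards the paper colour.
                new_row.append(rng.randint(128, PAPER))
            else:
                new_row.append(pixel)
        degraded.append(new_row)
    return degraded


def augment(dataset, variants_per_image, fade_prob=0.2, seed=0):
    """Expand a list of (image, ground_truth) pairs with degraded copies.

    The ground truth is reused as-is, because the degradation changes how
    the page looks but not what is written on it.
    """
    rng = random.Random(seed)
    augmented = list(dataset)  # keep the original pairs
    for image, ground_truth in dataset:
        for _ in range(variants_per_image):
            augmented.append((fade_ink(image, fade_prob, rng), ground_truth))
    return augmented


# A tiny 3x5 "document" containing one vertical stroke, labelled "I".
page = [
    [PAPER, PAPER, INK, PAPER, PAPER],
    [PAPER, PAPER, INK, PAPER, PAPER],
    [PAPER, PAPER, INK, PAPER, PAPER],
]
dataset = augment([(page, "I")], variants_per_image=3)
print(len(dataset), "ground-truthed images")
```

Fixing the seed keeps the augmented set reproducible, which matters when the same degraded images are used for both benchmarking and retraining.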
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Informatik