Document Image Processing
Page 173

J. Imaging 2017, 3, 62

original document images, the resulting images being called semi-synthetic images in the rest of the paper. If there is no ground truth associated to the real images, DocCreator can create, with a given text, synthetic images that look like the real ones and their associated ground truth. Depending on the needs and expertise of the user, DocCreator can be used in a fully automatic mode, or in a semi-automatic mode where the user can interact with the system and tune its parameters. Visual feedback of the results is returned by the system.

Degradations available in DocCreator can be applied on any type of document images. The DocCreator ability to create synthetic documents that mimic real ones is effective for typewritten and handwritten characters (as long as the characters are apart from one another). Images created with DocCreator have already been used in many DIAR contexts: text/background/image pixel classification [36]; staff removal [13,37,38]; and handwritten character recognition [39]. In this article we present how DocCreator can be useful to enhance a binarization algorithm and for OCR performance prediction. DocCreator could also be used, for example, for camera-based document image analysis and word spotting.

Figure 1. According to the needs of the DIAR researcher, it is possible to generate synthetic document images (and their ground truth) in different ways. First possibility: if a researcher has real document images but without any ground truth, DocCreator can generate synthetic images that look like the real ones, and of course, with the associated ground truth. Second possibility: a researcher has a ground-truthed database but it is too small or not heterogeneous enough. DocCreator provides several degradation algorithms to augment the dataset. By degrading text ink, paper shape or background colours it is possible to create a representative document image database where many defects are present. This complete database is finally useful for very precise performance evaluation or to provide multiple cases for retraining processes (in algorithms embedding a learning step).

DocCreator features compared to existing software are highlighted in Table 1. First of all, DocCreator is the only one that can create synthetic documents that mimic real ones. Besides, as it includes several degradation models, it provides an integrated solution to carry out data augmentation. DocCreator thus makes quickly available ground-truthed databases. It makes DocCreator a unique software that can be seen as a complementary tool to those mentioned in Table 1.

This paper is organized as follows. In Section 2, we present the methods used to extract document characteristics and to generate synthetic documents, while in Section 3 document degradation models are discussed. Section 4 highlights the advantages of DocCreator on various DIAR tasks, both for benchmarking and for retraining DIAR tools using data augmentation.

2. How to Create a Synthetic Document (with Ground Truth) That Looks Like a Real One?

The left part of Figure 1 illustrates the pipeline used in order to generate synthetic documents that look realistic. Given an original image, we extract the three main required components: (1) the font; (2) the background; and (3) the layout of the document. The system can then write any text with
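
The second possibility described in the Figure 1 caption (enlarging a small ground-truthed database by degrading its images) can be illustrated with a minimal Python sketch. It is not DocCreator code and does not reproduce any of its degradation models: the binary-grid image representation, the names degrade_ink and augment, and the simple pixel-erasing noise are all assumptions chosen for brevity. What it aims to show is only that each degraded copy inherits the ground truth of the image it was derived from.

```python
import random

# Assumed toy representation: 1 = ink pixel, 0 = background pixel.
Image = list[list[int]]


def degrade_ink(image: Image, erase_prob: float, rng: random.Random) -> Image:
    """Return a copy of `image` in which each ink pixel is erased
    (set to 0) with probability `erase_prob`. Background is untouched."""
    return [
        [0 if pixel == 1 and rng.random() < erase_prob else pixel for pixel in row]
        for row in image
    ]


def augment(dataset, copies: int, erase_prob: float, seed: int = 0):
    """Return the original (image, label) pairs followed by `copies`
    degraded variants of each image, all carrying the original label."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    augmented = list(dataset)
    for image, label in dataset:
        for _ in range(copies):
            augmented.append((degrade_ink(image, erase_prob, rng), label))
    return augmented


# Hypothetical 3x3 glyph with its ground-truth label.
glyph_t = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 0]]
samples = augment([(glyph_t, "T")], copies=3, erase_prob=0.2)
print(len(samples), "labelled samples from 1 original")
```

A real degradation model would act on grey-level or colour scans and on structures such as ink, paper shape or background rather than on isolated pixels; the label-preserving loop in augment is the only part of this sketch that is meant to carry over.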
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Publisher
MDPI
Place
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Dimensions
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science