Page - 171 - in Document Image Processing

Journal of Imaging

Article

DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images

Nicholas Journet 1,*,†, Muriel Visani 2,†, Boris Mansencal 1,†, Kieu Van-Cuong 3 and Antoine Billy 1

1 Laboratoire Bordelais de Recherche en Informatique UMR 5800, Université de Bordeaux, CNRS, Bordeaux INP, 33400 Talence, France; boris.mansencal@labri.fr (B.M.); antoine.billy@labri.fr (A.B.)
2 Laboratoire Informatique, Image et Interaction (L3i), Université de La Rochelle, 17000 La Rochelle, France; muriel.visani@univ-lr.fr
3 LIPADE Laboratory, Paris Descartes University, 45, rue des Saints-Pères, 75270 Paris, CEDEX 6, France; van-cuong.kieu@parisdescartes.fr
* Correspondence: journet@labri.fr
† These authors contributed equally to this work. Other authors: Kieu Van-Cuong worked on degradation models, Antoine Billy worked on synthetic document reconstruction.

Received: 30 October 2017; Accepted: 5 December 2017; Published: 11 December 2017

Abstract: Most digital libraries that provide user-friendly interfaces, enabling quick and intuitive access to their resources, are based on Document Image Analysis and Recognition (DIAR) methods. Such DIAR methods need ground-truthed document images to be evaluated/compared and, in some cases, trained. Especially with the advent of deep learning-based approaches, the required size of annotated document datasets seems to be ever-growing. Manually annotating real documents has many drawbacks, which often leads to small reliably annotated datasets. In order to circumvent those drawbacks and enable the generation of massive ground-truthed data with high variability, we present DocCreator, a multi-platform and open-source software able to create many synthetic image documents with controlled ground truth. DocCreator has been used in various experiments, showing the interest of using such synthetic images to enrich the training stage of DIAR tools.

Keywords: synthetic image generation; document degradation models; performance evaluation; data augmentation for retraining and fine-tuning; DIAR

1. Introduction

Almost every researcher in the field of Document Image Analysis and Recognition (DIAR) had to face the problem of obtaining a ground-truthed document image dataset. Indeed, many DIAR tools (image restoration, layout analysis, text-graphic separation, binarization, OCR, etc.) rely on a preliminary stage of supervised training. Moreover, ground-truthed document image datasets are needed to evaluate these DIAR tools. Digital curators are the first users of these tools, e.g., for announcing expected OCR recognition rates together with automatic transcriptions of books [1].

One common solution is to use ground-truthed training and benchmarking datasets publicly available on the internet. For document images, the following databases are the most commonly used. For printed documents: Washington UW3 [2], LRDE [3], RETAS-OCR [4], PaRADIIT [5], etc.; for handwritten documents: IAM database [6], RIMES [7], GERMANA [8], etc.; for graphical documents: chemical symbol database [9], logo databases [10,11], architectural symbol database [12] or musical symbol database CVC-MUSICMA [13]; camera-based document image analysis [14,15]. The International Association for Pattern Recognition, for instance, gathered some interesting datasets [16] mostly used for different conference competitions over the last two decades. The main international conference in document image analysis, ICDAR, references on its websites many contest datasets. However, very few of them

J. Imaging 2017, 3, 62 171 www.mdpi.com/journal/jimaging
Document Image Processing
Title
Document Image Processing
Authors
Ergina Kavallieratou
Laurence Likforman-Sulem
Editor
MDPI
Location
Basel
Date
2018
Language
German
License
CC BY-NC-ND 4.0
ISBN
978-3-03897-106-1
Size
17.0 x 24.4 cm
Pages
216
Keywords
document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
Category
Computer Science