Page - 181 - in Document Image Processing
J. Imaging 2017, 3, 62
Figure 10. Examples of nonlinear illumination model defect. (Left) original image. (Right) degraded image with illumination defect applied on the left border.
4. Use of DocCreator for Performance Evaluation Tasks or Retraining
Here, we briefly describe how DocCreator was used by other researchers and the conclusions they drew.
4.1. Published Results Using DocCreator
4.1.1. Document Image Generation for Performance Evaluation
The segmentation system proposed by [36] is based on texture feature extraction without any a priori knowledge of the physical and logical document layout. To assess the noise robustness of their system, they used DocCreator and applied the character degradation model. From 25 simplified real document images, they generated a semi-synthetic database of 150 document images. This database is made up of several subsets with different degradation levels. The performance evaluations presented in [36] highlight that the texture descriptors are slightly perturbed by the degradations. When characters are highly disconnected (our algorithm has erased important character ink areas), a drop in segmentation performance is observed.
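The construction of such a multi-level semi-synthetic set can be sketched as follows. This is only an illustration under stated assumptions: `erase_ink` is a hypothetical stand-in that randomly erases ink pixels and is not DocCreator's character degradation model, images are toy binary grids, and the six level values are invented (the text only implies six variants per source image, since 150 / 25 = 6).

```python
import random

def erase_ink(image, level, rng):
    """Hypothetical stand-in for a character degradation model:
    each ink pixel (1) becomes background (0) with probability `level`."""
    return [[0 if px == 1 and rng.random() < level else px for px in row]
            for row in image]

def build_semi_synthetic_set(images, levels, seed=0):
    """Return one subset per degradation level, each holding one
    degraded copy of every source image."""
    rng = random.Random(seed)  # fixed seed keeps the set reproducible
    subsets = {level: [] for level in levels}
    for image in images:
        for level in levels:
            subsets[level].append(erase_ink(image, level, rng))
    return subsets

# 25 toy "documents" (1 = ink, 0 = background); real inputs would be scans.
sources = [[[1, 0, 1, 1], [0, 1, 1, 0]] for _ in range(25)]
# Assumed level values; the source does not state them.
levels = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

subsets = build_semi_synthetic_set(sources, levels)
total = sum(len(subset) for subset in subsets.values())
print(total)  # 25 sources x 6 levels
```

Keeping one subset per level makes it possible to report a system's performance separately for each degradation strength, which is how a drop at high degradation levels becomes visible.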
DocCreator was also used during the ICDAR contest on staff-line removal from musical scores. The 3D distortion and the character degradation models were used to generate an extended database from the 1000 images of the MUSCIMA database [13]. As a result, the extended database contains 6000 semi-synthetic grayscale images and 6000 semi-synthetic binary images. This database was used in the second edition of the music score competition at ICDAR 2013 [37]. Five participants submitted eight methods. Participants were given a training set of 4000 semi-synthetic images and then 2000 semi-synthetic images to test their methods on. Regarding the results on the 3D distortion set, the submitted methods seem less robust to global distortion than to the presence of small curves and folds. For more details about the participants, the methods and the contest protocol, refer to [37]. This database has already become a benchmark database for musical document image analysis and recognition, as stated in [53]. So far, the database has indeed been used for benchmarking in multiple scientific publications about musical document processing and recognition [38,53–56] and even in the more general field of machine learning [57].
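The 4000/2000 partition mentioned above can be illustrated with a short sketch. It rests on assumptions: the identifiers are hypothetical, six variants per source score is inferred from 6000 / 1000, and the seeded random shuffle is only one possible way to split. The contest's actual protocol is described in [37] and may differ.

```python
import random

def split_dataset(image_ids, n_train, seed=0):
    """Shuffle the ids reproducibly and cut them into (train, test) lists."""
    if not 0 <= n_train <= len(image_ids):
        raise ValueError("n_train must be between 0 and the dataset size")
    ids = list(image_ids)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]

# Hypothetical ids: 1000 source scores x 6 semi-synthetic variants each.
image_ids = [f"score{src:04d}_variant{v}"
             for src in range(1000) for v in range(6)]

train_ids, test_ids = split_dataset(image_ids, n_train=4000)
print(len(train_ids), len(test_ids))
```

A plain random split can place variants of the same source score on both sides. Whether that matters depends on the evaluation goal, and it is a design choice of this sketch rather than something stated in the text.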
4.1.2. Document Image Generation for Retraining Task
The IAM-HistDB [58] database contains 127 handwritten historical manuscript images together with their ground truth. This database consists of three sets: the Saint Gall set containing 60 images (1,410 text lines) in Latin, the Parzival set containing 47 images (4,477 text lines) in Medieval German, and the Washington set containing 20 images in English. The authors of [39] used the character degradation model to create two extended databases of the IAM-HistDB. The first one is composed of 17,661 images degraded with the ink model. The 1,524 images from the second dataset have been
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Editor: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Size: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science