J. Imaging 2018, 4, 6
Figure 5. An example of a rescoring lattice.
6. Experiment Results
To train the proposed holistic Arabic OCR system, we used a lexicon of around 356,000 words selected from the news domain, giving high coverage of the Arabic language. Using this lexicon, we generated a database of images of these words in three fonts (Simplified Arabic, Traditional Arabic and Arabic Transparent) at 300 dpi in four different sizes.
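Generating this database amounts to a Cartesian product over lexicon words, fonts and sizes. The sketch below is a hypothetical planning helper, not the authors' tool: `RenderJob`, `plan_render_jobs` and the point sizes in `POINT_SIZES` are assumptions (the text says only "four different sizes"), and the actual glyph rendering is left out. It enumerates (word, font, size) jobs and converts each point size to an em height in pixels at 300 dpi, using 1 pt = 1/72 inch.

```python
from dataclasses import dataclass
from itertools import product

FONTS = ("Simplified Arabic", "Traditional Arabic", "Arabic Transparent")
# Hypothetical point sizes: the source states only that four sizes were used.
POINT_SIZES = (10, 12, 14, 16)
DPI = 300


@dataclass(frozen=True)
class RenderJob:
    """One word image to be rendered (hypothetical record type)."""
    word: str
    font: str
    point_size: int
    dpi: int

    @property
    def em_height_px(self) -> float:
        # 1 pt = 1/72 inch, so pixels = points * dpi / 72.
        return self.point_size * self.dpi / 72


def plan_render_jobs(lexicon, fonts=FONTS, sizes=POINT_SIZES, dpi=DPI):
    """Yield one job per (word, font, size) combination."""
    for word, font, size in product(lexicon, fonts, sizes):
        yield RenderJob(word, font, size, dpi)


if __name__ == "__main__":
    lexicon = ["كتاب", "مدرسة", "قلم"]  # toy stand-in for the ~356,000-word lexicon
    jobs = list(plan_render_jobs(lexicon))
    print(len(jobs))  # 3 words x 3 fonts x 4 sizes = 36
    print(RenderJob("قلم", FONTS[0], 12, DPI).em_height_px)  # 50.0
```

If every word was rendered in every font and size (an assumption; the text does not give the image count), the full lexicon would yield about 356,000 × 3 × 4 ≈ 4.27 million images.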
To test the system, we used three test datasets that represent different degrees of difficulty:
1. Laser scanned text dataset: This dataset is composed of 1152 single words taken from newspaper articles and printed in three fonts and four different sizes, in two quality levels: clean and first copy.
2. Recent computerized books dataset: This dataset is composed of 10 scanned pages from different recent computerized books, containing 2730 words.
3. Old un-computerized books: This dataset consists of 10 scanned pages, containing 2276 words, from old books typewritten in less well-known fonts.
Figure 6 shows some examples of the scanned images. In the first experiment, we evaluated our system on the laser scanned dataset, starting with a single font. The system was trained on a single font at a single size and tested on the same font at different sizes. We did not use the language model with this dataset because it consists of isolated words. Table 2 reports the Word Recognition Rate (WRR) results for this experiment.
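The text does not spell out how WRR is computed. A common definition, assumed here, is the percentage of test words whose recognized form exactly matches the ground truth. The sketch below uses hypothetical helper names (`word_recognition_rate`, `wrr_by_size`) and groups results by point size to mirror the protocol of training on one size and testing on the others; the sizes and words in the usage example are toy values, not data from the paper.

```python
from collections import defaultdict


def word_recognition_rate(references, hypotheses):
    """Percentage of words whose recognized form exactly matches the reference."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("cannot compute WRR on an empty test set")
    correct = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return 100.0 * correct / len(references)


def wrr_by_size(samples):
    """Compute WRR per point size from (point_size, reference, hypothesis) triples."""
    grouped = defaultdict(lambda: ([], []))
    for size, ref, hyp in samples:
        grouped[size][0].append(ref)
        grouped[size][1].append(hyp)
    return {size: word_recognition_rate(refs, hyps)
            for size, (refs, hyps) in sorted(grouped.items())}


if __name__ == "__main__":
    samples = [
        (12, "كتاب", "كتاب"),
        (12, "قلم", "قلم"),
        (14, "كتاب", "كتاب"),
        (14, "قلم", "علم"),  # one misrecognized word at 14 pt
    ]
    print(wrr_by_size(samples))  # {12: 100.0, 14: 50.0}
```

Exact whole-word matching suits a holistic recognizer, which outputs one lexicon entry per word image rather than a character sequence.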
Figure 6. Some samples of the scanned images.
Document Image Processing
- Title
- Document Image Processing
- Authors
- Ergina Kavallieratou
- Laurence Likforman-Sulem
- Editor
- MDPI
- Location
- Basel
- Date
- 2018
- Language
- German
- License
- CC BY-NC-ND 4.0
- ISBN
- 978-3-03897-106-1
- Size
- 17.0 x 24.4 cm
- Pages
- 216
- Keywords
- document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category
- Informatik