Page 66 of Document Image Processing
J. Imaging 2018, 4, 6
Figure 5. An example of a rescoring lattice.
6. Experiment Results
To train the proposed holistic Arabic OCR system, we used a lexicon of around 356,000 words
selected from the news domain with high coverage for the Arabic language. Using this lexicon,
we generated a database of images for three fonts: Simplified Arabic, Traditional Arabic and Arabic
Transparent, at 300 dpi in four different sizes.
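The generation step above can be sketched as an enumeration of rendering jobs. This is only an illustration under stated assumptions: the four point sizes are hypothetical (the text does not list them), the function names are invented here, and the sketch assumes every lexicon word is rendered in every font and size, which the text does not state explicitly.

```python
from itertools import product

# Font names as given in the text.
FONTS = ["Simplified Arabic", "Traditional Arabic", "Arabic Transparent"]
# Hypothetical point sizes: the text says "four different sizes" without listing them.
SIZES_PT = [12, 14, 16, 18]
DPI = 300  # resolution stated in the text


def rendering_jobs(lexicon, fonts=FONTS, sizes=SIZES_PT, dpi=DPI):
    """Yield one (word, font, size_pt, dpi) tuple per word image to render."""
    for word, font, size in product(lexicon, fonts, sizes):
        yield (word, font, size, dpi)


def pixel_height(size_pt, dpi=DPI):
    """Convert a nominal point size to pixels (1 pt = 1/72 inch)."""
    return round(size_pt * dpi / 72)
```

Under these assumptions, each word yields 3 x 4 = 12 images, so a lexicon of around 356,000 words would produce roughly 4.3 million word images.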
To test the system, we used three different test datasets that represent different degrees
of challenge:
1. Laser scanned text dataset: This dataset is composed of 1152 single words taken from newspaper
articles and printed in three fonts and four different sizes in two types of quality: clean and
first copy.
2. Recent computerized books dataset: A dataset composed of 10 scanned pages from different
recent computerized books that contain 2730 words.
3. Old un-computerized books: This dataset consists of 10 scanned pages containing 2276 words from
old books that are typewritten in fonts that are not well known.
Figure 6 illustrates some examples of the scanned images. In the first experiment, we evaluated
our system using the laser scanned dataset. Initially, we evaluated the system on a single font.
The system was trained on a single font with a single size but was tested on the same font with different
sizes. We did not use the language model with this dataset as it consists of single words. Table 2
shows the Word Recognition Rate (WRR) results for this experiment.
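For isolated-word data such as the laser scanned dataset, WRR can be computed by exact comparison of each recognized word with its ground truth. The sketch below assumes the usual definition (correctly recognized words divided by total words, as a percentage); this page does not spell out the formula, and the function name is illustrative.

```python
def word_recognition_rate(recognized, reference):
    """Return the percentage of words recognized exactly.

    Assumes one recognizer output per reference word, in the same order,
    which fits a dataset of isolated single-word images.
    """
    if len(recognized) != len(reference):
        raise ValueError("recognized and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    # Count positions where the output matches the ground truth exactly.
    correct = sum(hyp == ref for hyp, ref in zip(recognized, reference))
    return 100.0 * correct / len(reference)
```

For page-level datasets, where the recognizer may insert or drop words, an alignment step would be needed before counting matches; that case is not covered by this sketch.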
Figure 6. Some samples of the scanned images.
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Publisher: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Dimensions: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science