Page 115 of Document Image Processing
J. Imaging 2018, 4, 43
of connected components that represented a correct character in Balinese script from the word-level binarized images that were manually annotated [11,17,20] using Aletheia (http://www.primaresearch.org/tools/Aletheia) [62,63] (Figure 14). The Sundanese character dataset was annotated manually [22] (Figure 15). For the Khmer character dataset, a tool has been developed to annotate characters/glyphs on the document page. The polygon boundary of each character is traced manually by marking its vertices one by one. A label is given to each annotated character after its boundary has been constructed [21] (Figure 16).
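The polygon-based annotation described for the Khmer dataset can be illustrated with a minimal sketch. It is not the tool from [21]: the `GlyphAnnotation` record, the `rasterize` helper, and the label `"ka"` are hypothetical names chosen for illustration. The sketch assumes an annotation is an ordered list of clicked vertices plus a class label, and uses a standard even-odd ray-casting test to decide which pixels fall inside the traced boundary.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class GlyphAnnotation:
    """Hypothetical record: a manually traced polygon plus its class label."""
    label: str
    vertices: List[Point]  # vertices in the order they were marked

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Return (x_min, y_min, x_max, y_max) of the traced polygon."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return min(xs), min(ys), max(xs), max(ys)

    def contains(self, px: float, py: float) -> bool:
        """Even-odd ray casting: count edge crossings to the right of the point."""
        inside = False
        n = len(self.vertices)
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]  # wrap around to close the polygon
            if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if px < x_cross:
                    inside = not inside
        return inside


def rasterize(ann: GlyphAnnotation, width: int, height: int) -> List[List[int]]:
    """Binary mask with 1 where the pixel centre lies inside the polygon."""
    return [
        [1 if ann.contains(x + 0.5, y + 0.5) else 0 for x in range(width)]
        for y in range(height)
    ]


# Usage: a square glyph region traced with four vertices, then labelled.
square = GlyphAnnotation(label="ka", vertices=[(1, 1), (4, 1), (4, 4), (1, 4)])
mask = rasterize(square, width=5, height=5)
```

The mask can then be used to crop the glyph from the page image, with the bounding box giving the crop window. A real annotation tool would also need to handle self-intersecting or degenerate polygons, which this sketch ignores.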
Table 3. Palm leaf manuscript datasets for the isolated character/glyph recognition task.
Manuscripts   Classes       Train            Test            Dataset
Balinese      133 classes   11,710 images    7673 images     AMADI_LontarSet [17,25,28]
Khmer         111 classes   113,206 images   90,669 images   SleukRithSet [21]
Sundanese     60 classes    4555 images      2816 images     SundaDataset [22]
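To make the relative sizes in Table 3 easier to compare, the short sketch below derives the total image count, the training share, and the mean number of training images per class for each dataset. The counts are copied from the table; the `summarize` helper is a hypothetical name, and the per-class figure is only an average that says nothing about how evenly the classes are actually populated.

```python
# Counts copied from Table 3: (classes, train images, test images).
datasets = {
    "Balinese": (133, 11710, 7673),
    "Khmer": (111, 113206, 90669),
    "Sundanese": (60, 4555, 2816),
}


def summarize(classes: int, train: int, test: int) -> dict:
    """Derive simple size statistics for one dataset (illustrative helper)."""
    total = train + test
    return {
        "total": total,
        "train_fraction": train / total,
        # Mean only; the real class distribution may be far from uniform.
        "train_per_class": train / classes,
    }


summary = {name: summarize(*row) for name, row in datasets.items()}

for name, s in summary.items():
    print(
        f"{name}: {s['total']} images, "
        f"{s['train_fraction']:.1%} train, "
        f"about {s['train_per_class']:.0f} training images per class"
    )
```

The Balinese and Sundanese sets are split at roughly 60/40 between training and test, while the Khmer set is closer to 56/44 and is about an order of magnitude larger than the Balinese set.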
Figure 14. Balinese character dataset.
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Publisher: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Dimensions: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, Indic/Arabic/Asian script, OCR, video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer science