Page - 115 - in Document Image Processing
J. Imaging 2018, 4, 43
of connected components that represented a correct character in Balinese script from the word-level binarized images that were manually annotated [11,17,20] using Aletheia (http://www.primaresearch.org/tools/Aletheia) [62,63] (Figure 14). The Sundanese character dataset was annotated manually [22] (Figure 15). For the Khmer character dataset, a tool has been developed to annotate characters/glyphs on the document page. The polygon boundary of each character is traced manually by dotting out its vertices one by one. A label is given to each annotated character after its boundary has been constructed [21] (Figure 16).
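The Khmer annotation procedure (trace the polygon vertex by vertex, then attach a label) can be illustrated with a minimal Python sketch. This is not the tool described in [21]; the class name `GlyphAnnotation`, its methods, and the sample coordinates are hypothetical and chosen only to show the order of operations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class GlyphAnnotation:
    """Hypothetical record for one annotated character: polygon plus label."""
    vertices: List[Point] = field(default_factory=list)
    label: str = ""

    def add_vertex(self, x: float, y: float) -> None:
        # One call per manually placed dot on the page image.
        self.vertices.append((x, y))

    def set_label(self, label: str) -> None:
        # The label is assigned only after the boundary is constructed,
        # which here means at least three vertices (assumption).
        if len(self.vertices) < 3:
            raise ValueError("polygon needs at least three vertices")
        self.label = label

    def bounding_box(self) -> Tuple[float, float, float, float]:
        # Axis-aligned box (x_min, y_min, x_max, y_max), useful for cropping.
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return (min(xs), min(ys), max(xs), max(ys))

    def area(self) -> float:
        # Shoelace formula over the closed polygon.
        n = len(self.vertices)
        twice_area = 0.0
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]
            twice_area += x1 * y2 - x2 * y1
        return abs(twice_area) / 2.0


# Usage: trace a rectangular glyph outline, then label it.
glyph = GlyphAnnotation()
for x, y in [(10, 20), (40, 20), (40, 60), (10, 60)]:
    glyph.add_vertex(x, y)
glyph.set_label("ka")  # placeholder label, not taken from the dataset
```

Storing the raw vertices rather than only a bounding box keeps the traced outline available, so neighbouring glyphs that overlap in a rectangular crop can still be separated.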
Table 3. Palm leaf manuscript datasets for isolated character/glyph recognition task.
Manuscripts   Classes       Train            Test            Dataset
Balinese      133 classes   11,710 images    7673 images     AMADI_LontarSet [17,25,28]
Khmer         111 classes   113,206 images   90,669 images   SleukRithSet [21]
Sundanese     60 classes    4555 images      2816 images     SundaDataset [22]
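To compare the three datasets in Table 3 on a common footing, the short Python sketch below derives the total size, the training fraction, and the mean number of training images per class from the published counts. The helper name `split_summary` is hypothetical, and the per-class figure is only an average: the table does not say how images are distributed across classes.

```python
# Counts copied from Table 3: (classes, train images, test images).
DATASETS = {
    "Balinese": (133, 11_710, 7_673),
    "Khmer": (111, 113_206, 90_669),
    "Sundanese": (60, 4_555, 2_816),
}


def split_summary(classes: int, train: int, test: int) -> dict:
    """Derive simple split statistics from one table row (hypothetical helper)."""
    total = train + test
    return {
        "total": total,
        "train_fraction": train / total,
        # Average only; the real class distribution may be imbalanced.
        "mean_train_per_class": train / classes,
    }


summaries = {name: split_summary(*row) for name, row in DATASETS.items()}

for name, s in summaries.items():
    print(
        f"{name}: total={s['total']}, "
        f"train fraction={s['train_fraction']:.3f}, "
        f"mean train images/class={s['mean_train_per_class']:.1f}"
    )
```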
Figure 14. Balinese character dataset.
Document Image Processing
- Title: Document Image Processing
- Authors: Ergina Kavallieratou, Laurence Likforman-Sulem
- Editor: MDPI
- Location: Basel
- Date: 2018
- Language: German
- License: CC BY-NC-ND 4.0
- ISBN: 978-3-03897-106-1
- Size: 17.0 x 24.4 cm
- Pages: 216
- Keywords: document image processing, preprocessing, binarization, text-line segmentation, handwriting recognition, indic/arabic/asian script, OCR, Video OCR, word spotting, retrieval, document datasets, performance evaluation, document annotation tools
- Category: Computer Science