Page - 91 - in Joint Austrian Computer Vision and Robotics Workshop 2020

[Figure 1: pipeline diagram. Stage labels: Input image, Text segmentation, Baseline classification, xml. Step labels: (a), (b), (c), (d). Card types: S cards, V cards.]

Figure 1. The proposed pipeline consisting of (a) an image classifier to sort out V cards, (b) a segmentation network to detect and classify text regions and (c) baselines, and finally (d) an HTR model whose output is combined with the baseline segmentation and saved as an xml file. Colors denote the different text categories.

Model                              mIoU
LargeKernelMatters (ResNext101)    0.793
DeepLabV3+ (ResNet152)             0.799
dhSegment (ResNet50)               0.772

Table 2. The mIoU scores. The image classifiers in brackets denote the frontend used.

and later also the text baselines therein. We experiment with the models dhSegment [4], Global Convolutional Network (GCN) [5] and DeepLabV3+ [1]. The 2000 documents were first split into 50% training data and 25% each for test and validation data, and then resized to 512×512. We found that adding a border around text regions (a line with constant width along the outline of text regions) as an additional class during training helps the network learn to separate different text regions. Table 2 shows the mean intersection over union (mIoU) scores for the three best performing models. The segmentation is then used to classify the extracted text as described below.

Handwriting Recognition. For the detection of text baselines and for handwritten text recognition (HTR), models from Transkribus [3] are used. The Transkribus platform contains models for baseline detection and HTR pretrained on German Kurrent writing (with a character error rate of 7% on a separate reference dataset [3]), which is the predominant writing style in our dataset. We apply the baseline detection of Transkribus, then classify the baselines according to the segmentation and add missing baselines for common errors. Afterwards the HTR model is applied and the result is saved as an xml file.

3. Conclusion

We have presented an approach for the automatic digitization of a library catalogue.
We compared state-of-the-art models for semantic segmentation and found that DeepLabV3+ performs well in the task of page segmentation for historic handwritten documents, reaching the highest mIoU in Table 2 (0.799). At the level of baselines, the classification of text using our segmentation approach performs reasonably well for the application; however, the character error rate of 7% needs improvement, either through retraining on documents from our dataset or through manual corrections. For future work, we believe that better recognition of baselines has the largest potential for improvement.

References

[1] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In European Conference on Computer Vision (ECCV), pages 801–818, 2018.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In International Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[3] P. Kahle, S. Colutto, G. Hackl, and G. Mühlberger. Transkribus – a service platform for transcription, recognition and retrieval of historical documents. In International Conference on Document Analysis and Recognition (ICDAR), volume 4, pages 19–24. IEEE, 2017.
[4] S. A. Oliveira, B. Seguin, and F. Kaplan. dhSegment: A generic deep-learning approach for document segmentation. In International Conference on Frontiers in Handwriting Recognition (ICFHR), pages 7–12. IEEE, 2018.
[5] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun. Large kernel matters – improve semantic segmentation by global convolutional network. In International Conference on Computer Vision and Pattern Recognition (CVPR), pages 4353–4361, 2017.
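
Note on the metric in Table 2: the mean intersection over union (mIoU) can be illustrated with a small sketch. The text does not state how the scores were averaged (per image or over the whole test set, with or without the border class), so the function below is only one common convention. The name `mean_iou`, the nested-list label maps, and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over all classes that occur in the prediction or the target.

    pred and target are equally sized 2D lists of integer class labels.
    Assumption: classes absent from both maps are skipped rather than
    counted as IoU 0 or 1; other conventions exist.
    """
    ious = []
    for c in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred = p == c
                in_target = t == c
                if in_pred and in_target:
                    intersection += 1
                if in_pred or in_target:
                    union += 1
        if union > 0:
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class from range(num_classes) occurs in the inputs")
    return sum(ious) / len(ious)


# Toy 3x3 label maps with three classes (made-up data, not from the paper).
pred = [[0, 0, 1],
        [0, 1, 1],
        [2, 2, 2]]
target = [[0, 0, 1],
          [0, 0, 1],
          [2, 2, 1]]
score = mean_iou(pred, target, num_classes=3)
```

For the toy maps, the per-class IoU values are 3/4, 2/4 and 2/3, and `score` is their average. Averaging per class makes small classes, such as a thin border class, weigh as much as large ones.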

Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science; Technology