Joint Austrian Computer Vision and Robotics Workshop 2020

Page - 91 - in Joint Austrian Computer Vision and Robotics Workshop 2020


[Figure 1 panels: Input image (S cards, V cards); (a), (b) Text segmentation; (c) Baseline classification; (d) xml]

Figure 1. The proposed pipeline, consisting of (a) an image classifier to sort out V cards, (b) a segmentation network to detect and classify text regions and (c) baselines, and finally (d) an HTR model whose output is combined with the baseline segmentation and saved as an XML file. Colors denote the different text categories.

Model                              mIoU
LargeKernelMatters (ResNext101)    0.793
DeepLabV3+ (ResNet152)             0.799
dhSegment (ResNet50)               0.772

Table 2. The mIoU scores. The image classifiers in brackets denote the frontend used.

and later also the text baselines therein. We experiment with the models dhSegment [4], Global Convolutional Network (GCN) [5] and DeepLabV3+ [1]. The 2000 documents were first split into 50% training data and 25% each of test and validation data, and then resized to 512×512. We found that adding a border around text regions (a line of constant width along the outline of text regions) as an additional class during training helps the network learn to separate different text regions. Table 2 shows the mean intersection over union (mIoU) scores for the three best-performing models. The segmentation is then used to classify the extracted text as described below.

Handwriting Recognition

For the detection of text baselines and for handwritten text recognition (HTR), models from Transkribus [3] are used. The Transkribus platform contains models for baseline detection and HTR pretrained on German Kurrent writing (with a character error rate of 7% on a separate reference dataset [3]), which is the predominant writing style in our dataset. We apply the baseline detection of Transkribus, then classify the baselines according to the segmentation and add missing baselines for common errors. Afterwards the HTR model is applied and the result is saved as an XML file.

3. Conclusion

We have presented an approach for the automatic digitization of a library catalogue.
We compared state-of-the-art models for semantic segmentation and found that DeepLabV3+ performs well in the task of page segmentation for historic handwritten documents. On the level of baselines, the classification of text using our segmentation approach performs reasonably well for the application; however, the character error rate of 7% needs improvement, either through retraining on documents from our dataset or through manual corrections. For future work, we believe that a better recognition of baselines has the largest potential for further improvements.

References

[1] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. In European Conference on Computer Vision (ECCV), pages 801–818, 2018.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In International Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[3] P. Kahle, S. Colutto, G. Hackl, and G. Mühlberger. Transkribus – a service platform for transcription, recognition and retrieval of historical documents. In International Conference on Document Analysis and Recognition (ICDAR), volume 4, pages 19–24. IEEE, 2017.
[4] S. A. Oliveira, B. Seguin, and F. Kaplan. dhSegment: A generic deep-learning approach for document segmentation. In International Conference on Frontiers in Handwriting Recognition (ICFHR), pages 7–12. IEEE, 2018.
[5] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun. Large kernel matters – improve semantic segmentation by global convolutional network. In International Conference on Computer Vision and Pattern Recognition (CVPR), pages 4353–4361, 2017.
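
Note on the character error rate. The 7% figure quoted in the text is a character error rate (CER). The paper does not say how it was computed, so the following is only a minimal sketch under the common assumption that CER is the Levenshtein edit distance between the reference transcription and the HTR output, divided by the length of the reference. The function names are hypothetical and are not part of Transkribus.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, using one rolling row."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # turning i reference characters into an empty string
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed CER definition: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character in a ten-character word gives a CER of 0.1.
    print(character_error_rate("Bibliothek", "Bibliothck"))
```

Under this definition, a CER of 7% corresponds to roughly seven character edits per hundred reference characters. Because the measure is normalized by the reference length, it can exceed 1.0 when the recognizer inserts many spurious characters.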
Title
Joint Austrian Computer Vision and Robotics Workshop 2020
Editor
Graz University of Technology
Location
Graz
Date
2020
Language
English
License
CC BY 4.0
ISBN
978-3-85125-752-6
Size
21.0 x 29.7 cm
Pages
188
Categories
Informatik
Technik