Page - 91 - in Joint Austrian Computer Vision and Robotics Workshop 2020
[Figure: pipeline diagram. Labels: Input image, Text segmentation, Baseline classification, xml; stages (a) to (d); card types: S cards, V cards.]
Figure 1. The proposed pipeline, consisting of (a) an image classifier to sort out V cards, (b) a segmentation network to
detect and classify text regions and (c) baselines, and finally (d) an HTR model whose output is combined with the baseline
segmentation and saved as an xml file. Colors denote the different text categories.
Model                              mIoU
LargeKernelMatters (ResNext101)    0.793
DeepLabV3+ (ResNet152)             0.799
dhSegment (ResNet50)               0.772
Table 2. The mIoU scores. The image classifiers in brackets
denote the frontend used.
and later also the text baselines therein. We experiment
with the models dhSegment [4], Global Convolutional
Network (GCN) [5] and DeepLabV3+ [1].
The 2000 documents were first split into 50% training
data and 25% each of test and validation data, and then
resized to 512×512. We found that adding a border around
text regions (a line with constant width along the outline
of text regions) as an additional class during training
helps the network learn to separate different text
regions. Table 2 shows the mean intersection over union
(mIoU) scores for the three best performing models.
The segmentation is then used to classify
the extracted text as described below.
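
To make the border class and the mIoU metric concrete, the following is a minimal sketch in plain Python, not the implementation used in the paper. It assumes label maps are nested lists of integer class ids, that class 0 is background, that the border is one pixel wide, and that classes absent from both prediction and ground truth are skipped when averaging. The names add_border_class, mean_iou and BACKGROUND are hypothetical.

```python
BACKGROUND = 0  # assumed id of the background class


def add_border_class(labels, border_id):
    """Relabel the outline of every text region with an extra border class.

    A text pixel counts as outline if one of its 4-neighbours lies outside
    the image or carries a different label. This yields a border of constant
    width (one pixel here; the width used in the paper is not stated).
    """
    height, width = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(height):
        for x in range(width):
            current = labels[y][x]
            if current == BACKGROUND:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or labels[ny][nx] != current:
                    out[y][x] = border_id
                    break
    return out


def mean_iou(pred, target, num_classes):
    """Mean intersection over union across the classes that occur."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    # Assumption: classes missing from both maps do not enter the mean.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


if __name__ == "__main__":
    region = [[0, 0, 0, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 1, 1, 1, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 0, 0, 0]]
    for row in add_border_class(region, border_id=2):
        print(row)
    print(mean_iou([[0, 1], [1, 1]], [[0, 1], [0, 1]], num_classes=2))
```

For 512×512 inputs an array library would be the practical choice; the nested-list version only shows the logic.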
Handwriting Recognition For the detection of
text baselines and for handwritten text recognition
(HTR), models from Transkribus [3] are used. The
Transkribus platform contains models for baseline
detection and HTR pretrained on German Kurrent
writing (with a character error rate of 7% on a separate
reference dataset [3]), which is the predominant
writing style in our dataset. We apply the baseline
detection of Transkribus, then classify the baselines
according to the segmentation and add missing baselines
for common errors. Afterwards the HTR model
is applied and the result is saved as an xml file.
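
One simple way to classify a baseline according to the segmentation is a majority vote over the segmentation labels found along the baseline. The sketch below illustrates that idea only; the paper does not specify the exact rule. It assumes that baseline points are integer (x, y) pixel coordinates already scaled to the label map, and that class 0 is background. The name classify_baseline is hypothetical.

```python
from collections import Counter


def classify_baseline(points, label_map, background=0):
    """Return the most frequent non-background label under a baseline.

    points    -- list of (x, y) pixel coordinates along the baseline
    label_map -- nested list of class ids, indexed as label_map[y][x]
    """
    height, width = len(label_map), len(label_map[0])
    votes = Counter()
    for x, y in points:
        # Ignore points that fall outside the segmentation map.
        if 0 <= y < height and 0 <= x < width:
            label = label_map[y][x]
            if label != background:
                votes[label] += 1
    if not votes:
        return background  # no text region found under this baseline
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    label_map = [[0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 2, 0, 0],
                 [0, 0, 0, 0, 0, 0]]
    baseline = [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]
    print(classify_baseline(baseline, label_map))
```

Because a baseline runs along the lower edge of a text line, sampling a few pixels above each point may be more robust in practice; that refinement is left out here.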
3. Conclusion
We have presented an approach for the automatic
digitization of a library catalogue. We compared
state-of-the-art models for semantic segmentation
and found that DeepLabV3+ performs well in
the task of page segmentation for historic handwritten
documents. At the level of baselines, the classification
of text using our segmentation approach
performs reasonably well for the application; however,
the character error rate of 7% needs improvement,
either through retraining on documents from
our dataset or by manual corrections. For future
work, we believe that a better recognition of baselines
has the largest potential for further improvements.
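
For reference, the character error rate is commonly computed as the edit distance between the recognized text and the reference transcription, divided by the length of the reference. The sketch below follows that common definition; whether the 7% figure reported for the Transkribus model uses exactly this normalization is an assumption. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                       # deletion
                           cur[j - 1] + 1,                    # insertion
                           prev[j - 1] + (char_a != char_b)))  # substitution
        prev = cur
    return prev[-1]


def character_error_rate(hypothesis, reference):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


if __name__ == "__main__":
    print(character_error_rate("Bach", "Buch"))
```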
References
[1] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and
H. Adam. Encoder-decoder with atrous separable
convolution for semantic image segmentation. In
European Conference on Computer Vision (ECCV), pages
801–818, 2018.
[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual
learning for image recognition. In International
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 770–778, 2016.
[3] P. Kahle, S. Colutto, G. Hackl, and G. Mühlberger.
Transkribus – a service platform for transcription,
recognition and retrieval of historical documents. In
International Conference on Document Analysis and
Recognition (ICDAR), volume 4, pages 19–24. IEEE,
2017.
[4] S. A. Oliveira, B. Seguin, and F. Kaplan. dhSegment:
A generic deep-learning approach for document
segmentation. In International Conference on Frontiers
in Handwriting Recognition (ICFHR), pages 7–12.
IEEE, 2018.
[5] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun.
Large kernel matters – improve semantic segmentation
by global convolutional network. In International
Conference on Computer Vision and Pattern Recognition
(CVPR), pages 4353–4361, 2017.
Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science, Technology