Classification and Segmentation of Scanned Library Catalogue Cards using Convolutional Neural Networks
Matthias Wödlinger, Robert Sablatnig
Computer Vision Lab, TU Wien
{mwoedlinger,sab}@cvl.tuwien.ac.at
Abstract. The library of the TU Wien has been documenting changes in its inventory in the form of physical library archive cards. To make these archive cards digitally accessible, the cards and the text regions therein need to be categorized and the text must be made machine-readable. In this paper we present a pipeline consisting of classification, page segmentation and automated handwriting recognition that, given a scan of a library card, returns the category this card belongs to and an XML file containing the extracted and classified text.
1. Introduction
A library catalogue is a register where all bibliographic entries found in a library are listed. In this paper we present a pipeline that automatically processes scanned images of library catalogue documents such that they can be made available and searchable in an online database. While earlier work in this direction uses hand-crafted rules and regular expressions to classify text in extracted OCR data, in recent years Convolutional Neural Network (CNN) based methods that operate at the pixel level have formed the state of the art for this task [4].
The library catalogue at hand consists of 113,073 mostly handwritten documents, collected mostly in the period from 1815 to 1930. The scanned images contain exactly the card with no surrounding content (see Fig. 1). Documents are classified into two groups: library cards with a “Signatur” (a unique identifier), which we call S cards, and cards without one (V cards). V cards are not relevant for the online database and must be sorted out.
For training, 2000 S cards and 500 V cards were manually extracted. The S cards were further sorted into 5 classes based on their layout. The text regions were manually annotated and verified by experts.

Model     Accuracy
ResNet18  0.988
ResNet34  0.988
ResNet50  0.994

Table 1. Accuracy scores on the test set. The accuracy is computed with respect to all 6 classes.
In this paper we describe a pipeline that, given a scanned library card image, determines if it is of type S or V and then returns an XML file with the extracted and classified text. We describe the components of our pipeline in Section 2 and give a conclusion in Section 3.
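As an illustration of the pipeline output, a minimal sketch of how such a file could be serialised is given below. The tag and attribute names of this XML schema are hypothetical, since the exact format is not specified here.

```python
import xml.etree.ElementTree as ET

def write_card_xml(card_class, regions, path):
    """Serialise a card's category and classified text regions to XML.

    `regions` holds (region_class, recognised_text) pairs produced by
    the segmentation and handwriting-recognition stages.
    """
    root = ET.Element("card", attrib={"class": card_class})
    for region_class, text in regions:
        region = ET.SubElement(root, "region", attrib={"class": region_class})
        region.text = text
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# Hypothetical usage with invented region classes and text:
write_card_xml("S", [("title", "Beispieltitel"), ("author", "A. Muster")],
               "card_0001.xml")
```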
2. Methodology and Results
The pipeline developed in this project is summarized in Fig. 1.
Classification of S and V cards
We use a ResNet [2] pretrained on ImageNet and finetuned on our documents to sort out V cards. We do not freeze any layers during finetuning but instead train the full model with a smaller initial learning rate of 4 · 10⁻⁴. To prevent large class imbalances we train the network on all 6 classes. The 2500 annotated documents are randomly split into train, test and validation sets and rescaled to 512×512 pixels. Table 1 shows the accuracy scores on the test set for three ResNets with different depth parameters.
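A minimal sketch of this fine-tuning setup in PyTorch is given below. The ResNet backbone, the 6 output classes, the 512×512 input size and the initial learning rate of 4 · 10⁻⁴ follow the description above; the optimiser choice and the training-step helper are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_CLASSES = 6  # 5 S-card layout classes + the V-card class

# ImageNet-pretrained ResNet-50; no layers are frozen during finetuning.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Cards are rescaled to 512 x 512 before entering the network.
preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

# Smaller initial learning rate of 4e-4, as described above;
# the choice of Adam is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One optimisation step over a batch of card scans (assumed helper)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```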
Page segmentation of S cards
The text regions in S cards are categorized into 7 classes that each contain document-specific information such as title, author, publisher or unique identifiers. The text region classes are distinguished from one another by location, font size and content. We use a CNN for image segmentation to detect and classify the text regions.
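The segmentation architecture is not named on this page; as a stand-in, the sketch below uses torchvision's DeepLabV3 to illustrate per-pixel classification of a card scan. The class count (7 text-region classes plus a background class) is an assumption.

```python
import torch
from torchvision import models

# 7 text-region classes + 1 background class (the background class is
# an assumption; the text lists only the 7 region classes).
NUM_REGION_CLASSES = 8

# Stand-in segmentation CNN; the actual architecture is not named here.
seg_model = models.segmentation.deeplabv3_resnet50(
    pretrained=False, num_classes=NUM_REGION_CLASSES)

def segment_card(image_batch):
    """Per-pixel region classification for a batch of S-card scans."""
    seg_model.eval()
    with torch.no_grad():
        logits = seg_model(image_batch)["out"]  # shape (N, C, H, W)
    return logits.argmax(dim=1)                 # shape (N, H, W)
```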