Page - 90 - in Joint Austrian Computer Vision and Robotics Workshop 2020

Classification and Segmentation of Scanned Library Catalogue Cards using Convolutional Neural Networks

Matthias Wödlinger, Robert Sablatnig
Computer Vision Lab, TU Wien
{mwoedlinger,sab}@cvl.tuwien.ac.at

Abstract. The library of the TU Wien has been documenting changes in its inventory in the form of physical library archive cards. To make these archive cards digitally accessible, the cards and the text regions therein need to be categorized and the text must be made machine-readable. In this paper we present a pipeline consisting of classification, page segmentation and automated handwriting recognition that, given a scan of a library card, returns the category this card belongs to and an XML file containing the extracted and classified text.

1. Introduction

A library catalogue is a register where all bibliographic entries found in a library are listed. In this paper we present a pipeline that automatically processes scanned images of library catalogue documents such that they can be made available and searchable in an online database. While earlier work in this direction uses hand-crafted rules and regular expressions to classify text in extracted OCR data, in recent years Convolutional Neural Network (CNN) based methods that operate on pixel level have formed the state of the art in this task [4].

The library catalogue at hand consists of 113073 mostly handwritten documents, mostly collected in the time period from 1815 to 1930. The scanned images contain exactly the card with no surrounding content (see Fig. 1). Documents are classified into two groups: library cards with a "Signatur" (a unique identifier) that we call S cards and cards without it (V cards). V cards are not relevant for the online database and must be sorted out.

For training, 2000 S cards and 500 V cards were manually extracted. The S cards were further sorted into 5 classes based on their layout. The text regions were manually annotated and verified by experts.
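The labelling scheme described above (5 layout classes of S cards plus the V class, with V cards sorted out) can be illustrated with a minimal sketch. The class names `S1` to `S5`, the helper functions and the scan file names are assumptions made for illustration; the paper does not state how the classes are named or how predictions are stored.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical label names: the paper only says there are 5 layout classes
# of S cards plus the V class, not what they are called.
CLASSES = ["S1", "S2", "S3", "S4", "S5", "V"]


def is_relevant(label: str) -> bool:
    """Return True for S cards (kept) and False for V cards (sorted out)."""
    if label not in CLASSES:
        raise ValueError(f"unknown class label: {label!r}")
    return label != "V"


def sort_out_v_cards(predictions: dict[str, str]) -> dict[str, str]:
    """Keep only the scans whose predicted class is one of the S layouts."""
    return {scan: label for scan, label in predictions.items() if is_relevant(label)}


# Toy predictions for four made-up scan names.
predictions = {
    "card_0001.png": "S3",
    "card_0002.png": "V",
    "card_0003.png": "S1",
    "card_0004.png": "V",
}
kept = sort_out_v_cards(predictions)

# Collapse the 6 classes back into the two groups S and V.
group_counts = Counter("S" if is_relevant(label) else "V" for label in predictions.values())
```

In this sketch the layout class of each kept card stays available, so a later step could pick layout-specific processing, while V cards never reach that step.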
Model      Accuracy
ResNet18   0.988
ResNet34   0.988
ResNet50   0.994

Table 1. The accuracy scores on the test set. The accuracy is computed with respect to all 6 classes.

In this paper we describe a pipeline that, given a scanned library card image, determines if it is type S or V and then returns an XML file with the extracted and classified text. We describe the components of our pipeline in Section 2 and give a conclusion in Section 3.

2. Methodology and Results

The pipeline developed in this project is summarized in Fig. 1.

Classification of S and V cards. We use a ResNet [2] pretrained on ImageNet and finetuned on our documents to sort out V cards. We do not freeze any layers during finetuning but instead train the full model with a smaller initial learning rate of 4 · 10⁻⁴. To prevent large class imbalances we train the network on all 6 classes. The 2500 annotated documents are randomly split into train, test and validation sets and rescaled to 512×512. Table 1 shows the accuracy scores on the test set for three ResNets with different depth parameters.

Page segmentation of S cards. The text regions in S cards are categorized in 7 classes that each contain document-specific information like title, author, publisher or unique identifiers. The text region classes are distinguished from one another by location, font size and content. We use a CNN for image segmentation to detect and classify the text regions

Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Informatik; Technik