Highly Accurate Binary Image Segmentation for Cars
Thomas Heitzinger, Martin Kampel
Computer Vision Lab, TU Wien, Austria
{thomas.heitzinger,martin.kampel}@tuwien.ac.at
Abstract. We study methods for the generation of highly accurate binary segmentation masks with application to images of cars. The goal is the automated separation of cars from their background. A fully convolutional network (FCN) based on the U-Net architecture is trained on a private dataset consisting of over 7000 samples. The main contributions of the paper include a series of modifications to common loss functions as well as the introduction of a novel Gradient Loss that outperforms standard approaches. In a specialized postprocessing step the generated masks are further refined to better match the inherent curvature bias typically found in the outline of cars. In direct comparison to previous implementations our method reduces the segmentation error measured by the Jaccard index by over 65%.
1. Introduction
A majority of buyers and sellers of cars choose to use online platforms. The quality of pictures on such platforms has a considerable impact on a buyer's likelihood to purchase and thus leads to a demand for visually appealing images. For most sellers it is financially infeasible to take professional photographs, and it has instead become common practice to digitally edit them. A binary segmentation mask is created that segments the image into foreground (the vehicle) and background. This mask is used to either alter (e.g. blur) or entirely replace the background with an artificial scene. Due to the significant demand for high-quality segmentation masks, dedicated businesses offering this service have emerged. As each photograph is edited by hand, the total time until the segmentation mask is available to the dealership lies between one and two days. This delay generates non-negligible costs. Based on novel deep learning techniques that have advanced the state of the art in recent years, we study methods for the fully automated generation of segmentation masks with a focus on maximizing accuracy. This paper aims to improve the state of the CarCutter¹ service.
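The masking step described above can be illustrated with a short sketch: given an image and a binary mask, the background is either blurred or swapped for an artificial scene. The function and parameter names below are illustrative assumptions and are not part of the CarCutter pipeline.

# Minimal sketch (Python, illustrative): applying a binary mask to blur or
# replace the background of a car image.
import numpy as np
import cv2

def apply_mask(image, mask, background=None, blur_ksize=21):
    # image: HxWx3 uint8; mask: HxW in [0, 1], with 1 marking the vehicle.
    if background is None:
        # Option 1: keep the original scene but blur it.
        background = cv2.GaussianBlur(image, (blur_ksize, blur_ksize), 0)
    # Option 2: pass an artificial scene of the same size as `background`.
    alpha = mask.astype(np.float32)[..., None]
    composite = alpha * image + (1.0 - alpha) * background
    return composite.astype(np.uint8)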
2. Related Work
The first application of convolutional networks to semantic segmentation with per-pixel prediction was made possible by the introduction of fully convolutional networks (FCN) [13]. Previously, segmentation solutions repurposed convolutional network architectures [12, 4] intended either for classification or for object detection and always included fully connected layers. These adaptations come with drawbacks in either speed or accuracy. By reinterpreting the fully connected layers of classification networks as convolutional layers that cover the entire input region, the network architecture is made independent of the dimensions of the input image. Instead of a class probability vector, the reinterpreted network outputs a coarse heatmap for each class. In order to obtain predictions at the pixel level, the coarse semantic information of deeper levels is repeatedly upsampled and added to the activations of shallower feature maps.
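As a rough illustration of this reinterpretation (a sketch, not the FCN of [13] itself), the following PyTorch snippet replaces a classifier's fully connected head with a convolution that spans the final feature map, so that larger inputs produce a coarse per-class heatmap; the toy backbone and layer sizes are assumptions.

# Sketch (PyTorch, illustrative): "convolutionalizing" a classification head.
import torch
import torch.nn as nn

features = nn.Sequential(              # toy backbone, not the network used in the paper
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
# A classifier for 28x28 inputs would flatten and use nn.Linear(128 * 7 * 7, num_classes).
# The equivalent convolutional head uses a 7x7 kernel carrying the same weights:
num_classes = 2
head = nn.Conv2d(128, num_classes, kernel_size=7)

x = torch.randn(1, 3, 112, 112)        # any input size is now accepted
heatmap = head(features(x))            # coarse class score map instead of a single vector
print(heatmap.shape)                   # e.g. torch.Size([1, 2, 22, 22])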
This innovation was quickly expanded on and led to the development of the U-Net architecture [11]. It introduces a symmetric encoder-decoder format consisting of a contracting encoder component and an expanding decoder component. This setup is chosen with the intention of learning a comparatively low-dimensional image representation in the narrow region of the network (referred to as the bottleneck) that captures global context while at the same time dramatically reducing the number of learned parameters. Skip connections efficiently pass shallow encoder features with high localization accuracy to deep decoder layers that are rich in semantic information. Variations on networks of this
¹https://www.car-cutter.com/ (accessed February 24, 2020)