Highly Accurate Binary Image Segmentation for Cars
Thomas Heitzinger, Martin Kampel
Computer Vision Lab, TU Wien, Austria
{thomas.heitzinger,martin.kampel}@tuwien.ac.at
Abstract. We study methods for the generation of highly accurate binary segmentation masks with application to images of cars. The goal is the automated separation of cars from their background. A fully convolutional network (FCN) based on the U-Net architecture is trained on a private dataset consisting of over 7000 samples. The main contributions of the paper include a series of modifications to common loss functions as well as the introduction of a novel Gradient Loss that outperforms standard approaches. In a specialized postprocessing step the generated masks are further refined to better match the inherent curvature bias typically found in the outline of cars. In direct comparison to previous implementations our method reduces the segmentation error measured by the Jaccard index by over 65%.
1. Introduction
A majority of buyers and sellers of cars choose to use online platforms. The quality of pictures on such platforms has a considerable impact on a buyer's likelihood to purchase and thus leads to a demand for visually appealing images. For most sellers it is financially infeasible to take professional photographs, and it has instead become common practice to digitally edit them. A binary segmentation mask is created that segments the image into foreground (the vehicle) and background. This mask is used to either alter (e.g. blur) or entirely replace the background with an artificial scene. Due to the significant demand for high-quality segmentation masks, dedicated businesses offering this service have emerged. As each photograph is edited by hand, the total time until the segmentation mask is available to the dealership lies between one and two days. This delay generates non-negligible costs. Based on novel deep learning techniques that have advanced the state of the art in recent years, we study methods for the fully automated generation of segmentation masks with a focus on maximizing accuracy. This paper aims to improve the state of the CarCutter¹ service.
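The masking step described above can be illustrated with a short sketch: given an image and a binary mask, the background is either blurred or swapped for an artificial scene. The function and parameter names below are illustrative assumptions and are not part of the CarCutter pipeline.

# Minimal sketch (Python, illustrative): applying a binary mask to blur or
# replace the background of a car image.
import numpy as np
import cv2

def apply_mask(image, mask, background=None, blur_ksize=21):
    # image: HxWx3 uint8; mask: HxW in [0, 1], with 1 marking the vehicle.
    if background is None:
        # Option 1: keep the original scene but blur it.
        background = cv2.GaussianBlur(image, (blur_ksize, blur_ksize), 0)
    # Option 2: pass an artificial scene of the same size as `background`.
    alpha = mask.astype(np.float32)[..., None]
    composite = alpha * image + (1.0 - alpha) * background
    return composite.astype(np.uint8)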
2. Related Work
The first application of convolutional networks to semantic segmentation with per-pixel prediction was made possible by the introduction of fully convolutional networks (FCN) [13]. Previously, segmentation solutions repurposed convolutional network architectures [12, 4] intended either for classification or for object detection and always included fully connected layers. These adaptations come with drawbacks in either speed or accuracy. By reinterpreting the fully connected layers of classification networks as convolutional layers that cover the entire input region, the network architecture is made independent of the dimensions of the input image. Instead of a class probability vector, the reinterpreted network outputs a coarse heatmap for each class. In order to obtain predictions at the pixel level, the coarse semantic information of deeper levels is repeatedly upsampled and added to the activations of shallower feature maps.
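As a rough illustration of this reinterpretation (a sketch, not the FCN of [13] itself), the following PyTorch snippet replaces a classifier's fully connected head with a convolution that spans the final feature map, so that larger inputs produce a coarse per-class heatmap; the toy backbone and layer sizes are assumptions.

# Sketch (PyTorch, illustrative): "convolutionalizing" a classification head.
import torch
import torch.nn as nn

features = nn.Sequential(              # toy backbone, not the network used in the paper
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
# A classifier for 28x28 inputs would flatten and use nn.Linear(128 * 7 * 7, num_classes).
# The equivalent convolutional head uses a 7x7 kernel carrying the same weights:
num_classes = 2
head = nn.Conv2d(128, num_classes, kernel_size=7)

x = torch.randn(1, 3, 112, 112)        # any input size is now accepted
heatmap = head(features(x))            # coarse class score map instead of a single vector
print(heatmap.shape)                   # e.g. torch.Size([1, 2, 22, 22])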
This innovation was quickly expanded on and led to the development of the U-Net architecture [11]. It introduces a symmetric encoder-decoder format consisting of a contracting encoder component and an expanding decoder component. This setup is chosen with the intention of learning a comparatively low-dimensional image representation in the narrow region of the network (referred to as the bottleneck) that captures global context while at the same time dramatically reducing the number of learned parameters. Skip connections efficiently pass shallow encoder features with high localization accuracy to deep decoder layers that are rich in semantic information. Variations on networks of this
¹https://www.car-cutter.com/ (accessed February 24, 2020)