Page - 117 - in Joint Austrian Computer Vision and Robotics Workshop 2020
type are often focused on the decoder component, while the standard approach for the encoder component is to repurpose the convolutional stage of known, well-performing networks such as VGG-16 [14]. The variations in the decoder component essentially explore the trade-off between low memory requirements (and fast inference) and high accuracy. Architectures such as [16] also investigate the benefits of an additional ResNet [5] based refinement stage. Benchmarks show [2, 10] that almost all state-of-the-art solutions for a variety of image segmentation tasks are based on the U-Net architecture. It is also chosen by well-performing entries [6, 15] to the Kaggle Carvana Image Masking challenge. TernausNet [6] was part of the winning entry in the challenge and uses a pretrained encoder based on VGG-11 [14], while [15] placed in the top 4% using an ensemble of five networks with a pretrained ResNet-50 [5] encoder.
3. Dataset
Training was done on a private dataset consisting of 7614 pairs of RGB images and binary segmentation masks. Some images contain additional cars in the background that are smaller by area. In these cases the solution is expected to segment only the main vehicle. The dataset exhibits a bias towards German car brands such as Volkswagen, BMW and Mercedes and contains a disproportionate number of images with cars of higher-than-average cost. During preprocessing all images are resized to a resolution of 800 px × 600 px. Data augmentation is used to boost the available training data.
To our knowledge, the most closely related dataset is tied to the Kaggle Carvana Image Masking challenge². The goal of this challenge is identical to ours. Its dataset contains roughly 100000 image/mask pairs with resolution 1920 px × 1080 px. Compared to our dataset the samples are more uniform. Each picture contains exactly one vehicle, which is placed in a fixed position, and all photographs are taken by the same stationary cameras under identical lighting conditions. The winning entry of this challenge achieved a Jaccard index of 0.9947, which we consider to be an upper bound on the score achievable on our dataset.
² https://www.kaggle.com/c/carvana-image-masking-challenge (accessed February 21, 2020)
4. Methods
Segmentation is performed with a fully convolutional neural network of the U-Net architecture. Its implementation is similar to [16], with a pretrained convolutional stage of a VGG-16 network with batch normalization for the encoder and an additional ResNet-style refinement block after the decoder. Segmentation quality is evaluated using the Jaccard index, which is the de facto standard metric for image segmentation methods:
$$M_J(P, T) := \frac{|P \cap T|}{|P \cup T|}. \tag{1}$$
In our context $T$ and $P$ are subsets of target (ground truth) and predicted pixels in a segmentation mask. Images $x$, target masks $t$ and predicted masks $p$ are assumed to be non-binary with height $N$, width $M$ and values in the range $[0, 1]$.
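As an illustration of the Jaccard index in (1), the following minimal sketch builds the pixel sets from two small masks and compares them. It is not the authors' evaluation code: the function name `jaccard_index`, the binarization threshold of 0.5, and the convention for two empty masks are assumptions made here, and plain nested lists are used only to keep the example dependency-free.

```python
def jaccard_index(pred, target, threshold=0.5):
    """Jaccard index |P ∩ T| / |P ∪ T| of two masks given as nested lists.

    Mask values in [0, 1] are binarized at `threshold` (an assumption of
    this sketch) to obtain the pixel sets P and T.
    """
    P = {(i, j) for i, row in enumerate(pred)
         for j, v in enumerate(row) if v >= threshold}
    T = {(i, j) for i, row in enumerate(target)
         for j, v in enumerate(row) if v >= threshold}
    union = P | T
    if not union:
        # Convention chosen here: two empty masks agree perfectly.
        return 1.0
    return len(P & T) / len(union)


# Toy 2 x 3 masks: two pixels are shared, four pixels are in the union.
pred = [[0.9, 0.8, 0.1],
        [0.7, 0.2, 0.0]]
target = [[1.0, 1.0, 0.0],
          [0.0, 1.0, 0.0]]
score = jaccard_index(pred, target)  # 2 / 4
```

For these toy masks the intersection holds two pixels and the union four, so the score is 0.5.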
We study training with (modifications of) the loss functions Mean Squared Error $L_{MSE}$ and Binary Cross-Entropy $L_{BCE}$ as well as the Dice Loss [9] $L_{DSC}$, which is defined as
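$$L_{DSC}(p, t) := 1 - \frac{\epsilon + 2 \sum_{(i,j) \in D} p_{ij} t_{ij}}{\epsilon + \sum_{(i,j) \in D} (p_{ij} + t_{ij})}, \tag{2}$$
and is related to the Jaccard index. Here $D = \{1, \dots, N\} \times \{1, \dots, M\}$ is the domain of the segmentation masks and $\epsilon \ll NM$ is a small scalar regularization term.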
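The Dice Loss can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function name `dice_loss`, the default regularization value `eps=1.0`, and the nested-list mask representation are assumptions made here. For binary masks and a regularization term of zero, the Dice coefficient $d = 1 - L_{DSC}$ and the Jaccard index $J$ satisfy $J = d / (2 - d)$, which the example checks numerically.

```python
def dice_loss(pred, target, eps=1.0):
    """Soft Dice loss 1 - (eps + 2*sum(p*t)) / (eps + sum(p + t)).

    `pred` and `target` are nested lists of equal shape with values in
    [0, 1]. The default `eps` is an assumed value for illustration.
    With eps = 0 the loss is undefined when both masks are all zero.
    """
    pairs = [(p, t) for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    intersection = sum(p * t for p, t in pairs)
    total = sum(p + t for p, t in pairs)
    return 1.0 - (eps + 2.0 * intersection) / (eps + total)


# Binary toy masks whose Jaccard index is 2 / 4 = 0.5.
pred = [[1.0, 1.0, 0.0],
        [1.0, 0.0, 0.0]]
target = [[1.0, 1.0, 0.0],
          [0.0, 1.0, 0.0]]

dice = 1.0 - dice_loss(pred, target, eps=0.0)  # Dice coefficient
jaccard_from_dice = dice / (2.0 - dice)        # relation J = d / (2 - d)
```

The regularization term keeps the quotient defined when both masks are empty; how large it should be in practice is not specified beyond it being small relative to the number of pixels.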
4.1. Weighting Schemes
We propose modifications that improve upon the standard losses Mean Squared Error and Binary Cross-Entropy. The main idea is that not all areas of an image are equally important or equally difficult to segment. Loss functions that are the sum or mean of pixelwise losses can be modified to assign a weight to each pixel in order to adjust for this inhomogeneity. We can use a map $w$ of real weights with shape equal to $t$ and $p$ and define a modified version of Mean Squared Error as:
$$L_{MSE}(p, t) := \frac{1}{NM} \sum_{(i,j) \in D} w_{ij} (p_{ij} - t_{ij})^2. \tag{3}$$
An analogous modification can be made to Binary
Cross-Entropy.
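The weighted Mean Squared Error in (3), together with one possible analogous weighting of Binary Cross-Entropy, can be sketched as follows. The function names, the clipping constant, and the weight maps are hypothetical and chosen only for illustration. The exact form of the weighted Binary Cross-Entropy is an assumption, since the text only states that an analogous modification exists.

```python
import math


def _pixels(pred, target, weights):
    """Yield (p, t, w) triples over all pixels of equally shaped nested lists."""
    for p_row, t_row, w_row in zip(pred, target, weights):
        yield from zip(p_row, t_row, w_row)


def weighted_mse(pred, target, weights):
    """Pixelwise squared error scaled by w_ij and averaged over N*M pixels."""
    n_pixels = len(pred) * len(pred[0])
    return sum(w * (p - t) ** 2
               for p, t, w in _pixels(pred, target, weights)) / n_pixels


def weighted_bce(pred, target, weights, clip=1e-7):
    """Assumed analogue for BCE: each pixelwise term is scaled by w_ij.

    Predictions are clipped to [clip, 1 - clip] to avoid log(0); the
    clipping constant is an assumption of this sketch.
    """
    n_pixels = len(pred) * len(pred[0])
    total = 0.0
    for p, t, w in _pixels(pred, target, weights):
        p = min(max(p, clip), 1.0 - clip)
        total -= w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / n_pixels


pred = [[1.0, 0.5],
        [0.0, 0.5]]
target = [[1.0, 1.0],
          [0.0, 0.0]]
uniform = [[1.0, 1.0],
           [1.0, 1.0]]      # all weights 1: reduces to the standard losses
emphasized = [[1.0, 3.0],
              [1.0, 1.0]]   # hypothetical map stressing one pixel

mse_uniform = weighted_mse(pred, target, uniform)        # 0.125
mse_emphasized = weighted_mse(pred, target, emphasized)  # 0.25
bce_uniform = weighted_bce(pred, target, uniform)
```

With all weights equal to one, both functions reduce to the unweighted losses. Raising the weight of a poorly predicted pixel raises its share of the total loss.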
Joint Austrian Computer Vision and Robotics Workshop 2020
- Title
- Joint Austrian Computer Vision and Robotics Workshop 2020
- Editor
- Graz University of Technology
- Location
- Graz
- Date
- 2020
- Language
- English
- License
- CC BY 4.0
- ISBN
- 978-3-85125-752-6
- Size
- 21.0 x 29.7 cm
- Pages
- 188
- Categories
- Informatik
- Technik