Page - 117 - in Joint Austrian Computer Vision and Robotics Workshop 2020

type are often focused on the decoder component, while the standard approach for the encoder component is to repurpose the convolutional stage of known, well-performing networks such as VGG-16 [14]. The variations in the decoder component essentially explore the trade-off between low memory requirements (and fast inference) and high accuracy. Architectures such as [16] also investigate the benefits of an additional ResNet [5] based refinement stage. Benchmarks show [2, 10] that almost all state-of-the-art solutions for a variety of image segmentation tasks are based on the U-Net architecture. It is also chosen by well-performing entries [6, 15] to the Kaggle Carvana Image Masking challenge. TernausNet [6] was part of the winning entry in the challenge and uses a pretrained encoder based on VGG-11 [14], while [15] placed in the top 4% using an ensemble of five networks with a pretrained ResNet-50 [5] encoder.

3. Dataset

Training was done on a private dataset consisting of 7614 pairs of RGB images and binary segmentation masks. Some images contain additional cars in the background that are smaller by area. In these cases the solution is expected to segment only the main vehicle. The dataset exhibits a bias towards German car brands such as Volkswagen, BMW and Mercedes and contains a disproportionate number of images with cars of higher-than-average cost. During preprocessing all images are resized to a resolution of 800px × 600px. Data augmentation is used to boost the available training data.

To our knowledge, the most closely related dataset is tied to the Kaggle Carvana Image Masking challenge². The goal of this challenge is identical to ours. Its dataset contains roughly 100000 image/mask pairs with resolution 1920px × 1080px. Compared to our dataset the samples are more uniform. Each picture contains exactly one vehicle, which is placed in a fixed position, and all photographs are taken by the same stationary cameras under identical lighting conditions.
The winning entry of this challenge achieved a Jaccard index of 0.9947, which we consider to be an upper bound to the score achievable on our dataset.

² https://www.kaggle.com/c/carvana-image-masking-challenge (accessed February 21, 2020)

4. Methods

Segmentation is performed with a fully convolutional neural network of the U-Net architecture. Its implementation is similar to [16], with a pretrained convolutional stage of a VGG-16 network with batch normalization for the encoder and an additional ResNet-style refinement block after the decoder. Segmentation quality is evaluated using the Jaccard index, which is the de facto standard metric for image segmentation methods:

$$M_J(P, T) := \frac{|P \cap T|}{|P \cup T|}. \tag{1}$$

In our context $T$ and $P$ are subsets of target (ground truth) and predicted pixels in a segmentation mask. Images $x$, target masks $t$ and predicted masks $p$ are assumed to be non-binary with height $N$, width $M$ and values in the range $[0, 1]$.

We study training with (modifications of) the loss functions Mean Squared Error $L_{MSE}$ and Binary Cross-Entropy $L_{BCE}$ as well as the Dice Loss [9] $L_{DSC}$, which is defined as

$$L_{DSC}(p, t) := 1 - \frac{\epsilon + 2 \sum_{(i,j) \in D} p_{ij} t_{ij}}{\epsilon + \sum_{(i,j) \in D} \left( p_{ij} + t_{ij} \right)}, \tag{2}$$

and is related to the Jaccard index. Here $D = \{1, \dots, N\} \times \{1, \dots, M\}$ is the domain of the segmentation masks and $\epsilon \ll NM$ is a small scalar regularization term.

4.1. Weighting Schemes

We propose modifications that improve upon the standard losses Mean Squared Error and Binary Cross-Entropy. The main idea is that not all areas of an image are equally important or equally difficult to segment. Loss functions that are the sum or mean of pixelwise losses can be modified to assign weights to each pixel in order to adjust for this inhomogeneity. We can use a map $w$ of real weights with shape equal to $t$ and $p$ and define a modified version of Mean Squared Error as:

$$L_{MSE}(p, t) := \frac{1}{NM} \sum_{(i,j) \in D} w_{ij} \left( p_{ij} - t_{ij} \right)^2. \tag{3}$$

An analogous modification can be made to Binary Cross-Entropy.
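
The metric and the losses of this section can be illustrated with a minimal sketch in plain Python, with masks given as lists of rows. It is not the authors' implementation. The function names, the binarisation threshold of 0.5 in the Jaccard index, the default value of the regularization term `eps`, the clipping constant `clip`, and the exact form of the weighted Binary Cross-Entropy are all assumptions made for illustration; the text only states that the cross-entropy modification is analogous to the weighted Mean Squared Error.

```python
import math


def _pixels(*maps):
    """Yield aligned per-pixel tuples from equally shaped 2-D maps (lists of rows)."""
    for rows in zip(*maps):
        yield from zip(*rows)


def jaccard_index(pred, target, threshold=0.5):
    """Jaccard index |P ∩ T| / |P ∪ T| after binarising both masks.

    The threshold is an assumption of this sketch; the text does not state one.
    """
    inter = union = 0
    for p, t in _pixels(pred, target):
        p_on, t_on = p >= threshold, t >= threshold
        inter += int(p_on and t_on)
        union += int(p_on or t_on)
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def dice_loss(pred, target, eps=1e-6):
    """Dice loss 1 - (eps + 2 * sum(p * t)) / (eps + sum(p + t)).

    eps > 0 keeps the ratio defined when both masks are empty.
    """
    overlap = sum(p * t for p, t in _pixels(pred, target))
    total = sum(p + t for p, t in _pixels(pred, target))
    return 1.0 - (eps + 2.0 * overlap) / (eps + total)


def weighted_mse(pred, target, weights):
    """Weighted MSE (1 / NM) * sum(w * (p - t) ** 2) over all pixels."""
    n_pixels = len(pred) * len(pred[0])
    total = sum(w * (p - t) ** 2 for p, t, w in _pixels(pred, target, weights))
    return total / n_pixels


def weighted_bce(pred, target, weights, clip=1e-7):
    """Assumed weighted analogue of binary cross-entropy.

    Predictions are clipped to [clip, 1 - clip] so that log() stays finite;
    the clipping is a numerical safeguard added for this sketch.
    """
    n_pixels = len(pred) * len(pred[0])
    total = 0.0
    for p, t, w in _pixels(pred, target, weights):
        p = min(max(p, clip), 1.0 - clip)
        total -= w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / n_pixels


if __name__ == "__main__":
    # Hypothetical 2 x 2 example: one false-positive pixel, weighted three times.
    pred = [[1.0, 0.0], [1.0, 1.0]]
    target = [[1.0, 0.0], [0.0, 1.0]]
    weights = [[1.0, 1.0], [3.0, 1.0]]
    print("Jaccard index:", jaccard_index(pred, target))
    print("Dice loss:    ", dice_loss(pred, target))
    print("Weighted MSE: ", weighted_mse(pred, target, weights))
    print("Weighted BCE: ", weighted_bce(pred, target, weights))
```

With all weights set to 1, `weighted_mse` reduces to the ordinary Mean Squared Error. For binary masks and a regularization term of zero, the Dice coefficient $d = 1 - L_{DSC}$ and the Jaccard index $J$ satisfy $J = d / (2 - d)$, which is one way to see the relation between the two mentioned above.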

Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Informatik; Technik