all pooling layers with convolutions. The generator takes
a noise vector z as input and feeds it through multiple
fractionally strided convolutions in a fully convolutional
manner to generate synthetic images G(z). The discriminator
receives both real images x and synthetic images G(z), feeds
them through a fully convolutional classification network
which classifies any given image as either real, i.e. D=1, or
synthetic, i.e. D=0. The discriminator uses the cross-entropy
loss function
$$ l_D = \frac{1}{m} \sum_{i=1}^{m} \left[ \log\left( D\left( G\left( z^{(i)} \right) \right) \right) + \log\left( 1 - D\left( x^{(i)} \right) \right) \right], \qquad (1) $$
where the mini-batch size m describes the number of training
inputs for stochastic gradient descent [15], i denotes the
current index in the mini-batch, x(i) is the real image, z(i)
is the noise vector sample, D is the discriminator output and
G is the generator output. The generator loss
$$ l_G = \frac{1}{m} \sum_{i=1}^{m} \log\left( 1 - D\left( G\left( z^{(i)} \right) \right) \right) \qquad (2) $$
only takes the discriminator output of the generated images
D(G(z)) into account.
By minimizing lG, the generator is trained to generate
images G(z) which look real, i.e. D(G(z))≈1, while by
minimizing lD, the discriminator is trained to correctly
classify real and synthetic images, i.e. D(x)≈1 and D(G(z))≈0.
Therefore, generator and discriminator play against each
other, as the generator creates synthetic images which fool
the discriminator into believing they are real, while the
discriminator attempts to classify real and synthetic images
correctly every time.
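For illustration, a minimal sketch of one alternating training step under the losses (1) and (2) is given below, written against the TensorFlow 2 eager API for brevity; the generator, discriminator, and optimizer objects are illustrative placeholders, not the actual implementation.

```python
import tensorflow as tf

def train_step(generator, discriminator, opt_g, opt_d, x_real,
               m=128, noise_dim=400):
    """One alternating update minimizing l_D (Eq. 1) and l_G (Eq. 2).

    generator/discriminator are assumed Keras models; x_real is a
    mini-batch of m real images scaled to [-1, 1].
    """
    eps = 1e-8  # keeps the logs finite when D saturates at 0 or 1
    z = tf.random.uniform([m, noise_dim], -1.0, 1.0)  # noise input

    # Discriminator update: minimize l_D over the discriminator weights.
    with tf.GradientTape() as tape:
        l_d = tf.reduce_mean(
            tf.math.log(discriminator(generator(z)) + eps)
            + tf.math.log(1.0 - discriminator(x_real) + eps))
    grads = tape.gradient(l_d, discriminator.trainable_variables)
    opt_d.apply_gradients(zip(grads, discriminator.trainable_variables))

    # Generator update: minimize l_G over the generator weights.
    with tf.GradientTape() as tape:
        l_g = tf.reduce_mean(
            tf.math.log(1.0 - discriminator(generator(z)) + eps))
    grads = tape.gradient(l_g, generator.trainable_variables)
    opt_g.apply_gradients(zip(grads, generator.trainable_variables))
    return l_d, l_g
```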
In order to implement the additional segmentation mask
generation, the DCGAN architecture was modified to use
2-channel images, where the first channel corresponds to
the image, and the second channel corresponds to the
segmentation mask. The discriminator network then simply
classifies image-segmentation-pairs instead of images only.
The GAN therefore creates synthetic image-segmentation-
pairs, which we then further use for the supervised training
of a segmentation task. For most GAN setups, this change
is simple to implement, as no change in the training process
is necessary, making this adaptation very flexible.
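The pairing itself amounts to a channel-wise stack; a minimal sketch follows, in which the function names are illustrative only.

```python
import numpy as np

def make_pair(image, mask):
    """Stack an image and its segmentation mask into one 2-channel
    array of shape (H, W, 2); image and mask are (H, W) arrays
    scaled to [-1, 1]. The discriminator consumes these pairs just
    like ordinary images, only with an input depth of 2."""
    return np.stack([image, mask], axis=-1)

def split_pair(pair):
    """Split a generated 2-channel sample back into image and mask."""
    return pair[..., 0], pair[..., 1]
```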
IV. EVALUATION
A. Materials
We evaluate our proposed method using a 3-fold cross-
validation setup on the SCR Lung Database [26], which is
composed of the JSRT Lung Database [25] with correspond-
ing ground-truth segmentation masks. The cross-validation
splits are set up so that all 247 images are tested once, with
each fold using 82 test images and randomly picking 20 validation
images and 145 training images from the remaining images.
The images are downscaled to a resolution of 128×128, on
which all evaluations are performed. In order to demonstrate
possible strengths and limitations of the GAN for even smaller datasets, we evaluate different scenarios on the full
dataset, as well as on a reduced dataset. For the reduced
dataset, the cross-validation setup for test and validation data
is the same as for the full dataset, only the amount of training
data is reduced to 30 images by randomly picking them from
the training images of the full dataset. For the quantitative
evaluation, we chose to perform image segmentation using
the U-Net [21] fully convolutional network architecture.
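A sketch of this split logic is given below; only the set sizes are taken from the setup above, while the exact fold assignment and seeding are assumptions.

```python
import random

def make_splits(image_ids, n_folds=3, n_val=20, n_train=145, seed=0):
    """3-fold cross-validation splits: each image is tested exactly
    once; validation and training images are drawn at random from
    the images outside the current test fold."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]  # ~82 test images each
    splits = []
    for test in folds:
        rest = [i for i in ids if i not in test]
        rng.shuffle(rest)
        splits.append({"test": test,
                       "val": rest[:n_val],
                       "train": rest[n_val:n_val + n_train]})
    return splits

# Reduced-dataset scenario: keep test/val, subsample 30 training images.
# reduced_train = random.Random(0).sample(split["train"], 30)
```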
B. Experimental Setup
For our proposed GAN architecture, we adapted the
DCGAN [18] TensorFlow [1] implementation tf-dcgan1.
We modified the architecture to include support for the
generation of segmentation masks and increased the image
resolution to 128×128. The higher resolution made it neces-
sary to increase the number of generator and discriminator
feature maps. We also used a random noise vector z of
higher dimension as the generator input. The noise vector
dimension was fixed at 400, using uniform noise in the
range of [−1,1]. Generator feature map sizes were set
to [512,256,128,128,128], discriminator feature map sizes
were set to [128,128,256,512,512]. As suggested in [18], the
convolutional kernel sizes were kept at 5. The weights of all
convolutional layers were initialized randomly using a nor-
mal distribution with zero mean and a standard deviation of
0.05. The input data was scaled to be in the range of [−1,1].
The used optimizer was Adam [9] with a learning rate of
0.0004 and an exponential decay rate for the first and second
moment estimates of β1=0.5, β2=0.999. The training was
done using a mini-batch size of 128. The network was trained
for 12000 mini-batches in total, as after 12000 mini-batches
the overall quality of the generated images G(z) was high
for all cross-validation folds. Samples were generated every
200 mini-batches of training. To slightly reduce the impact of
mode collapse [3], where the generator learns to map several
different noise vector inputs z to the same output image
G(z), the resulting GAN images were checked for similarity
by using a perceptual image hash, which removes images
that are almost identical in a batch of samples. Training the
GAN took approximately 24 hours per cross-validation fold
on an Intel i7-6700HQ CPU @ 2.60 GHz and an NVidia
GTX980M GPU with 8 GB of GPU memory.
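One possible realization of this similarity check is sketched below using a simple average hash; the specific hash function and the Hamming-distance threshold are assumed values, not fixed by the setup above.

```python
import numpy as np

def average_hash(img, hash_size=8):
    """Average hash: block-mean downscale to hash_size x hash_size,
    then threshold at the mean. img is a 2-D array whose side
    lengths are multiples of hash_size (true for 128x128)."""
    h, w = img.shape
    small = img.reshape(hash_size, h // hash_size,
                        hash_size, w // hash_size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def deduplicate(samples, max_dist=4):
    """Keep only samples whose hash differs from every kept sample
    by more than max_dist bits (threshold is an assumed value)."""
    kept, hashes = [], []
    for s in samples:
        code = average_hash(s)
        if all(np.count_nonzero(code != other) > max_dist
               for other in hashes):
            kept.append(s)
            hashes.append(code)
    return kept
```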
For the quantitative segmentation results, we used a U-Net
architecture of depth 4, replacing max pooling with average
pooling for downsampling. This U-Net was implemented
using Caffe [7]. Although data augmentation is used to great
effect and is also described as a strength of the U-Net [21],
we decided not to use it in any of our experiments, in order to
specifically evaluate the impact the synthetic GAN samples
have on the training process and the resulting segmentation
masks. All convolution kernel sizes were set to 3, with
feature map sizes of 64 and weights initialized using the
MSRA [5] method. We used the Nesterov [14] optimizer at
a learning rate of 0.00001 for the segmentation task, with
a momentum of 0.99 and a weight decay of 0.0005. The
1 https://github.com/sugyan/tf-dcgan