all pooling layers with convolutions. The generator takes
a noise vector z as input and feeds it through multiple
fractionally strided convolutions in a fully convolutional
manner to generate synthetic images G(z). The discriminator
receives both real images x and synthetic images G(z), feeds
them through a fully convolutional classification network
which classifies any given image as either real, i.e. D=1, or
synthetic, i.e. D=0. The discriminator uses the cross-entropy
loss function
$$ l_D = \frac{1}{m} \sum_{i=1}^{m} \left[ \log\left( D\left( G\left( z^{(i)} \right) \right) \right) + \log\left( 1 - D\left( x^{(i)} \right) \right) \right], \qquad (1) $$
where the mini-batch size m describes the number of training
inputs for stochastic gradient descent [15], i denotes the
current index in the mini-batch, x(i) is the real image, z(i)
is the noise vector sample, D is the discriminator output and
G is the generator output. The generator loss
$$ l_G = \frac{1}{m} \sum_{i=1}^{m} \log\left( 1 - D\left( G\left( z^{(i)} \right) \right) \right) \qquad (2) $$
only takes the discriminator output of the generated images
D(G(z)) into account.
By minimizing lG, the generator is trained to generate
images G(z) which look real, i.e. D(G(z))≈1, while by
minimizing lD, the discriminator is trained to correctly
classify real and synthetic images, i.e. D(x)≈1 and D(G(z))≈0.
Therefore, generator and discriminator play against each
other, as the generator creates synthetic images which fool
the discriminator into believing they are real, while the
discriminator attempts to classify real and synthetic images
correctly every time.
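For illustration, a minimal sketch of one alternating training step under the losses (1) and (2) is given below, written against the TensorFlow 2 eager API for brevity; the generator, discriminator, and optimizer objects are illustrative placeholders, not the actual implementation.

```python
import tensorflow as tf

def train_step(generator, discriminator, opt_g, opt_d, x_real,
               m=128, noise_dim=400):
    """One alternating update minimizing l_D (Eq. 1) and l_G (Eq. 2).

    generator/discriminator are assumed Keras models; x_real is a
    mini-batch of m real images scaled to [-1, 1].
    """
    eps = 1e-8  # keeps the logs finite when D saturates at 0 or 1
    z = tf.random.uniform([m, noise_dim], -1.0, 1.0)  # noise input

    # Discriminator update: minimize l_D over the discriminator weights.
    with tf.GradientTape() as tape:
        l_d = tf.reduce_mean(
            tf.math.log(discriminator(generator(z)) + eps)
            + tf.math.log(1.0 - discriminator(x_real) + eps))
    grads = tape.gradient(l_d, discriminator.trainable_variables)
    opt_d.apply_gradients(zip(grads, discriminator.trainable_variables))

    # Generator update: minimize l_G over the generator weights.
    with tf.GradientTape() as tape:
        l_g = tf.reduce_mean(
            tf.math.log(1.0 - discriminator(generator(z)) + eps))
    grads = tape.gradient(l_g, generator.trainable_variables)
    opt_g.apply_gradients(zip(grads, generator.trainable_variables))
    return l_d, l_g
```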
In order to implement the additional segmentation mask
generation, the DCGAN architecture was modified to use
2-channel images, where the first channel corresponds to
the image, and the second channel corresponds to the
segmentation mask. The discriminator network then simply
classifies image-segmentation-pairs instead of images only.
The GAN therefore creates synthetic image-segmentation-
pairs, which we then further use for the supervised training
of a segmentation task. For most GAN setups, this change
is simple to implement, as no change in the training process
is necessary, making this adaptation very flexible.
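The pairing itself amounts to a channel-wise stack; a minimal sketch follows, in which the function names are illustrative only.

```python
import numpy as np

def make_pair(image, mask):
    """Stack an image and its segmentation mask into one 2-channel
    array of shape (H, W, 2); image and mask are (H, W) arrays
    scaled to [-1, 1]. The discriminator consumes these pairs just
    like ordinary images, only with an input depth of 2."""
    return np.stack([image, mask], axis=-1)

def split_pair(pair):
    """Split a generated 2-channel sample back into image and mask."""
    return pair[..., 0], pair[..., 1]
```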
IV. EVALUATION
A. Materials
We evaluate our proposed method using a 3-fold cross-
validation setup on the SCR Lung Database [26], which is
composed of the JSRT Lung Database [25] with correspond-
ing ground-truth segmentation masks. The cross-validation
splits are set up so that all 247 images are tested once, with
each fold using 82 test images and randomly picking 20 validation
images and 145 training images from the remaining images.
The images are downscaled to a resolution of 128×128, on
which all evaluations are performed. In order to demonstrate
possible strengths and limitations of the GAN for even smaller datasets, we evaluate different scenarios on the full
dataset, as well as on a reduced dataset. For the reduced
dataset, the cross-validation setup for test and validation data
is the same as for the full dataset, only the amount of training
data is reduced to 30 images by randomly picking them from
the training images of the full dataset. For the quantitative
evaluation, we chose to perform image segmentation using
the U-Net [21] fully convolutional network architecture.
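A sketch of this split logic is given below; only the set sizes are taken from the setup above, while the exact fold assignment and seeding are assumptions.

```python
import random

def make_splits(image_ids, n_folds=3, n_val=20, n_train=145, seed=0):
    """3-fold cross-validation splits: each image is tested exactly
    once; validation and training images are drawn at random from
    the images outside the current test fold."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]  # ~82 test images each
    splits = []
    for test in folds:
        rest = [i for i in ids if i not in test]
        rng.shuffle(rest)
        splits.append({"test": test,
                       "val": rest[:n_val],
                       "train": rest[n_val:n_val + n_train]})
    return splits

# Reduced-dataset scenario: keep test/val, subsample 30 training images.
# reduced_train = random.Random(0).sample(split["train"], 30)
```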
B. Experimental Setup
For our proposed GAN architecture, we adapted the
DCGAN [18] TensorFlow [1] implementation tf-dcgan1.
We modified the architecture to include support for the
generation of segmentation masks and increased the image
resolution to 128×128. The higher resolution made it neces-
sary to increase the number of generator and discriminator
feature maps. We also used a random noise vector z of
higher dimension as the generator input. The noise vector
dimension was fixed at 400, using uniform noise in the
range of [−1,1]. Generator feature map sizes were set
to [512,256,128,128,128], discriminator feature map sizes
were set to [128,128,256,512,512]. As suggested in [18], the
convolutional kernel sizes were kept at 5. The weights of all
convolutional layers were initialized randomly using a nor-
mal distribution with zero mean and a standard deviation of
0.05. The input data was scaled to be in the range of [−1,1].
The used optimizer was Adam [9] with a learning rate of
0.0004 and an exponential decay rate for the first and second
moment estimates of β1=0.5, β2=0.999. The training was
done using a mini-batch size of 128. The network was trained
for 12000 mini-batches in total, as after 12000 mini-batches
the overall quality of the generated images G(z) was high
for all cross-validation folds. Samples were generated every
200 mini-batches of training. To slightly reduce the impact of
mode collapse [3], where the generator learns to map several
different noise vector inputs z to the same output image
G(z), the resulting GAN images were checked for similarity
by using a perceptual image hash, which removes images
that are almost identical in a batch of samples. Training the
GAN took approximately 24 hours per cross-validation fold
on an Intel i7-6700HQ CPU @ 2.60 GHz and an NVidia
GTX980M GPU with 8 GB of GPU memory.
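One possible realization of this similarity check is sketched below using a simple average hash; the specific hash function and the Hamming-distance threshold are assumed values, not fixed by the setup above.

```python
import numpy as np

def average_hash(img, hash_size=8):
    """Average hash: block-mean downscale to hash_size x hash_size,
    then threshold at the mean. img is a 2-D array whose side
    lengths are multiples of hash_size (true for 128x128)."""
    h, w = img.shape
    small = img.reshape(hash_size, h // hash_size,
                        hash_size, w // hash_size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def deduplicate(samples, max_dist=4):
    """Keep only samples whose hash differs from every kept sample
    by more than max_dist bits (threshold is an assumed value)."""
    kept, hashes = [], []
    for s in samples:
        code = average_hash(s)
        if all(np.count_nonzero(code != other) > max_dist
               for other in hashes):
            kept.append(s)
            hashes.append(code)
    return kept
```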
For the quantitative segmentation results, we used a U-Net
architecture of depth 4, replacing max pooling with average
pooling for downsampling. This U-Net was implemented
using Caffe [7]. Although data augmentation is used to great
effect and is also described as a strength of the U-Net [21],
we decided not to use it in any of our experiments, in order to
specifically evaluate the impact the synthetic GAN samples
have on the training process and the resulting segmentation
masks. All convolution kernel sizes were set to 3, with
feature map sizes of 64 and weights initialized using the
MSRA [5] method. We used the Nesterov [14] optimizer at
a learning rate of 0.00001 for the segmentation task, with
a momentum of 0.99 and a weight decay of 0.0005. The
1 https://github.com/sugyan/tf-dcgan