Fig. 1. Proposed GAN architecture incorporating the segmentation mask in the real and synthetic image batches
are especially useful for biomedical segmentation, as they
can provide realistic variations of the input data, similar to
natural variations.
B. Transfer Learning
Transfer learning aims to improve the learning of a target
task in a target domain, given the learned knowledge of
a source task in a source domain [16]. Applied to neural
networks, it describes the process of training a source
network on a source dataset, followed by transferring the
learned features to train a different target network on a target
dataset [28]. In the context of small datasets, this can be
applied in different ways. One option is to train on a large
dataset, e.g. ImageNet, remove the final layer of the network
architecture, and fine-tune the network on a smaller target dataset [19]. A
different approach uses Autoencoders, which
compress a given image to a vector representation and
reconstruct the image from this compressed representation.
As an example, denoising Autoencoders [27] have been
used to extract robust features with great success. However,
transferring Autoencoder features typically requires a target
network architecture very similar to the source architecture,
which is rarely the case.
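As a concrete illustration of the first strategy, the following minimal sketch (not part of the original work; it assumes PyTorch with a torchvision ResNet-18 pretrained on ImageNet, and a hypothetical two-class target task) removes the final layer of the source network and fine-tunes a new one on the small target dataset:

# Minimal fine-tuning sketch; the target task and its two classes are hypothetical.
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 2                      # hypothetical small target dataset

# 1) Source network trained on the large source dataset (ImageNet).
model = models.resnet18(pretrained=True)

# 2) Freeze the transferred features.
for param in model.parameters():
    param.requires_grad = False

# 3) Replace the final layer and train only this layer on the target dataset.
model.fc = nn.Linear(model.fc.in_features, num_target_classes)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# ... standard supervised training loop over the target dataset ...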
C. Image Generation
A novel approach to tackle the issue of small datasets
for training deep learning methods is to synthesize new
training data via image generation methods. Recent research
has shown that it is possible to render realistic images using
3D models to alleviate the problem of small datasets [22].
This has the advantage of being able to create an unlimited
amount of training data covering various scenarios, as long as
the images are realistic enough. Rendered images have also
recently been used to improve the performance of anatomical
landmark detection in medical applications by learning on a
dataset of rendered 3D models and fine-tuning on medical
data [20]. The disadvantage of using rendered images is that
the virtual model and scene parameters need to be explicitly
defined and tuned towards the application, which is time
consuming.

Generative Adversarial Networks [4] represent a different
approach to image generation. A generator and a discriminator
network are trained to compete against each other. The
goal of the discriminator is to decide if any given image is
real or synthetic. The generator generates synthetic images
in the hope of fooling the discriminator. Since the generator
never directly sees the training data and only receives its
gradients from the discriminator decision, GANs are also
resistant to overfitting [3]. However, the training process
of GANs is very sensitive to changes in hyperparameters.
The problem of finding the Nash Equilibrium between
the generator and the discriminator generally leads to an
unstable training process, but recent architectures such as
DCGAN [18] and WassersteinGAN [2] improved on this
substantially.
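For reference, this adversarial game between a generator G and a discriminator D is formalized in [4] as the minimax objective

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],

where p_data denotes the data distribution and p_z the noise prior from which the generator input is sampled.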
III. METHOD AND ARCHITECTURE
Standard GANs either exclusively learn to generate images [4],
or learn to perform image transformations [6].
However, in order to use the generated images for other
supervised deep learning tasks, like image segmentation, it is
also necessary to have a ground-truth solution for any given
input image.
We propose a modification to the standard GAN architecture,
which forces the generator to create segmentation
masks in addition to the generated images. The discriminator
then has to decide whether an observed image-segmentation-
pair is real or synthetic. This forces both the discriminator
and generator to implicitly learn about the structure of the
ground-truth, making the resulting generated data useful for
training in a supervised setup. While it is known that using
ground-truth labels in the discriminator improves the image
quality [24], this is the first time, to our knowledge, that
the ground-truth is used to directly generate new image-
segmentation-pairs. Fig. 1 illustrates this architecture.
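To make the modification concrete, the sketch below shows one way such a generator and discriminator could be wired. It is our illustration, not the authors' implementation; it assumes PyTorch, a DCGAN-style backbone, one image channel, one mask channel, and 16x16 outputs purely for brevity.

# Sketch of a generator that emits an image plus a segmentation mask, and a
# discriminator that judges the concatenated image-segmentation pair.
import torch
import torch.nn as nn

latent_dim, img_ch, mask_ch = 100, 1, 1     # assumed sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            # last layer produces image and mask channels jointly
            nn.ConvTranspose2d(64, img_ch + mask_ch, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):                    # z: (B, latent_dim, 1, 1)
        out = self.net(z)
        return out[:, :img_ch], out[:, img_ch:]   # split into image and segmentation mask

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + mask_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 1, 4, 1, 0),      # real / synthetic score
        )

    def forward(self, image, mask):
        return self.net(torch.cat([image, mask], dim=1))  # judge the image-segmentation pair

Relative to a standard DCGAN, the only changes are the extra output channel of the generator and the extra input channel of the discriminator; the adversarial training procedure itself is unchanged.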
As the foundation for our proposed architecture, we use
the DCGAN [18] architecture, which has been shown to achieve
good results with increased training stability in
many different applications compared to previous GAN
architectures. DCGAN uses a convolutional generator and
discriminator, makes use of batch normalization, and replaces