Fig. 1. Proposed GAN architecture incorporating the segmentation mask in the real and synthetic image batches
are especially useful for biomedical segmentation, as they
can provide realistic variations of the input data, similar to
natural variations.
B. Transfer Learning
Transfer learning aims to improve the learning of a target
task in a target domain, given the learned knowledge of
a source task in a source domain [16]. Applied to neural
networks, it describes the process of training a source
network on a source dataset, followed by transferring the
learned features to train a different target network on a target
dataset [28]. In the context of small datasets, this can be
applied in different ways. One option is to train on a large
dataset, e.g. ImageNet, remove the final layer of the network
architecture, and fine-tune the network on a smaller target dataset [19]. A
different approach uses Autoencoders, which
compress a given image to a vector representation and
reconstruct the image from this compressed representation.
As an example, denoising Autoencoders [27] have been
used to extract robust features with great success. However,
transferring Autoencoder features typically requires a target
network architecture very similar to the source architecture,
which is rarely the case.
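As a concrete illustration of the first strategy, the following minimal sketch (not part of the original work; it assumes PyTorch with a torchvision ResNet-18 pretrained on ImageNet, and a hypothetical two-class target task) removes the final layer of the source network and fine-tunes a new one on the small target dataset:

# Minimal fine-tuning sketch; the target task and its two classes are hypothetical.
import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 2                      # hypothetical small target dataset

# 1) Source network trained on the large source dataset (ImageNet).
model = models.resnet18(pretrained=True)

# 2) Freeze the transferred features.
for param in model.parameters():
    param.requires_grad = False

# 3) Replace the final layer and train only this layer on the target dataset.
model.fc = nn.Linear(model.fc.in_features, num_target_classes)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# ... standard supervised training loop over the target dataset ...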
C. Image Generation
A novel approach to tackle the issue of small datasets
for training deep learning methods is to synthesize new
training data via image generation methods. Recent research
has shown that it is possible to render realistic images using
3D models to alleviate the problem of small datasets [22].
This has the advantage of being able to create an unlimited
amount of training data covering various scenarios, as long as
the images are realistic enough. Rendered images have also
recently been used to improve the performance of anatomical
landmark detection in medical applications by learning on a
dataset of rendered 3D models and fine-tuning on medical
data [20]. The disadvantage of using rendered images is that
the virtual model and scene parameters need to be explicitly
defined and tuned towards the application, which is time
consuming.

Generative Adversarial Networks [4] represent a different
approach to image generation. A generator and a discriminator
network are trained to compete against each other. The
goal of the discriminator is to decide if any given image is
real or synthetic. The generator generates synthetic images
in the hope of fooling the discriminator. Since the generator
never directly sees the training data and only receives its
gradients from the discriminator decision, GANs are also
resistant to overfitting [3]. However, the training process
of GANs is very sensitive to changes in hyperparameters.
The problem of finding the Nash Equilibrium between
the generator and the discriminator generally leads to an
unstable training process, but recent architectures such as
DCGAN [18] and WassersteinGAN [2] improved on this
substantially.
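For reference, this adversarial game between a generator G and a discriminator D is formalized in [4] as the minimax objective

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],

where p_data denotes the data distribution and p_z the noise prior from which the generator input is sampled.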
III. METHOD AND ARCHITECTURE
Standard GANs either exclusively learn to generate images [4],
or learn to perform image transformations [6].
However, in order to use the generated images for other
supervised deep learning tasks, like image segmentation, it is
also necessary to have a ground-truth solution for any given
input image.
We propose a modification to the standard GAN architecture,
which forces the generator to create segmentation
masks in addition to the generated images. The discriminator
then has to decide whether an observed image-segmentation-
pair is real or synthetic. This forces both the discriminator
and generator to implicitly learn about the structure of the
ground-truth, making the resulting generated data useful for
training in a supervised setup. While it is known that using
ground-truth labels in the discriminator improves the image
quality [24], this is the first time, to our knowledge, that
the ground-truth is used to directly generate new image-
segmentation-pairs. Fig. 1 illustrates this architecture.
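To make the modification concrete, the sketch below shows one way such a generator and discriminator could be wired. It is our illustration, not the authors' implementation; it assumes PyTorch, a DCGAN-style backbone, one image channel, one mask channel, and 16x16 outputs purely for brevity.

# Sketch of a generator that emits an image plus a segmentation mask, and a
# discriminator that judges the concatenated image-segmentation pair.
import torch
import torch.nn as nn

latent_dim, img_ch, mask_ch = 100, 1, 1     # assumed sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            # last layer produces image and mask channels jointly
            nn.ConvTranspose2d(64, img_ch + mask_ch, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):                    # z: (B, latent_dim, 1, 1)
        out = self.net(z)
        return out[:, :img_ch], out[:, img_ch:]   # split into image and segmentation mask

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + mask_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 1, 4, 1, 0),      # real / synthetic score
        )

    def forward(self, image, mask):
        return self.net(torch.cat([image, mask], dim=1))  # judge the image-segmentation pair

Relative to a standard DCGAN, the only changes are the extra output channel of the generator and the extra input channel of the discriminator; the adversarial training procedure itself is unchanged.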
As the foundation for our proposed architecture, we use
the DCGAN [18] architecture, which has been shown to achieve
good results with increased training stability in
many different applications compared to previous GAN
architectures. DCGAN uses a convolutional generator and
discriminator, makes use of batch normalization, and replaces