Proceedings - OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
Page 37
networks have already been successfully applied to the problem of scene labelling [7] and semantic segmentation [18]. In contrast to related work, in this paper we take a bottom-up approach. Our CNN-based model operates at the level of small image patches and classifies each patch as either belonging to a tattoo or not. Our approach can be used on arbitrary images to obtain a low-level estimate of candidate tattooed regions. We propose this approach with our target application of de-identification in mind. In a de-identification pipeline, the detected candidate tattoo regions can be removed or averaged out to remove personally identifying information. We place much greater importance on correctly detecting all tattooed regions than on eliminating false positive detections, as false positives can be eliminated in subsequent stages, e.g. by combining our method with a person detector (e.g. [5]).

3. Our method

Our proposed method for tattoo detection is based on image patch labeling using a convolutional neural network. We do not detect a tattoo as a global entity. Rather, we use a sliding window approach: at each window position we extract a patch of size N×N, which is then classified as either tattoo or background. The output of our method consists of masked image regions that are tattoo candidates.

Convolutional neural networks typically consist of several convolutional layers, followed by one or more fully connected layers. Convolutional layers are in charge of learning good features and are characterized by (i) local receptive fields (i.e. a neuron in the convolutional layer is not connected to the outputs of all the neurons from the previous layer, but only to the ones in its local neighborhood), and (ii) shared weights, reflecting the intuition that the features are computed in the same way at different image locations. After the convolutional layers, so-called pooling layers are typically inserted in order to reduce the dimensionality of the feature space for subsequent steps.
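The sliding-window scan and candidate-mask output described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the patch size, the stride, and the `classify_patch` callback (standing in for the CNN) are all assumptions.

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide an N x N window over the image, yielding (row, col, patch)."""
    h, w = image.shape[:2]
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            yield r, c, image[r:r + patch_size, c:c + patch_size]

def candidate_mask(image, classify_patch, patch_size=32, stride=8):
    """Mark every region whose patch the classifier labels as tattoo.

    `classify_patch` is a placeholder for the CNN of Section 3:
    it returns True for tattoo and False for background.
    """
    mask = np.zeros(image.shape[:2], dtype=bool)
    for r, c, patch in extract_patches(image, patch_size, stride):
        if classify_patch(patch):
            mask[r:r + patch_size, c:c + patch_size] = True
    return mask
```

With overlapping windows (stride smaller than the patch size), a pixel is marked if any window covering it is classified as tattoo, which favors recall over precision, consistent with the emphasis above on detecting all tattooed regions.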
Fully connected layers perform the task of classification and contain the majority of the learned weights.

The architecture of our network is broadly inspired by the successful VGGNet model, proposed in 2014 by Simonyan and Zisserman [24]. VGGNet is characterized by a very homogeneous architecture that performs only 3×3 convolutions and 2×2 pooling from beginning to end. Our model, however, modifies it to accommodate smaller input images and a smaller number of output classes. The simplified network, with fewer and smaller layers, is faster to train, and it proved adequate for our purposes. The proposed network architecture is shown in Fig. 1.

The input to the network is an N×N color image (we assumed the RGB color model). The image has to be classified as either belonging to the tattoo or not, depending on whether its center lies inside the polygon that demarcates the tattoo. The network consists of eight layers (not counting the input layer, i.e. the image itself). The first two layers are convolutional layers with 32 feature maps with 3×3 filters and ReLU activation units. The third layer is a max-pooling layer that reduces the feature map dimensions by 2×2. The fourth and fifth layers are again convolutional layers with ReLU activation units, but with 64 feature maps (again with 3×3 filters). The sixth layer is another max-pooling layer, once more reducing the input dimension by 2×2. The seventh layer is a fully connected layer consisting of 256 neurons. The final,
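As a rough sanity check on the architecture just described, the learned-parameter counts of the listed layers can be tallied. The concrete patch size N = 32 and 'same' convolution padding are illustrative assumptions, not values stated on this page:

```python
def conv_params(in_maps, out_maps, k=3):
    # k*k weights per input map for each output map, plus one bias per output map
    return out_maps * (in_maps * k * k + 1)

def fc_params(n_in, n_out):
    # dense weight matrix plus one bias per output neuron
    return n_out * (n_in + 1)

N = 32                      # assumed patch size; the text leaves N symbolic
c1 = conv_params(3, 32)     # layer 1: 3 -> 32 maps, 3x3 filters
c2 = conv_params(32, 32)    # layer 2: 32 -> 32 maps
#                             layer 3: 2x2 max-pooling, no parameters
c4 = conv_params(32, 64)    # layer 4: 32 -> 64 maps
c5 = conv_params(64, 64)    # layer 5: 64 -> 64 maps
#                             layer 6: 2x2 max-pooling, no parameters
side = N // 2 // 2          # two 2x2 poolings shrink each spatial side by 4
fc7 = fc_params(side * side * 64, 256)  # layer 7: fully connected, 256 neurons

total = c1 + c2 + c4 + c5 + fc7
```

Under these assumptions the fully connected layer alone holds over a million weights, far more than all convolutional layers combined, which illustrates the opening sentence of this passage.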
Title
Proceedings
Subtitle
OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
Authors
Peter M. Roth
Kurt Niel
Publisher
Verlag der Technischen Universität Graz
Place
Wels
Date
2017
Language
English
License
CC BY 4.0
ISBN
978-3-85125-527-0
Dimensions
21.0 x 29.7 cm
Pages
248
Keywords
Conference proceedings
Categories
International
Conference proceedings

Table of Contents

  1. Learning / Recognition 24
  2. Signal & Image Processing / Filters 43
  3. Geometry / Sensor Fusion 45
  4. Tracking / Detection 85
  5. Vision for Robotics I 95
  6. Vision for Robotics II 127
  7. Poster OAGM & ARW 167
  8. Task Planning 191
  9. Robotic Arm 207