Proceedings of the OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics", page 37
networks have already been successfully applied to the problem of scene labelling [7] and semantic
segmentation [18].
In contrast to related work, in this paper we take a bottom-up approach. Our CNN-based model
operates at the level of small image patches and enables classifying each patch as either belonging
to a tattoo or not. Our approach can be used on arbitrary images to obtain a low-level estimate of
candidate tattooed regions.
We propose this approach with our target application of de-identification in mind. In a de-identification pipeline, the detected candidate tattoo regions can be removed or averaged out to eliminate personally identifying information. We place much greater importance on correctly detecting all tattooed regions than on eliminating false positive detections, as false positives can be eliminated in subsequent stages, e.g. by combining our method with a person detector (e.g. [5]).
3. Our method
Our proposed method for tattoo detection is based on image patch labeling using a convolutional neural network. We do not detect a tattoo as a global entity. Rather, we use the sliding window approach and at each window position we extract a patch of size N×N. The patch is then classified as either tattoo or background. The output of our method consists of masked image regions that are tattoo candidates.
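As an illustrative sketch of this sliding-window scheme (our own illustration, not the authors' code; the function names, patch size, stride, and the `classify` stub are assumptions), the candidate mask could be built as follows:

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide an N x N window over the image and yield (row, col, patch).

    Window positions where the full patch does not fit are skipped.
    """
    h, w = image.shape[:2]
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            yield r, c, image[r:r + patch_size, c:c + patch_size]

def candidate_mask(image, classify, patch_size=32, stride=8):
    """Build a binary mask of tattoo-candidate regions.

    `classify` is any function mapping a patch to True (tattoo) or
    False (background), e.g. a thresholded CNN score.
    """
    mask = np.zeros(image.shape[:2], dtype=bool)
    for r, c, patch in extract_patches(image, patch_size, stride):
        if classify(patch):
            mask[r:r + patch_size, c:c + patch_size] = True
    return mask
```

With a stride smaller than the patch size, overlapping positive windows merge into contiguous masked regions, matching the region-level output described above.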
Convolutional neural networks typically consist of several convolutional layers, followed by one or more fully connected layers. Convolutional layers are in charge of learning good features and they are characterized by (i) local receptive fields (i.e. a neuron in the convolutional layer is not connected to the outputs of all the neurons from the previous layer, but only to the ones in its local neighborhood), and (ii) shared weights, reflecting the intuition that the features are computed in the same way at different image locations. After the convolutional layers, so-called pooling layers are typically inserted in order to reduce the dimensionality of the feature space for subsequent steps. Fully connected layers perform the task of classification and contain the majority of the learned weights.
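The two convolutional building blocks can be illustrated with a minimal NumPy sketch (our own illustration, not part of the paper): a single 3×3 kernel whose nine weights are shared across all positions, each output neuron seeing only a 3×3 local receptive field, followed by 2×2 max pooling that halves each spatial dimension.

```python
import numpy as np

def conv3x3(x, kernel):
    """Unpadded ('valid') 3x3 convolution with one shared kernel.

    Every output neuron sees only a 3x3 local receptive field, and all
    positions reuse the same 9 weights (weight sharing).
    """
    h, w = x.shape
    out = np.empty((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            out[r, c] = np.sum(x[r:r + 3, c:c + 3] * kernel)
    return out

def maxpool2x2(x):
    """2x2 max pooling: keeps the maximum of each 2x2 block."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

A real convolutional layer applies a bank of such kernels (one per feature map) plus a bias and nonlinearity, but the locality and weight sharing are exactly as above.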
The architecture of our network is broadly inspired by the successful VGGNet model, proposed in 2014 by Simonyan and Zisserman [24]. VGGNet is characterized by a very homogeneous architecture that performs only 3×3 convolutions and 2×2 pooling from beginning to end. However, our model modifies it to accommodate smaller input images and a smaller number of output classes. The simplified network, with fewer and smaller layers, is faster to train and it proved adequate for our purposes. The proposed network architecture is shown in Fig. 1.
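As a rough shape trace of the convolutional/pooling stack detailed in the following paragraph (a sketch under assumptions of our own: an illustrative input size of N = 32 and unpadded "valid" convolutions, neither of which is fixed by the text here), the feature-map dimensions evolve as follows:

```python
def trace_shapes(n, layers):
    """Trace (height, width, channels) through a stack of layers.

    Each layer is ('conv', k, channels) for a valid k x k convolution
    or ('pool', k) for k x k max pooling.
    """
    h, w, c = n, n, 3  # RGB input
    shapes = [(h, w, c)]
    for layer in layers:
        if layer[0] == 'conv':
            _, k, out_c = layer
            h, w, c = h - k + 1, w - k + 1, out_c
        elif layer[0] == 'pool':
            _, k = layer
            h, w = h // k, w // k
        shapes.append((h, w, c))
    return shapes

# The six convolutional/pooling layers described in the text:
# two 3x3 conv layers with 32 maps, 2x2 pooling, two 3x3 conv
# layers with 64 maps, 2x2 pooling.
stack = [('conv', 3, 32), ('conv', 3, 32), ('pool', 2),
         ('conv', 3, 64), ('conv', 3, 64), ('pool', 2)]
```

Under these assumptions a 32×32 input shrinks to a 5×5×64 feature volume, i.e. 1600 features feeding the 256-neuron fully connected layer; with zero padding the spatial size would shrink only at the pooling layers.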
The input to the network is an N×N color image (we assume the RGB color model). The image is to be classified as either belonging to a tattoo or not, depending on whether its center lies inside the polygon that demarcates the tattoo.
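The training label of a patch can thus be derived from the ground-truth polygon with a standard point-in-polygon test. A minimal sketch (ray casting; the polygon representation and helper names are our own illustration, not the paper's):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: does (x, y) lie inside the polygon?

    `polygon` is a list of (x, y) vertices; edges wrap around.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray cast to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def patch_label(r, c, patch_size, tattoo_polygon):
    """Label a patch positive iff its center falls inside the polygon."""
    cy = r + patch_size / 2.0
    cx = c + patch_size / 2.0
    return point_in_polygon(cx, cy, tattoo_polygon)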
The network consists of eight layers (not counting the input layer, i.e. the image itself). The first two
layersareconvolutional layerswith32featuremapswith3×3filtersandReLUactivationunits. The
third layer is a max-pooling layer that reduces the feature map dimensions by2×2. The fourth and
the fifth layers are again convolutional layers with ReLU activation units, but with 64 feature maps
(againwith3×3filters). Thesixth layer is anothermax-pooling layer, oncemore reducing the input
dimensionby2×2. Theseventh layer isa fullyconnected layerconsistingof256neurons. Thefinal,
37
Proceedings
OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics“
- Title
- Proceedings
- Subtitle
- OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics“
- Authors
- Peter M. Roth
- Kurt Niel
- Publisher
- Verlag der Technischen Universität Graz
- Location
- Wels
- Date
- 2017
- Language
- English
- License
- CC BY 4.0
- ISBN
- 978-3-85125-527-0
- Size
- 21.0 x 29.7 cm
- Pages
- 248
- Keywords
- Tagungsband
- Categories
- International
- Tagungsbände