Fig. 4. One of the logo pairs in the USTTAB majority ground truth not being part of the strict set.
Fig. 5. One of the logo pairs in the USTTAB minority ground truth not being part of the strict set or the majority set.
C. Data Format
The dataset is defined in multiple text files. The first file, data_full.txt, contains the registration numbers of all trademarks used as diversifiers as well as of all trademarks from the ground truth. Each line contains one number. The files data_10.txt and data_1.txt contain a 10% and a 1% random sample in the same format for tests on smaller data sets, while still providing comparability. The ground truth is available in the folder groundtruth. This folder contains the files gt_strict.txt, gt_majority.txt and gt_minority.txt, which hold a comma-separated list of trademark registration numbers identifying the visually similar logo pairs.
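Purely as an illustration of this format (not part of the dataset distribution), a minimal Python sketch for reading the file lists and the ground-truth pairs could look as follows; the underscored file names follow the description above, while the function names and the relative paths are our own assumptions.

# Minimal sketch for reading the USTTAB file lists and ground truth.
# File names follow the description above; function names and paths are assumptions.

def load_registration_numbers(path):
    # One trademark registration number per line (data_full.txt, data_10.txt, data_1.txt).
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def load_ground_truth(path):
    # Each line holds a comma-separated pair of registration numbers of visually similar logos.
    pairs = []
    with open(path) as f:
        for line in f:
            numbers = [n.strip() for n in line.split(",") if n.strip()]
            if len(numbers) >= 2:
                pairs.append(tuple(numbers[:2]))
    return pairs

all_ids = load_registration_numbers("data_full.txt")
strict_pairs = load_ground_truth("groundtruth/gt_strict.txt")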
III. RETRIEVAL BASELINE
To provide a baseline for comparison, several state-of-the-art algorithms were tested on the new dataset. The tests were executed with benchmark software based on LIRE [19], which was presented in [16]. Note at this point that all descriptors used in the test, as well as the benchmarking suite, have been contributed to the LIRE open source project (http://www.lire-project.net/, last visited 2016-01-19).
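To make the evaluation protocol explicit, the sketch below outlines such a retrieval test in a library-independent way; it is not LIRE code. It assumes that features is a dictionary mapping registration numbers to already extracted descriptors and that distance is a placeholder for a descriptor-specific distance function, both names being our own.

# Schematic retrieval test: rank the whole collection against a query logo and
# record the position at which its ground-truth counterpart appears.
# 'features' maps registration numbers to descriptors; 'distance' is a placeholder metric.

def rank_of_pair(query_id, target_id, features, distance):
    query = features[query_id]
    ranked = sorted((i for i in features if i != query_id),
                    key=lambda i: distance(query, features[i]))
    return ranked.index(target_id) + 1  # 1-based rank of the counterpart

def run_benchmark(pairs, features, distance):
    # One query per ground-truth pair; the counterpart is the single relevant item.
    return {(a, b): rank_of_pair(a, b, features, distance) for a, b in pairs}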
A. Tested Features
The following features were chosen to be tested on the new dataset because they not only cover a wide diversity of feature types like color, shape, texture and combinations of them, but also because some of them were proposed as well suited for the trademark retrieval domain [2]. Local Binary Patterns (LBP) [11] represent the local texture of an image by thresholding each pixel's neighborhood against the center pixel and encoding the result as a binary number. A rotation invariant version can be achieved by restricting the observed patterns to the so-called uniform patterns. For Binary Patterns Pyramid (BPP), a spatial pyramid was applied on the LBP. The Shapeme
Histogram Descriptor (Shapeme) captures the global shape
of an image by extracting the shape context and clustering
with K-nearest neighbors. In this experiment, the shape contexts were calculated for 256 points chosen by Jitendra's algorithm with threefold oversampling and 512 bins for the descriptor [3]. Centrist is a feature similar to LBP and also captures local texture. The Joint Composite Descriptor (JCD)
[29] combines the two fuzzy histogram features Color and
Edge Directivity Descriptor [8] and Fuzzy Color and Texture
Histogram [6]. Adaptive Contours and Color Integration
Descriptor (ACCID) [12] captures visually salient shapes
and combines them with a fuzzy color histogram. Pyramid
Histogram of Oriented Gradients (PHOG) [4] extracts information about the local shape and the layout of the shape with a Spatial Pyramid Kernel. In this experiment, 15
orientation bins were used as that has been found effective
in the context of trademark retrieval (PHOG15, cp. [16]).
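As a minimal sketch only (not the LIRE implementations used in the tests), the basic 8-neighbour LBP described above and the uniform-pattern test can be written as follows; the NumPy dependency and the function names are assumptions of ours.

import numpy as np

def lbp_histogram(gray):
    # Basic 3x3 LBP: threshold the 8 neighbours of each pixel against the centre
    # pixel and accumulate the resulting 8-bit codes in a 256-bin histogram.
    height, width = gray.shape
    # Offsets of the 8 neighbours, enumerated clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = np.zeros(256, dtype=np.int64)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            centre = gray[y, x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy, x + dx] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    return hist

def is_uniform(code, bits=8):
    # A pattern is 'uniform' if its circular bit string contains at most two 0/1
    # transitions; these patterns are the subset used by the rotation invariant
    # LBP variant mentioned above.
    transitions = sum(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1)
                      for i in range(bits))
    return transitions <= 2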
For the evaluation, the logos were resized to a maximum width and height of 512 pixels, retaining the aspect ratio. In an additional preprocessing step, a despeckle filter was applied and the white pixels were trimmed. Table III-A shows the
result of the outlined features on the full USTTAB dataset
utilizing the strict ground truth. As can be seen easily
from Table III-A, PHOG15 outperforms the other descriptors
regarding recall and mean average precision. In terms of
average and normalized rank, the Shapeme feature performs
better than PHOG.
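For reference, the measures reported here can be computed from ranked result lists as in the generic sketch below; it is not the code of the benchmarking suite and assumes one ranked list and a set of relevant items per query.

def average_precision(ranked_ids, relevant_ids):
    # Average precision of one ranked result list.
    relevant = set(relevant_ids)
    hits, ap = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            ap += hits / rank
    return ap / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    # queries: list of (ranked_ids, relevant_ids) tuples, one per query.
    queries = list(queries)
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

def normalized_rank(ranked_ids, relevant_ids, collection_size):
    # Normalized average rank: 0 is a perfect ranking; the value grows towards 1
    # as the relevant items move to the end of the list.
    relevant = set(relevant_ids)
    ranks = [r for r, doc_id in enumerate(ranked_ids, start=1) if doc_id in relevant]
    n_rel = len(relevant)
    return (sum(ranks) - n_rel * (n_rel + 1) / 2) / (collection_size * n_rel)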
Fig. 6 shows the comparison of the mean average precision (MAP) for PHOG15, Shapeme, ACCID, JCD, BPP, and Centrist on the three different ground truths. For Shapeme and PHOG15, the MAP correlates with the agreement of the experts: the less agreement in the ground truth, the lower the MAP.
IV. CONCLUSION AND CHALLENGES
The data set as presented provides a hard challenge to researchers in visual information retrieval. While the data from the USTTAB trials provides pairs of trademarks with confusing similarity, for both trademarks of a pair it is very likely that numerous other visually similar logos can be found which were not part of a trial. Moreover, companies often file trademarks in different versions, re-register them or have multiple data records in the USTTAB registration data base. Fig. 7 shows
an example result list from searching for a visual trademark
from the ground truth. At position 0 the query is shown, and only at position 49 of the list is the offending trademark found. However, it can easily be seen that the logos in between are visually similar to the trial's logo pair.
While this is definitely a problem for a common use case
like digital photo retrieval, in the visual trademark domain
the experts doing inquiries certainly go beyond the first few results, and finding the offending logo in the first 100 or even 500 results helps them with their work. Note also at this point that the data set is especially about confusing similarity, not near-duplicate search, as the latter has already been the subject of a lot of research. Therefore, for future work we