Fig. 4. One of the logo pairs in the USTTAB majority ground truth not being part of the strict set.
Fig. 5. One of the logo pairs in the USTTAB minority ground truth not being part of the strict set or the majority set.
C. Data Format
The dataset is defined in multiple text files. The first file, data_full.txt, contains the registration numbers of all trademarks used as diversifiers as well as of all trademarks from the ground truth. Each line contains one number. The files data_10.txt and data_1.txt contain a 10% and a 1% random sample in the same format for tests on smaller data sets, while still providing comparability. The ground truth is available in the folder groundtruth. This folder contains the files gt_strict.txt, gt_majority.txt and gt_minority.txt, which hold a comma-separated list of trademark registration numbers identifying the visually similar logo pairs.
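Purely as an illustration of this format (not part of the dataset distribution), a minimal Python sketch for reading the file lists and the ground-truth pairs could look as follows; the underscored file names follow the description above, while the function names and the relative paths are our own assumptions.

# Minimal sketch for reading the USTTAB file lists and ground truth.
# File names follow the description above; function names and paths are assumptions.

def load_registration_numbers(path):
    # One trademark registration number per line (data_full.txt, data_10.txt, data_1.txt).
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def load_ground_truth(path):
    # Each line holds a comma-separated pair of registration numbers of visually similar logos.
    pairs = []
    with open(path) as f:
        for line in f:
            numbers = [n.strip() for n in line.split(",") if n.strip()]
            if len(numbers) >= 2:
                pairs.append(tuple(numbers[:2]))
    return pairs

all_ids = load_registration_numbers("data_full.txt")
strict_pairs = load_ground_truth("groundtruth/gt_strict.txt")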
III. RETRIEVAL BASELINE
To provide a baseline for comparison, several state-of-the-art algorithms were tested on the new dataset. The tests were executed with benchmark software based on LIRE [19], which was presented in [16]. Note at this point that all descriptors used in the test, as well as the benchmarking suite, have been contributed to the LIRE open source project (http://www.lire-project.net/, last visited 2016-01-19).
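To make the evaluation protocol explicit, the sketch below outlines such a retrieval test in a library-independent way; it is not LIRE code. It assumes that features is a dictionary mapping registration numbers to already extracted descriptors and that distance is a placeholder for a descriptor-specific distance function, both names being our own.

# Schematic retrieval test: rank the whole collection against a query logo and
# record the position at which its ground-truth counterpart appears.
# 'features' maps registration numbers to descriptors; 'distance' is a placeholder metric.

def rank_of_pair(query_id, target_id, features, distance):
    query = features[query_id]
    ranked = sorted((i for i in features if i != query_id),
                    key=lambda i: distance(query, features[i]))
    return ranked.index(target_id) + 1  # 1-based rank of the counterpart

def run_benchmark(pairs, features, distance):
    # One query per ground-truth pair; the counterpart is the single relevant item.
    return {(a, b): rank_of_pair(a, b, features, distance) for a, b in pairs}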
A. Tested Features
The following features were chosen to be tested on the new dataset because they not only cover a wide diversity of feature types like color, shape, texture and combinations of them, but also because some of them were proposed as well suited for the trademark retrieval domain [2]. Local Binary Patterns (LBP) [11] represent the local texture of an image by thresholding each pixel's neighborhood against the center pixel and encoding the result as a binary number. A rotation invariant version can be achieved by restricting the observed patterns to the so-called uniform patterns. For Binary Patterns Pyramid (BPP), a spatial pyramid was applied on the LBP. The Shapeme
Histogram Descriptor (Shapeme) captures the global shape
of an image by extracting the shape context and clustering
with K-nearest neighbors. In this experiment, the shape contexts were calculated for 256 points chosen by Jitendra's algorithm with threefold oversampling and 512 bins for the descriptor [3]. Centrist is a feature similar to LBP and also captures local texture. The Joint Composite Descriptor (JCD)
[29] combines the two fuzzy histogram features Color and
Edge Directivity Descriptor [8] and Fuzzy Color and Texture
Histogram [6]. Adaptive Contours and Color Integration
Descriptor (ACCID) [12] captures visually salient shapes
and combines them with a fuzzy color histogram. Pyramid
Histogram of Oriented Gradients (PHOG) [4] extracts information about the local shape and the layout of the shape with a Spatial Pyramid Kernel. In this experiment, 15
orientation bins were used as that has been found effective
in the context of trademark retrieval (PHOG15, cp. [16]).
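As a minimal sketch only (not the LIRE implementations used in the tests), the basic 8-neighbour LBP described above and the uniform-pattern test can be written as follows; the NumPy dependency and the function names are assumptions of ours.

import numpy as np

def lbp_histogram(gray):
    # Basic 3x3 LBP: threshold the 8 neighbours of each pixel against the centre
    # pixel and accumulate the resulting 8-bit codes in a 256-bin histogram.
    height, width = gray.shape
    # Offsets of the 8 neighbours, enumerated clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = np.zeros(256, dtype=np.int64)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            centre = gray[y, x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy, x + dx] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    return hist

def is_uniform(code, bits=8):
    # A pattern is 'uniform' if its circular bit string contains at most two 0/1
    # transitions; these patterns are the subset used by the rotation invariant
    # LBP variant mentioned above.
    transitions = sum(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1)
                      for i in range(bits))
    return transitions <= 2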
For the evaluation, the logos were resized to a maximum width and height of 512 pixels, retaining the aspect ratio. In an additional preprocessing step, a despeckle filter was applied and the white pixels were trimmed. Table III-A shows the
result of the outlined features on the full USTTAB dataset
utilizing the strict ground truth. As can be seen easily
from Table III-A, PHOG15 outperforms the other descriptors
regarding recall and mean average precision. In terms of
average and normalized rank, the Shapeme feature performs
better than PHOG.
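For reference, the measures reported here can be computed from ranked result lists as in the generic sketch below; it is not the code of the benchmarking suite and assumes one ranked list and a set of relevant items per query.

def average_precision(ranked_ids, relevant_ids):
    # Average precision of one ranked result list.
    relevant = set(relevant_ids)
    hits, ap = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            ap += hits / rank
    return ap / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    # queries: list of (ranked_ids, relevant_ids) tuples, one per query.
    queries = list(queries)
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

def normalized_rank(ranked_ids, relevant_ids, collection_size):
    # Normalized average rank: 0 is a perfect ranking; the value grows towards 1
    # as the relevant items move to the end of the list.
    relevant = set(relevant_ids)
    ranks = [r for r, doc_id in enumerate(ranked_ids, start=1) if doc_id in relevant]
    n_rel = len(relevant)
    return (sum(ranks) - n_rel * (n_rel + 1) / 2) / (collection_size * n_rel)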
Fig. 6 shows the comparison of the mean average precision (MAP) for PHOG15, Shapeme, ACCID, JCD, BPP, and Centrist on the three different ground truths. For Shapeme and PHOG15, the MAP correlates with the agreement of the experts: the less agreement in the ground truth, the lower the MAP.
IV. CONCLUSION AND CHALLENGES
The data set as presented provides a hard challenge to researchers in visual information retrieval. While the data from the USTTAB trials provides pairs of trademarks with confusing similarity, for both trademarks of a pair it is very likely that numerous other visually similar logos can be found which were not part of a trial. Moreover, companies often file trademarks in different versions, re-register them or have multiple data records in the USTTAB registration data base. Fig. 7 shows
an example result list from searching for a visual trademark
from the ground truth. At position 0 the query is shown, and only at position 49 of the list is the offending trademark found. However, it can easily be seen that the logos in between are visually similar to the trial's logo pair.
While this is definitely a problem for a common use case
like digital photo retrieval, in the visual trademark domain
the experts doing inquiries certainly go beyond the first few results, and finding the offending logo in the first 100 or even 500 results helps them with their work. Note also at this point that the data set is especially about confusing similarity, not near-duplicate search, as the latter has already been the subject of a lot of research. Therefore, for future work we