Page 113 in Proceedings - OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
(a) Details about performance if algorithmic optimization and vectorization are applied to the discrepancy norm. Speedups (one II / direct diff): scalar 2.09 / 1.97, SSE 7.99 / 11.68, AVX 9.64 / 16.01.
(b) Details about performance if vectorization (serial, SSE, AVX) is applied to integral image computation, with the OpenCV algorithm serving as the reference; image size l × l pixels, l = 32, 48, 64, 96, 128, 160, 192, 224, 256, 512.
Figure 3. Results of performance evaluation tests.
4. Performance Analysis and Evaluation
The code is written in C++ and compiled with Microsoft Visual Studio 2013. The only adjustment is the setting Enable Enhanced Instruction Set in the Code Generation group. The selected target architecture is 64-bit. The test system is based on an Intel i5-4460 and runs Windows 7 Professional Service Pack 1 64-bit. The test algorithm applies the discrepancy norm in a sliding-window approach, so the implementation is executed many times. Furthermore, the whole test setup is run several times to eliminate random influences. As we measure similarity with respect to a pattern, reference subtraction has to be applied for each window. We include this step in the time measurement, as it is vital for this task and cannot be omitted.
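The test procedure described above can be sketched as follows. This is a minimal, unoptimized C++ sketch under stated assumptions, not the authors' implementation: the names `Image`, `Match`, `anchoredDiscrepancy` and `matchPattern` are hypothetical, and since this section does not restate the 2D discrepancy norm, the sketch assumes the variant that takes the largest absolute sum over rectangles anchored at the top-left corner of the difference patch, which a single integral image yields directly. Reference subtraction is performed explicitly for every window, mirroring the step that is included in the time measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical row-major image of 32-bit integer intensities.
struct Image {
    int w, h;
    std::vector<int32_t> px;
    int32_t at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

struct Match { int x, y; int64_t score; };

// ASSUMED 2D discrepancy variant: the largest absolute sum over rectangles
// anchored at the patch's top-left corner, read off one integral image of d.
int64_t anchoredDiscrepancy(const std::vector<int32_t>& d, int pw, int ph) {
    std::vector<int64_t> ii(d.size());
    int64_t best = 0;
    for (int y = 0; y < ph; ++y) {
        int64_t rowSum = 0;  // running sum of row y up to column x
        for (int x = 0; x < pw; ++x) {
            std::size_t i = static_cast<std::size_t>(y) * pw + x;
            rowSum += d[i];
            ii[i] = rowSum + (y > 0 ? ii[i - pw] : 0);
            best = std::max(best, ii[i] < 0 ? -ii[i] : ii[i]);
        }
    }
    return best;
}

// Slide the pattern over the image; for each window, subtract the reference
// pattern explicitly, evaluate the measure, and keep the smallest score.
Match matchPattern(const Image& img, const Image& pat) {
    assert(pat.w <= img.w && pat.h <= img.h);
    Match best{-1, -1, std::numeric_limits<int64_t>::max()};
    std::vector<int32_t> diff(pat.px.size());
    for (int oy = 0; oy + pat.h <= img.h; ++oy)
        for (int ox = 0; ox + pat.w <= img.w; ++ox) {
            for (int y = 0; y < pat.h; ++y)
                for (int x = 0; x < pat.w; ++x)
                    diff[static_cast<std::size_t>(y) * pat.w + x] =
                        img.at(ox + x, oy + y) - pat.at(x, y);
            int64_t s = anchoredDiscrepancy(diff, pat.w, pat.h);
            if (s < best.score) best = {ox, oy, s};
        }
    return best;
}
```

The window with the smallest score is reported; an exact copy of the pattern scores 0, because the measure vanishes only for an all-zero difference patch.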
Figure 3a summarizes the speedups obtained with this test setup. The direct difference approach outperforms the other implementations by far, with AVX vectorization leading to a speedup of 16 and SSE vectorization to a speedup of 12. The algorithmically optimized serial version alone already doubles performance. The average execution time of the AVX version is 0.612 seconds for matching a 64×64 data patch within a 512×512 image. AVX can process eight 32-bit integer values at the same time, which is exactly the speedup gained by vectorization compared with the serial version. SSE, on the other hand, produces a superlinear speedup that exceeds its theoretical maximum of four 32-bit values per instruction. The data indicate that reference subtraction has a big impact on the runtime: embedding the difference computation in the algorithm itself improved performance by about 50% for SSE and 65% for AVX.
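One way to read "embedding the difference computation in the algorithm itself" is to fuse the subtraction into the accumulation loop instead of first writing a difference patch. The following scalar C++ sketch contrasts the two orders of work under that assumption. The names `anchoredMeasure`, `measureTwoPass` and `measureFused` are hypothetical, the measure is an assumed anchored-rectangle variant standing in for the paper's discrepancy norm, and no SIMD code is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Anchored 2D discrepancy-style measure (ASSUMED variant, see lead-in).
// diffAt(x, y) yields the difference value; colSum[x] keeps the running
// column sum, so `rect` is the sum over the rectangle [0..x] x [0..y].
template <typename DiffAt>
int64_t anchoredMeasure(int pw, int ph, DiffAt diffAt) {
    assert(pw > 0 && ph > 0);
    std::vector<int64_t> colSum(static_cast<std::size_t>(pw));  // zero-initialized
    int64_t best = 0;
    for (int y = 0; y < ph; ++y) {
        int64_t rect = 0;
        for (int x = 0; x < pw; ++x) {
            colSum[x] += diffAt(x, y);
            rect += colSum[x];
            best = std::max(best, rect < 0 ? -rect : rect);
        }
    }
    return best;
}

// Two passes: write window - pattern into a buffer, then measure the buffer.
int64_t measureTwoPass(const int32_t* win, std::size_t stride,
                       const int32_t* pat, int pw, int ph) {
    std::vector<int32_t> diff(static_cast<std::size_t>(pw) * ph);
    for (int y = 0; y < ph; ++y)
        for (int x = 0; x < pw; ++x)
            diff[static_cast<std::size_t>(y) * pw + x] =
                win[y * stride + x] - pat[static_cast<std::size_t>(y) * pw + x];
    return anchoredMeasure(pw, ph, [&](int x, int y) {
        return int64_t{diff[static_cast<std::size_t>(y) * pw + x]};
    });
}

// Fused ("direct difference"): subtract inside the accumulation loop, so no
// intermediate buffer is written and the data is traversed only once.
int64_t measureFused(const int32_t* win, std::size_t stride,
                     const int32_t* pat, int pw, int ph) {
    return anchoredMeasure(pw, ph, [&](int x, int y) {
        return int64_t{win[y * stride + x]} - pat[static_cast<std::size_t>(y) * pw + x];
    });
}
```

Both functions return identical values; the fused one only removes the intermediate buffer and the extra pass over memory, which is the kind of saving the measured 50% and 65% improvements would have to come from.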
Another comparison concentrates on the vectorization of the integral image algorithm alone. Many common algorithms such as SURF are based on this intermediate representation [22]. The OpenCV implementation of the integral image is compared to the authors' vectorized implementation and to a straightforward serial version. Figure 3b shows the results. The OpenCV algorithm serves as the reference and is twice as fast as the simple serial implementation. This suggests that OpenCV uses the approach from [13], which requires extra storage but reduces the number of necessary additions. Nonetheless, the vectorized version outperforms OpenCV at every image size, gaining a speedup of 2.5 to 4 depending on the image size.
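Where the data parallelism in the integral image lies can be sketched in C++ as below. This is one common scheme, not necessarily the authors' or OpenCV's: a sequential running sum along each row, followed by adding the previous output row, which is independent per column and maps onto 32-bit SIMD lanes (SSE2 `_mm_add_epi32` here; NEON's `vaddq_s32` would play the same role on ARM). The function names and the `HAVE_SSE2` macro are hypothetical; without SSE2 the scalar loop handles the whole row. The straightforward version uses the four-term recurrence and serves as a correctness reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Straightforward serial version:
// ii(x,y) = I(x,y) + ii(x-1,y) + ii(x,y-1) - ii(x-1,y-1).
std::vector<int32_t> integralNaive(const std::vector<int32_t>& img, int w, int h) {
    assert(img.size() == static_cast<std::size_t>(w) * h);
    std::vector<int32_t> ii(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            std::size_t i = static_cast<std::size_t>(y) * w + x;
            int32_t left = x > 0 ? ii[i - 1] : 0;
            int32_t up = y > 0 ? ii[i - w] : 0;
            int32_t diag = (x > 0 && y > 0) ? ii[i - w - 1] : 0;
            ii[i] = img[i] + left + up - diag;
        }
    return ii;
}

// Row-prefix variant: a sequential running sum along the row, then adding
// the previous output row, four 32-bit lanes per SSE2 instruction.
std::vector<int32_t> integralRowSimd(const std::vector<int32_t>& img, int w, int h) {
    assert(img.size() == static_cast<std::size_t>(w) * h);
    std::vector<int32_t> ii(img.size());
    for (int y = 0; y < h; ++y) {
        const int32_t* in = img.data() + static_cast<std::size_t>(y) * w;
        int32_t* out = ii.data() + static_cast<std::size_t>(y) * w;
        int32_t run = 0;
        for (int x = 0; x < w; ++x) { run += in[x]; out[x] = run; }
        if (y == 0) continue;
        const int32_t* prev = out - w;  // previous row of the result
        int x = 0;
#ifdef HAVE_SSE2
        for (; x + 4 <= w; x += 4) {
            __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(out + x));
            __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(prev + x));
            _mm_storeu_si128(reinterpret_cast<__m128i*>(out + x), _mm_add_epi32(a, b));
        }
#endif
        for (; x < w; ++x) out[x] += prev[x];  // scalar tail (or whole row)
    }
    return ii;
}
```

The row-prefix variant needs two additions per pixel instead of three additions and a subtraction, and only its vertical step is vectorized; the sketch makes no claim about the speedups in Figure 3b.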
For embedded applications, it is interesting whether the vectorization scheme is applicable to other architectures, too. In embedded computing, the ARM architecture plays a crucial role. Here, the ARM Cortex-A series offers SIMD capabilities with NEON technology, which provides a data width equal to that of SSE [23]. All in all, the whole vectorization can be coded with NEON instructions. Unfortunately,
- Title: Proceedings
- Subtitle: OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
- Authors: Peter M. Roth, Kurt Niel
- Publisher: Verlag der Technischen Universität Graz
- Place: Wels
- Date: 2017
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-527-0
- Dimensions: 21.0 x 29.7 cm
- Pages: 248
- Keywords: Conference proceedings
- Categories: International, Conference proceedings