Page 113 in Proceedings - OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"

Figure 3. Results of performance evaluation tests. (a) Details about performance if algorithmic optimization and vectorization are applied to the discrepancy norm (bar chart; series: scalar, SSE, AVX; bar values: 2.09, 1.97, 7.99, 11.68, 9.64, 16.01). (b) Details about performance if vectorization is applied to integral image computation, with the OpenCV algorithm serving as reference (x-axis: image size in pixels (l × l), from 32 to 512; series: serial, SSE, AVX).

4. Performance Analysis and Evaluation

Coding is done in C++, with Microsoft Visual Studio 2013 serving as the compiler. The only adjustment is the setting EnableEnhancedInstructionSet in the Code Generation group. The selected target architecture is 64-bit. The test system is based on an Intel i5-4460. The computer runs Windows 7 Professional Service Pack 1 64-bit.

The test algorithm applies the discrepancy norm in a sliding window approach, so that the implementation is executed many times. Furthermore, the whole test setup is run several times to eliminate random influences. As we measure similarity compared to a pattern, reference subtraction has to be applied for each window. We include this step in the time measurement as it is vital for this task and cannot be omitted.

Figure 3a summarizes the speedup with the test setup. The direct difference approach outperforms the other implementations by far, with the AVX vectorization leading to a speedup of 16 and the SSE vectorization to a speedup of 12. The algorithmically optimized serial version already doubled performance. The average execution time of the AVX version is 0.612 seconds when matching a 64×64 data patch within a 512×512 image. AVX can process eight 32-bit integer values at the same time, which is exactly the speedup gained by the vectorization compared with the serial version. SSE, on the other hand, processes four 32-bit values at a time and produces superlinear speedup exceeding this theoretical maximum. The data indicates that reference subtraction has a big impact on the runtime.
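To make the role of reference subtraction concrete, here is a minimal scalar sketch; it is not the authors' implementation. It assumes a 1D signal for brevity (the paper matches image patches) and uses the prefix-sum form of the discrepancy norm, i.e. the maximum minus the minimum of all partial sums, with the empty sum included. The function names `discrepancy_norm`, `match_two_pass`, and `match_direct_diff` are hypothetical, and SIMD intrinsics are intentionally left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Discrepancy norm of a 1D sequence: the largest absolute sum over any
// contiguous run, computed as (max prefix sum) - (min prefix sum), where
// the empty prefix (sum 0) takes part in both extrema.
std::int64_t discrepancy_norm(const std::vector<std::int32_t>& d) {
    std::int64_t prefix = 0, hi = 0, lo = 0;
    for (std::int32_t v : d) {
        prefix += v;
        hi = std::max(hi, prefix);
        lo = std::min(lo, prefix);
    }
    return hi - lo;
}

// Variant 1 (assumed baseline): subtract the reference pattern into a
// temporary buffer, then compute the norm in a second pass.
std::int64_t match_two_pass(const std::vector<std::int32_t>& signal,
                            std::size_t offset,
                            const std::vector<std::int32_t>& pattern) {
    std::vector<std::int32_t> diff(pattern.size());
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        diff[i] = signal[offset + i] - pattern[i];
    }
    return discrepancy_norm(diff);
}

// Variant 2 ("direct difference", as we read the text): the subtraction is
// fused into the accumulation loop, so no temporary buffer is written.
std::int64_t match_direct_diff(const std::vector<std::int32_t>& signal,
                               std::size_t offset,
                               const std::vector<std::int32_t>& pattern) {
    std::int64_t prefix = 0, hi = 0, lo = 0;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        prefix += signal[offset + i] - pattern[i];
        hi = std::max(hi, prefix);
        lo = std::min(lo, prefix);
    }
    return hi - lo;
}
```

Both variants return the same value for every window offset; the fused one simply avoids one pass over memory per window, which is the kind of saving the measurements attribute to reference subtraction. Callers must ensure that `offset + pattern.size()` does not exceed `signal.size()`.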
Embedding the difference building in the algorithm itself improved the performance by about 50% for SSE and 65% for AVX.

Another comparison concentrates on the vectorization of the integral image algorithm alone. Many common algorithms like SURF are based on this intermediate representation [22]. The OpenCV implementation of the integral image is compared to the authors' vectorized implementation and a straightforward serial version. Figure 3b shows the results. The OpenCV algorithm serves as the reference and is twice as fast as a simple serial implementation. This suggests that OpenCV uses the approach from [13], which requires extra storage but reduces the necessary additions. Nonetheless, the vectorized version outperforms OpenCV at every image size, gaining a speedup of 2.5 to 4, depending on the image size.

For embedded applications it is interesting whether the vectorization scheme is applicable to other architectures, too. In embedded computing the ARM architecture plays a crucial role. Here, the ARM Cortex-A series offers SIMD capabilities with NEON technology, offering a data width equal to SSE [23]. All in all, the whole vectorization can be coded with the NEON instructions. Unfortunately,


Proceedings OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"

Title: Proceedings
Subtitle: OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
Authors: Peter M. Roth; Kurt Niel
Publisher: Verlag der Technischen Universität Graz
Place: Wels
Date: 2017
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-527-0
Dimensions: 21.0 x 29.7 cm
Pages: 248
Keywords: Conference proceedings
Categories: International; Conference proceedings
