Page 110 in Proceedings, OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
      Passes   Additions    Comparisons
Ref   4        12n·m        8n·m
Opt   2        5n·m + 2m    6n·m + 4m

Table 1. Comparing the number of operations for the reference and the optimized version.
factor is storage: both versions require additional storage of n·m. But the base version writes
four times to this area, whereas the optimized version writes only once. If this storage area
is accessible to the user, calculating the discrepancy norm in 2D always yields the corresponding
integral image for free.
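To make the byproduct concrete, the following is a minimal sketch of building the n × m integral image (summed-area table) mentioned above. Function name and memory layout are illustrative assumptions, not taken from the paper:

```c
#include <stdlib.h>

/* Build the integral image ii of an n x m image img (row-major):
 * ii[y][x] holds the sum of all pixels in the rectangle from (0,0)
 * to (x,y) inclusive. This is the table that the optimized 2D
 * discrepancy-norm computation produces as a side effect. */
static void integral_image(const double *img, double *ii, int n, int m)
{
    for (int y = 0; y < n; ++y) {
        double row_sum = 0.0;            /* running sum of current row */
        for (int x = 0; x < m; ++x) {
            row_sum += img[y * m + x];
            /* add the total of all rows above via the previous row of ii */
            ii[y * m + x] = row_sum + (y > 0 ? ii[(y - 1) * m + x] : 0.0);
        }
    }
}
```

The single pass over the image matches the two-pass operation count of the optimized version in Table 1, since each output cell needs one row accumulation and one addition from the row above.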
3. Parallelization
The proposed algorithm seems to be well-suited for parallelization. Computing and comparing
the other components of the discrepancy norm is highly independent. When it comes to
parallelization, modern computers offer various options. A common classification in this area
comes from [15]; it is based on the number of parallel instruction and data streams. A traditional
processor belongs to SISD, whereas multi-core or multiprocessor systems are MIMD. Instruction set
extensions like SSE and AVX, also referred to as vector units, belong to SIMD.
A similarity measure like the discrepancy norm will normally be applied many times. Pattern
matching requires evaluating the discrepancy norm at many different positions of a patch.
Therefore, SIMD is a promising approach: it is especially suitable for applying the same kind of
operation to several data values at once. Furthermore, SIMD merely means choosing certain special
instructions; at runtime, these have no overhead compared to normal SISD instructions. On the
other hand, making use of multiprocessing would lead to an overhead, because it involves spawning
threads, distributing data and synchronizing at the end. As shown by [16], using multi-core
processors is complex. On the one hand, that work succeeded in using multiple cores to improve
performance. On the other hand, the processor topology has an impact: the authors had to bind the
threads to cores sharing the same L2 cache in order to improve performance. Not fulfilling this
requirement results in a significant performance penalty.
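The source of the multiprocessing overhead can be illustrated with a minimal fork-join sketch using POSIX threads (an illustrative assumption; the paper does not prescribe a threading API). Every parallel evaluation pays for spawning threads, distributing data slices, and a final synchronization:

```c
#include <pthread.h>

#define NTHREADS 4

/* One slice of the input, handed to a worker thread. */
struct slice { const int *data; int len; long sum; };

static void *partial_sum(void *arg)
{
    struct slice *s = (struct slice *)arg;
    s->sum = 0;
    for (int i = 0; i < s->len; ++i)
        s->sum += s->data[i];
    return NULL;
}

/* Fork-join sum over n ints: spawn, distribute, synchronize. */
long parallel_sum(const int *data, int n)
{
    pthread_t tid[NTHREADS];
    struct slice sl[NTHREADS];
    int chunk = n / NTHREADS;

    for (int t = 0; t < NTHREADS; ++t) {
        sl[t].data = data + t * chunk;
        sl[t].len  = (t == NTHREADS - 1) ? n - t * chunk : chunk;
        pthread_create(&tid[t], NULL, partial_sum, &sl[t]);  /* spawn */
    }
    long total = 0;
    for (int t = 0; t < NTHREADS; ++t) {
        pthread_join(tid[t], NULL);      /* synchronize at the end */
        total += sl[t].sum;
    }
    return total;
}
```

The spawn and join calls are exactly the fixed cost that SIMD instructions avoid; binding each thread to a chosen core, as [16] required, would need an additional platform-specific affinity call on top of this.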
SIMD instructions operate on a dataset, or a so-called vector. For example, a traditional add
would perform a := a + b. The SIMD version of this instruction would perform the same operation,
but a would be a vector. Typical vector sizes of SIMD units range from 2 to 8 elements. Normally,
vector units have registers of a fixed size; depending on the size of the data type, they can
process a certain number of elements in one step. Vector units are not designed to operate
horizontally, which would mean combining elements within a vector register. We will concentrate
on the common SIMD extensions for the x86/x64 architecture. There are two extensions in this
area, SSE and AVX; both exist in different versions, with each new version extending the previous
one by adding new computing capabilities [17].
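As a concrete illustration of a := a + b on a vector, the following sketch uses SSE intrinsics (it assumes an x86/x64 compiler; function names are illustrative, not from the paper). One instruction adds four floats at once, and the SSE shuffle takes an immediate value indexing up to four elements:

```c
#include <immintrin.h>  /* SSE intrinsics; assumes an x86/x64 target */

/* out[i] = a[i] + b[i] for four floats in a single vector add */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

/* Reverse four floats via a shuffle with a compile-time immediate:
 * _MM_SHUFFLE selects source lanes 3,2,1,0 in reversed order. */
void reverse4(const float *a, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    _mm_storeu_ps(out, _mm_shuffle_ps(va, va, _MM_SHUFFLE(0, 1, 2, 3)));
}
```

Note that both operations act lane-wise or by lane selection; there is no horizontal combination of elements within a register, matching the limitation discussed above.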
AVX doubled the vector size compared to SSE. Yet, in terms of data shuffling, the situation became
much more complex: vector registers and operations are split into lanes, so one AVX register
consists of two 128-bit lanes, which simplified implementing the architecture for the designers.
For vector operations like additions, this makes no difference. Nevertheless, for instance, the
SSE shuffle operation takes an immediate value that allows indexing of up to four elements. The
corresponding AVX
- Title: Proceedings
- Subtitle: OAGM & ARW Joint Workshop 2016 on "Computer Vision and Robotics"
- Editors: Peter M. Roth, Kurt Niel
- Publisher: Verlag der Technischen Universität Graz
- Place: Wels
- Date: 2017
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-527-0
- Dimensions: 21.0 x 29.7 cm
- Pages: 248
- Keywords: conference proceedings
- Categories: international, conference proceedings