Joint Austrian Computer Vision and Robotics Workshop 2020
Page 109
Our contributions are:
• We propose a Trilinear Interpolation Layer suited for creating equivariant feature spaces in SO(3).
• We provide quantitative and qualitative evidence for the advantage of equivariant feature spaces by predicting unseen views in SO(3) of objects from the LineMOD dataset [4].

The remainder of the paper is structured as follows. Section 2 reviews related work. In Section 3 we describe our approach. Section 4 presents our experimental results. Finally, Section 5 concludes the paper.

2. Related Work

Object pose refiners rely on the availability of prior stages to produce pose hypotheses [7, 10, 12, 16, 18, 20, 21]. When depth data is available, the Iterative Closest Point (ICP) algorithm can be used to refine initial pose estimates [7, 18, 20]. Recent RGB-based approaches do not rely on the availability of depth data for pose refinement [7, 10, 12, 16, 21]. CNN-based object pose refinement architectures such as [10, 12, 21] pass two input images to the network in order to estimate the relative rotation between them: an observation of the object in the desired pose and a rendering of the current prediction. In [10] the authors base their network architecture on an approach for optical flow estimation [1] and predict optical flow, a mask and the relative pose deviation in SE(3). The authors of [21] use a similar approach with two encoders, one per input image. The encoders' outputs are subtracted and further encoded to predict the refined pose in SE(3). We present a concept suitable for enhancing such methods by guiding the network to learn an equivariant feature space.

The STN introduced by [6] is widely used for feature and image space transformation [2, 13, 14, 15, 19]. It consists of the combination of a localization network, a grid generator and a sampler. The authors of [2] apply the STL to properly align the features to their inputs. In [13] the authors predict deep heatmaps from randomly sampled object patches to estimate poses under occlusion; they apply the STL to upsample their predictions. In [14, 15] an analog of the localization network is used to produce feature maps invariant to input transformations. The authors of [11] leverage the methodology of the STN to generate realistic-looking images from the intersection of the natural image and geometric manifolds, using an adapted Generative Adversarial Network. Conversely to these approaches, we modify the STL component of the STN to enable SO(3) transformations of input feature maps with spatial dimensions.

Figure 2: Encoder-decoder architecture for image synthesis.

3. Approach

This section presents our approach for learning equivariant features in SO(3) in order to synthesize images from unseen viewpoints. We first give a problem definition, then describe the Trilinear Interpolation Layer. Finally, we outline how the TIL is used in an encoder-decoder architecture for image synthesis.

3.1. Problem Statement

Let $X = \{ x_c, (\tilde{x}^0_{\theta_0}, \ldots, \tilde{x}^n_{\theta_i}) \}$ be a set of training examples, where $x_c$ refers to the projection $\Pi$ of object $o_c$, in its canonical pose, to the image space $I$. The $\tilde{x}^n_{\theta_i}$ are the projections of transformed objects $o_{\theta_i}$, where $\theta_i$ represents the transformation in SO(3) applied for the projection into $I$. Our goal is to learn the inverse of the mapping function, $\Pi^{-1}$, in order to produce transformed images; in other words, to learn $\tilde{x}^n_{\theta_i} = \Pi[\Pi^{-1}(x_c), \theta_i]$ given an image of the object in its canonical pose and the transformation parameters.
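This page does not specify how the transformation parameters $\theta_i$ are represented. Purely as a hedged illustration, the sketch below assumes an axis-angle parameterization mapped to a rotation matrix in SO(3) via the standard Rodrigues formula; the function name `axis_angle_to_matrix` and all shapes are our own illustrative choices, not the authors' code.

```python
# Illustrative assumption (not from the paper): represent theta in SO(3) as an
# axis-angle 3-vector and map it to a rotation matrix via the Rodrigues formula.
import torch

def axis_angle_to_matrix(axis_angle: torch.Tensor) -> torch.Tensor:
    """Map a 3-vector (rotation axis scaled by angle, in radians) to a 3x3 rotation matrix."""
    angle = torch.linalg.norm(axis_angle)
    if angle < 1e-8:                        # near-zero rotation: return identity
        return torch.eye(3)
    x, y, z = (axis_angle / angle).tolist()
    K = torch.tensor([[0.0,  -z,   y],
                      [  z, 0.0,  -x],
                      [ -y,   x, 0.0]])     # skew-symmetric cross-product matrix
    # Rodrigues: R = I + sin(a) * K + (1 - cos(a)) * K^2
    return torch.eye(3) + torch.sin(angle) * K + (1 - torch.cos(angle)) * (K @ K)

# Example: a 90-degree rotation about the z-axis.
R = axis_angle_to_matrix(torch.tensor([0.0, 0.0, torch.pi / 2]))
```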
In order to model the inversion of the mapping function $\Pi$, we utilize a CNN, owing to the power of CNNs to encode statistical relationships from visual data into feature spaces [8]. To provide our model with information regarding the relative transformations $\theta_i$ in SO(3) between pairs of images, we modify the STL of [6]. An overview of the encoder-decoder architecture for image synthesis using the modified STL is presented in Figure 2.

3.2. Trilinear Interpolation

The STL [6] allows SE(2) transformations to be applied to feature maps. This works well in image
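To make the architecture described for Figure 2 concrete, the following is a minimal, hedged sketch (our assumption, not the authors' released code) of an encoder-decoder in which the encoder output is reshaped into a 3D feature volume and resampled trilinearly under a rotation $R \in SO(3)$. PyTorch's `affine_grid`/`grid_sample` interpolate trilinearly on 5D tensors, so they stand in here for the proposed Trilinear Interpolation Layer; all layer sizes and names are illustrative.

```python
# Hedged sketch of an encoder-decoder with an SO(3) trilinear resampling step
# between encoder and decoder, in the spirit of Figure 2. Layer sizes, names
# and shapes are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F
from torch import nn

class RotateAndDecode(nn.Module):
    def __init__(self, ch: int = 16, d: int = 8):
        super().__init__()
        self.ch, self.d = ch, d
        self.encoder = nn.Sequential(          # (N, 3, 64, 64) -> (N, ch*d, 8, 8)
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, ch * d, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(          # (N, ch*d, 8, 8) -> (N, 3, 64, 64)
            nn.ConvTranspose2d(ch * d, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x_c: torch.Tensor, R: torch.Tensor) -> torch.Tensor:
        """x_c: (N, 3, 64, 64) canonical view; R: (N, 3, 3) rotation matrices."""
        n = x_c.shape[0]
        feat = self.encoder(x_c)
        vol = feat.view(n, self.ch, self.d, 8, 8)          # (N, C, D, H, W) volume
        # Build a 3D sampling grid from the inverse rotation (R is orthogonal,
        # so its transpose) and resample the volume; for 5D inputs, grid_sample's
        # 'bilinear' mode performs trilinear interpolation.
        t = torch.zeros(n, 3, 1, dtype=R.dtype, device=R.device)
        theta = torch.cat([R.transpose(1, 2), t], dim=2)   # (N, 3, 4), no translation
        grid = F.affine_grid(theta, list(vol.shape), align_corners=False)
        vol_rot = F.grid_sample(vol, grid, mode='bilinear', align_corners=False)
        return self.decoder(vol_rot.reshape(n, self.ch * self.d, 8, 8))
```

Trained on pairs $(x_c, \tilde{x}^n_{\theta_i})$ with a reconstruction loss, such a bottleneck would push the encoder toward a feature space on which SO(3) acts by resampling, i.e. toward the equivariance the paper targets.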
Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Publisher: Graz University of Technology
Place: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Dimensions: 21.0 x 29.7 cm
Pages: 188
Categories: Informatik, Technik