Figure 4: View synthesis from SO(3) transformations unseen during training time. First row: reconstructed Lamp with varying azimuth from -43° to 43°. Second row: reconstructed Glue with elevation variation from -43° to 43°. Rows three to five: objects Benchvise, Camera, Cat reconstructed with azimuth/elevation range from (-43°, -43°) to (43°, 43°). Object poses outside the green box are samples out of the training distribution. Centered images, in the red box, mark the canonical poses.
[Figure 5 plot: MAE, MSE, and DSSIM error curves over azimuth angle in degrees (0 to 180).]
Figure 5: Error values and their variance over azimuth angle. The network was trained on its corresponding loss function with a spatial bottleneck dimension of 8×8×128. The vertical line shows the training set range.
some of the synthesized views outside of the training range, it is evident that views can be predicted properly based on SO(3) transformations.
Figure 5 provides the reconstruction error and its variance over an extended azimuth and elevation angle range of [0°, 180°]. The results in the figure are averaged over all objects. The training dataset contains images with azimuth angles up to 37°. A sharp rise in error and variance is observed at an azimuth angle of approximately 45°. For angles above this value, error and variance increase rapidly. As such, the network cannot properly reconstruct these views.
These results show that our formulation for creating equivariant feature spaces has the desired property of correlating spatial transformations with 2D views of the transformed object. Thus, the proposed Trilinear interpolation layer guides the network towards learning an equivariant feature space in SO(3).
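To make the mechanism concrete, the following is a minimal sketch, not the authors' implementation, of how a trilinear interpolation layer can resample a 3D feature volume under an SO(3) rotation, assuming a PyTorch setting. The function name, the cubic bottleneck shape, and the rotation axis are illustrative assumptions only.

import math
import torch
import torch.nn.functional as F

def rotate_feature_volume(volume, R):
    # volume: (N, C, D, H, W) feature volume; R: (N, 3, 3) rotation matrices.
    # grid_sample pulls values from the source volume, so the inverse rotation
    # (the transpose, for SO(3)) defines the sampling grid; translation is zero.
    N = volume.shape[0]
    zeros = torch.zeros(N, 3, 1, dtype=volume.dtype, device=volume.device)
    theta = torch.cat([R.transpose(1, 2), zeros], dim=2)        # (N, 3, 4)
    # Note: affine_grid uses normalized (x, y, z) ~ (W, H, D) coordinates.
    grid = F.affine_grid(theta, size=list(volume.shape), align_corners=False)
    # mode="bilinear" on 5D inputs performs trilinear interpolation.
    return F.grid_sample(volume, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=False)

# Hypothetical usage: rotate a cubic bottleneck volume by 30 degrees about one axis.
feats = torch.randn(1, 128, 8, 8, 8)
c, s = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
R = torch.tensor([[[c, -s, 0.0],
                   [s,  c, 0.0],
                   [0.0, 0.0, 1.0]]])                           # (1, 3, 3)
rotated = rotate_feature_volume(feats, R)

Because the trilinear sampling is differentiable, gradients flow back through the rotation into the encoder, which is what allows the imposed transformation to constrain the latent space towards equivariance.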
5. Conclusion
We extend recent work for learning equivariant feature spaces for synthesizing object views in SO(3). The proposed extension of the Spatial Transformer Network [6], which we call the Trilinear interpolation Layer, applies SO(3) transformations to feature maps derived from 2D data. We validate the approach by training a simple encoder-decoder network architecture. Our experiments show that our formulation enables the prediction not only of views unseen during training time but also of views in a small range outside the training distribution.
The current formulation enables control over 5 DoF: SO(3) rotations and translations in image space. Future work will tackle adapting the proposed layer to perform object view synthesis in all of SE(3). We then plan to integrate this into a pose refinement strategy to improve object pose estimation.