Figure 4: View synthesis from SO(3) transformations unseen during training time. First row: reconstructed Lamp with varying azimuth from -43° to 43°. Second row: reconstructed Glue with elevation variation from -43° to 43°. Rows three to five: objects Benchvise, Camera, Cat reconstructed with azimuth/elevation range from (-43°, -43°) to (43°, 43°). Object poses outside the green box are samples out of the training distribution. Centered images, in the red box, mark the canonical poses.
[Figure 5 plot: error value over angle in °; one curve each for MAE, MSE, and DSSIM.]
Figure 5: Error values and their variance over the azimuth angle. The network was trained on its corresponding loss function with a spatial bottleneck dimension of 8×8×128. The vertical line marks the training set range.
some of the synthesized views outside of the training range, it is visible that views can be predicted properly based on SO(3) transformations.
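As an illustration of how such a view sweep can be parameterized, the sketch below builds SO(3) rotation matrices from azimuth/elevation angles with SciPy. The axis convention and the helper name are assumptions for illustration, not part of the described implementation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotation_from_azimuth_elevation(azimuth_deg, elevation_deg):
    """Compose an SO(3) matrix from azimuth (about the up axis) and
    elevation (about the right axis); the axis convention is assumed."""
    return Rotation.from_euler("zx", [azimuth_deg, elevation_deg],
                               degrees=True).as_matrix()

# Sweep azimuth from -43 deg to 43 deg as in Figure 4; the training set only
# covers angles up to 37 deg, so the outer samples probe generalization
# outside the training range.
azimuths = np.linspace(-43.0, 43.0, 7)
rotations = [rotation_from_azimuth_elevation(a, 0.0) for a in azimuths]
```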
Figure 5 provides the reconstruction error and its variance over an extended azimuth and elevation angle range of [0°, 180°]. The results in the figure are averaged over all objects. The training dataset contains images with azimuth angles up to 37°. A sharp rise in error and variance is observed at an azimuth angle of approximately 45°. For angles above this value, error and variance increase rapidly. As such, the network cannot properly reconstruct these views.
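A hedged sketch of the per-angle evaluation summarized in Figure 5 is given below: for every angle, MAE, MSE, and DSSIM between synthesized and ground-truth views are averaged over all objects. The callables `synthesize_view` and `render_gt`, the angle step, and the use of scikit-image's `structural_similarity` are assumptions for illustration.

```python
import numpy as np
from skimage.metrics import structural_similarity

def dssim(a, b):
    """Structural dissimilarity, DSSIM = (1 - SSIM) / 2, for images in [0, 1]."""
    ssim = structural_similarity(a, b, data_range=1.0, channel_axis=-1)
    return (1.0 - ssim) / 2.0

def evaluate_sweep(objects, angles_deg, synthesize_view, render_gt):
    results = {"MAE": [], "MSE": [], "DSSIM": []}
    for angle in angles_deg:
        mae, mse, ds = [], [], []
        for obj in objects:
            pred = synthesize_view(obj, angle)  # predicted view, HxWxC in [0, 1]
            gt = render_gt(obj, angle)          # ground-truth view
            mae.append(np.abs(pred - gt).mean())
            mse.append(((pred - gt) ** 2).mean())
            ds.append(dssim(pred, gt))
        # Figure 5 reports the mean (and variance) over objects per angle.
        results["MAE"].append(np.mean(mae))
        results["MSE"].append(np.mean(mse))
        results["DSSIM"].append(np.mean(ds))
    return results

angles = np.arange(0, 181, 5)  # extended range [0 deg, 180 deg]; step size assumed
```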
These results show that our formulation for creating equivariant feature spaces has the desired property of correlating spatial transformations with 2D views of the transformed object. Thus, the proposed Trilinear interpolation layer guides the network towards learning an equivariant feature space in SO(3).
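To make the mechanism concrete, the following minimal PyTorch sketch shows one way such a trilinear interpolation layer can be realized as a 3D Spatial Transformer: the encoder bottleneck is viewed as a feature volume, rotated by an SO(3) matrix via `affine_grid`, and resampled with `grid_sample`. The reshaping of the 8×8×128 bottleneck into an 8×8×8 volume of 16-channel features, and all names, are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

class TrilinearInterpolationLayer(torch.nn.Module):
    def forward(self, volume, rotation):
        # volume:   (N, C, D, H, W) feature volume
        # rotation: (N, 3, 3) SO(3) rotation matrices
        n = volume.size(0)
        # Affine matrix [R | 0]: rotation only, no translation of the volume.
        theta = torch.cat([rotation, volume.new_zeros(n, 3, 1)], dim=2)
        grid = F.affine_grid(theta, volume.size(), align_corners=False)
        # mode="bilinear" on a 5D input performs trilinear interpolation.
        return F.grid_sample(volume, grid, mode="bilinear",
                             padding_mode="zeros", align_corners=False)

# Usage on a bottleneck of spatial size 8x8 with 128 channels, reinterpreted
# as an 8x8x8 volume of 16-channel features (assumed layout):
feat2d = torch.randn(1, 128, 8, 8)
volume = feat2d.view(1, 16, 8, 8, 8)
rot = torch.eye(3).unsqueeze(0)       # identity rotation as a placeholder
layer = TrilinearInterpolationLayer()
transformed = layer(volume, rot)      # decoded afterwards into the 2D view
```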
5. Conclusion
We extend recent work on learning equivariant feature spaces for synthesizing object views in SO(3). The proposed extension of the Spatial Transformer Network [6], which we call the Trilinear interpolation layer, applies SO(3) transformations to feature maps derived from 2D data. The validity of the approach is demonstrated by training a simple encoder-decoder network architecture. Our experiments show that our formulation enables the prediction of views unseen during training time, including views in a small range outside the training distribution.
The current formulation enables control over 5 DoF: SO(3) rotations and translations in image space. Future work will adapt the proposed layer to perform object view synthesis in all of SE(3). We then plan to integrate this into a pose refinement strategy to improve object pose estimation.