metric | latent space | l1            | l2             | DSSIM          | DSSIM+l1 (δ=0.85)
l1     | 2x2x512      | 0.03±4.9e-04  | 0.03±4.2e-04   | 0.028±4.0e-04  | 0.031±4.3e-04
l2     |              | 0.096±2.1e-05 | 0.093±1.6e-02  | 0.093±1.8e-03  | 0.1±1.2e-03
DSSIM  |              | 0.102±3.5e-03 | 0.105±2.6e-03  | 0.09±3.0e-03   | 0.1±3.4e-03
l1     | 4x4x256      | 0.018±3.5e-04 | 0.02±3.6e-04   | 0.0243±4.0e-04 | 0.018±3.2e-04
l2     |              | 0.065±1.8e-05 | 0.0064±1.7e-05 | 0.086±2.5e-06  | 0.063±1.7e-05
DSSIM  |              | 0.061±3.0e-05 | 0.067±3.6e-05  | 0.07±2.8e-05   | 0.059±3.2e-05
l1     | 8x8x128      | 0.016±2.6e-04 | 0.017±2.6e-04  | 0.017±2.5e-04  | 0.017±2.4e-04
l2     |              | 0.06±1.6e-03  | 0.057±1.7e-03  | 0.06±1.7e-03   | 0.066±1.6e-03
DSSIM  |              | 0.055±3.0e-03 | 0.06±3.0e-03   | 0.053±2.9e-03  | 0.055±2.6e-03

Table 1: Performance study for latent spatial dimension and loss function. We present the error and variance, averaged over all objects, using l1, l2, and DSSIM, respectively.
Based on the newly defined canonical pose, images are rendered in a range of -43° to +43° azimuth and elevation. This is similar to [19] but with approximately three times the range in azimuth angle.
For training, only views in a range of -37° to +37° azimuth and elevation are used. Of these 950 images, 43 images are exclusively used for testing. The selected samples are distributed uniformly in the viewing cone. An additional 59 images are included in the test set. These lie in an angle range of negative and positive 37° to 43° azimuth and elevation, i.e., views in a range that is not shown to the network during training.
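For illustration, a minimal NumPy sketch of how such a split could be constructed. The grid resolution is an assumption; the paper does not state how the 950 training-range views are sampled, so the counts produced here differ from those reported.

import numpy as np

# Assumed uniform viewing grid over azimuth and elevation in [-43, 43] degrees.
azimuth = np.linspace(-43.0, 43.0, 44)
elevation = np.linspace(-43.0, 43.0, 44)
az, el = np.meshgrid(azimuth, elevation)
views = np.stack([az.ravel(), el.ravel()], axis=1)

# Views with both angles inside [-37, 37] degrees form the training range;
# the remaining 37-43 degree ring is used only for extrapolation testing.
in_range = np.all(np.abs(views) <= 37.0, axis=1)
train_range_views = views[in_range]
extrapolation_views = views[~in_range]

# Hold out a uniformly distributed subset of the training range for testing
# (43 images in the paper).
rng = np.random.default_rng(0)
held_out = rng.choice(len(train_range_views), size=43, replace=False)
test_mask = np.zeros(len(train_range_views), dtype=bool)
test_mask[held_out] = True
train_views = train_range_views[~test_mask]
interpolation_test_views = train_range_views[test_mask]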
4.3. Training Protocol
For training we use the Adam optimizer with the learning rate set to 10⁻³. A batch size of 1 is used. We train 40 epochs per object for quantitative ablation studies. After 30 epochs, the learning rate is decreased by one order of magnitude. Qualitative evaluation is presented after 40 epochs of training. During training, Gaussian blur with uniformly sampled σ ∈ [0.0, 1.5] is used as online augmentation.
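A minimal PyTorch sketch of this protocol. The model and data are placeholders, and the blur kernel size is an assumption, since only σ is specified.

import random
import torch
from torch import nn, optim
from torchvision.transforms import GaussianBlur

# Placeholder encoder-decoder and image pairs; the actual network and
# dataset from the paper are not reproduced here.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 3, 3, padding=1))
loader = [(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
          for _ in range(4)]  # stands in for batch-size-1 image pairs

optimizer = optim.Adam(model.parameters(), lr=1e-3)
# Decrease the learning rate by one order of magnitude after 30 epochs.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30], gamma=0.1)
criterion = nn.L1Loss()

for epoch in range(40):
    for image, target in loader:
        # Online augmentation: Gaussian blur with sigma drawn from U(0, 1.5).
        sigma = random.uniform(0.0, 1.5)
        if sigma > 1e-3:
            image = GaussianBlur(kernel_size=5, sigma=sigma)(image)
        optimizer.zero_grad()
        loss = criterion(model(image), target)
        loss.backward()
        optimizer.step()
    scheduler.step()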
4.4. Hyperparameter Studies
We study the choice of loss function used for optimization and the optimal size of the bottleneck feature maps. Table 1 presents results averaged over the test sets of all five objects. Presented are the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Structural Similarity Index (SSIM), as well as their corresponding variances.
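For reference, these metrics can be computed per test image and then averaged, for example using scikit-image for SSIM. The image shapes and value ranges below are assumptions.

import numpy as np
from skimage.metrics import structural_similarity

def evaluate(pred, target):
    """Per-image MAE, RMSE, and SSIM for float RGB images in [0, 1]."""
    mae = np.abs(pred - target).mean()
    rmse = np.sqrt(((pred - target) ** 2).mean())
    ssim = structural_similarity(pred, target, channel_axis=-1, data_range=1.0)
    return mae, rmse, ssim

# Dummy test set; stands in for predicted and ground-truth views.
rng = np.random.default_rng(0)
preds = rng.random((10, 64, 64, 3))
targets = rng.random((10, 64, 64, 3))

# Means and variances over the test set, as reported in Table 1.
scores = np.array([evaluate(p, t) for p, t in zip(preds, targets)])
mean, var = scores.mean(axis=0), scores.var(axis=0)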
The loss functions compared are l1, l2, Structural Dissimilarity (DSSIM), and a combination of l1 and DSSIM as used by [19], where δ is the weighting parameter.
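A sketch of the combined objective, assuming the weighting convention δ·DSSIM + (1−δ)·l1 with DSSIM = (1−SSIM)/2; the exact form used in [19] may differ, and pytorch_msssim is an assumed third-party SSIM implementation.

import torch
from pytorch_msssim import ssim  # assumed third-party SSIM package

def combined_loss(pred, target, delta=0.85):
    """delta * DSSIM + (1 - delta) * l1, with DSSIM = (1 - SSIM) / 2.

    The weighting convention is an assumption; the paper only states
    that delta = 0.85 weights the DSSIM and l1 terms as in [19].
    """
    dssim = (1.0 - ssim(pred, target, data_range=1.0)) / 2.0
    l1 = torch.nn.functional.l1_loss(pred, target)
    return delta * dssim + (1.0 - delta) * l1

# Example usage with dummy image batches in [0, 1].
pred = torch.rand(1, 3, 64, 64, requires_grad=True)
target = torch.rand(1, 3, 64, 64)
loss = combined_loss(pred, target)
loss.backward()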
The bottleneck tensor size is adjusted by truncating ResNet18. For a dimension of 4×4×256 we use the outputs of the fourth block and upsample using three stacks of transposed convolutions plus upsampling layers. For 2×2×512 we use four stacks, starting with a 5×5 transposed convolution.

Tensor size | 2x2x512    | 4x4x256   | 8x8x128
parameters  | 13,330,508 | 3,753,804 | 1,881,932

Table 2: Network parameters per bottleneck tensor size.
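One plausible reading of the 4×4×256 variant as a PyTorch sketch, assuming 64×64 inputs; the exact truncation point, channel widths, and kernel sizes are assumptions not given in the text.

import torch
from torch import nn
from torchvision.models import resnet18

# Truncated ResNet18 encoder; with 64x64 inputs (an assumed resolution),
# dropping layer4, avgpool, and fc yields a 4x4x256 bottleneck.
backbone = resnet18()
encoder = nn.Sequential(*list(backbone.children())[:-3])

# Decoder: three stacks of transposed convolutions plus a final
# upsampling layer back to the input resolution.
def stack(c_in, c_out):
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )

decoder = nn.Sequential(
    stack(256, 128),  # 4x4   -> 8x8
    stack(128, 64),   # 8x8   -> 16x16
    stack(64, 32),    # 16x16 -> 32x32
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),  # 64x64 RGB output
)

x = torch.rand(1, 3, 64, 64)
z = encoder(x)   # -> (1, 256, 4, 4)
y = decoder(z)   # -> (1, 3, 64, 64)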
Quantitative evaluation shows that the metric used for evaluating the reconstruction quality correlates with the loss function used, which is to be expected. Using l2 is a reasonable default. However, when synthesizing views for a specific application, choosing the loss function more carefully will be obligatory. Surprisingly, a bottleneck tensor size of 8×8×128 leads to image synthesis with the lowest error, even though this network has far fewer parameters than the other spatial dimensions (see Table 2). This leads to the conclusion that bigger spatial dimensions are more important for synthesizing views than network depth. Based on the chosen hyperparameters we further present experiments for synthesizing views.
4.5. Studies on View Synthesis
Studies are presented to illustrate that the proposed formulation generates feature spaces suited for view synthesis in SO(3). Figure 4 shows views synthesized from transformations unseen during training time. Additionally, we present view predictions outside of the training range. Views inside the training range are reconstructed with sufficient quality to visually verify the expected object orientations. Despite the reconstruction quality being poor for