        latent                          loss
metric  space     l1              l2              DSSIM           DSSIM + l1 (δ=0.85)
l1      2x2x512   0.03±4.9e-04    0.03±4.2e-04    0.028±4.0e-04   0.031±4.3e-04
l2                0.096±2.1e-05   0.093±1.6e-02   0.093±1.8e-03   0.1±1.2e-03
DSSIM             0.102±3.5e-03   0.105±2.6e-03   0.09±3.0e-03    0.1±3.4e-03
l1      4x4x256   0.018±3.5e-04   0.02±3.6e-04    0.0243±4.0e-04  0.018±3.2e-04
l2                0.065±1.8e-05   0.0064±1.7e-05  0.086±2.5e-06   0.063±1.7e-05
DSSIM             0.061±3.0e-05   0.067±3.6e-05   0.07±2.8e-05    0.059±3.2e-05
l1      8x8x128   0.016±2.6e-04   0.017±2.6e-04   0.017±2.5e-04   0.017±2.4e-04
l2                0.06±1.6e-03    0.057±1.7e-03   0.06±1.7e-03    0.066±1.6e-03
DSSIM             0.055±3.0e-03   0.06±3.0e-03    0.053±2.9e-03   0.055±2.6e-03

Table 1: Performance study for latent spatial dimension and loss function. We present the error and variance, averaged over all objects, using l1, l2 and DSSIM respectively.
Based on the newly defined canonical pose, images are rendered in a range of -43° to +43° azimuth and elevation. This is similar to [19] but with approximately three times the range in azimuth angle.
For training, only views in a range of -37° to +37° azimuth and elevation are used. Of these 950 images, 43 images are used exclusively for testing. The selected samples are distributed uniformly in the viewing cone. An additional 59 images are included in the test set. These lie in an angle range of negative and positive 37° to 43° azimuth and elevation, i.e. views in a range that is never shown to the network during training.
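This split can be made concrete with a short sketch. The paper does not state how the view directions are drawn, so the uniform random sampling and the rejection step below are assumptions for illustration only:

import numpy as np

rng = np.random.default_rng(0)

# 950 views distributed uniformly inside the training cone (±37° azimuth
# and elevation); 43 of them are held out exclusively for testing.
inner = rng.uniform(-37.0, 37.0, size=(950, 2))    # columns: azimuth, elevation
held_out = rng.choice(950, size=43, replace=False)
train_views = np.delete(inner, held_out, axis=0)   # 907 training views

# 59 additional test views between ±37° and ±43°, i.e. inside the full
# rendering range but outside the training cone (rejection sampling).
outer = []
while len(outer) < 59:
    v = rng.uniform(-43.0, 43.0, size=2)
    if np.abs(v).max() > 37.0:                     # outside the training cone
        outer.append(v)

test_views = np.concatenate([inner[held_out], np.array(outer)])  # 102 in total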
4.3. Training Protocol
For training we use the Adam optimizer with the learning rate set to 10⁻³ and a batch size of 1. We train 40 epochs per object for the quantitative ablation studies. After 30 epochs, the learning rate is decreased by one order of magnitude. Qualitative evaluation is presented after 40 epochs of training. During training, Gaussian blur with uniformly sampled σ ∈ [0.0, 1.5] is used as online augmentation.
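A minimal PyTorch sketch of this protocol is given below. The network, data and loss are stand-ins; only the optimizer, learning-rate schedule, batch size and augmentation follow the text:

import random
import torch
from torch import nn
from torchvision.transforms.functional import gaussian_blur

# Placeholder network, loss and data; not the architecture of the paper.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Decrease the learning rate by one order of magnitude after 30 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30], gamma=0.1)

# Toy stand-in for the renderings: (input view, target view) pairs, batch size 1.
data = [(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)) for _ in range(8)]

for epoch in range(40):
    for image, target in data:
        sigma = random.uniform(0.0, 1.5)     # online Gaussian-blur augmentation
        if sigma > 1e-3:                     # sigma ≈ 0 means "no blur"
            image = gaussian_blur(image, kernel_size=7, sigma=sigma)
        optimizer.zero_grad()
        loss = criterion(model(image), target)
        loss.backward()
        optimizer.step()
    scheduler.step()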
4.4. Hyperparameter Studies
We study the choice of loss function used for optimization and the optimal size of the bottleneck feature maps. Table 1 presents results averaged over the test sets of all five objects. Presented are the Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Structural Similarity Index (SSIM) as well as their corresponding variances.
The loss functions compared are l1, l2, Structural Dissimilarity (DSSIM) and a combination of l1 and DSSIM as used by [19], where δ is the weighting parameter. The bottleneck tensor size is adjusted by truncating ResNet18. For a dimension of 4×4×256 we use the outputs of the fourth block and upsample using three stacks of transposed convolutions plus upsampling layers. For 2×2×512 we use four stacks, starting with a 5×5 transposed convolution. Sketches of the combined loss and of the truncated backbone are given after Table 2.

Tensor size   2x2x512      4x4x256     8x8x128
parameters    13,330,508   3,753,804   1,881,932

Table 2: Network parameters per bottleneck tensor size.
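How the combined objective might look is sketched below; DSSIM = (1 − SSIM)/2 and the convention L = δ·DSSIM + (1 − δ)·l1 are assumptions on our part and should be checked against [19]:

import torch
import torch.nn.functional as F

def ssim(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean SSIM over a batch, using uniform local windows for simplicity."""
    mu_x, mu_y = F.avg_pool2d(x, window, 1), F.avg_pool2d(y, window, 1)
    var_x = F.avg_pool2d(x * x, window, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, 1) - mu_y ** 2
    cov_xy = F.avg_pool2d(x * y, window, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def dssim_l1_loss(pred, target, delta=0.85):
    """Weighted sum of structural dissimilarity and l1 reconstruction error."""
    dssim = (1.0 - ssim(pred, target)) / 2.0
    return delta * dssim + (1.0 - delta) * (pred - target).abs().mean()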
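The truncation itself can be illustrated with torchvision's ResNet18. The cut points below are inferred from the channel counts in Table 2 and assume a 64×64 input resolution; neither is stated explicitly on this page:

import torch
from torch import nn
from torchvision.models import resnet18

backbone = resnet18()
stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)

# Truncations producing the three bottleneck sizes of Table 2 (layer names
# follow torchvision's stage numbering).
encoder_8x8x128 = nn.Sequential(stem, backbone.layer1, backbone.layer2)
encoder_4x4x256 = nn.Sequential(encoder_8x8x128, backbone.layer3)
encoder_2x2x512 = nn.Sequential(encoder_4x4x256, backbone.layer4)

x = torch.rand(1, 3, 64, 64)
print(encoder_4x4x256(x).shape)   # torch.Size([1, 256, 4, 4])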
Quantitative evaluation shows that the metric used for evaluating the reconstruction quality correlates with the loss function used for training, which is to be expected. Using l2 is reasonable. However, when synthesizing views for a specific application, choosing the loss function more carefully will be obligatory. Surprisingly, a bottleneck tensor size of 8×8×128 leads to image synthesis with the lowest error, even though this network has far fewer parameters than the other spatial dimensions (see Table 2). This leads to the conclusion that larger spatial dimensions are more important for synthesizing views than network depth. Based on the chosen hyperparameters, we further present experiments for synthesizing views.
4.5. Studies on View Synthesis
Studies are presented to illustrate that the proposed formulation generates feature spaces suited for view synthesis in SO(3). Figure 4 shows views synthesized from transformations unseen during training. Additionally, we present view predictions outside of the training range. Views inside the training range are reconstructed with sufficient quality to visually verify the expected object orientations. Despite the reconstruction quality being poor for