Proceedings of the OAGM&ARW Joint Workshop - Vision, Automation and Robotics
Text of the Page - 155 -

a material with a roughness of 0.2 already shows highly specular behavior. The multi-line scan camera setup described in Sec. I was recreated in Blender, where a plane with a random color texture and a bump map (see Fig. 3) was moved underneath the camera. During each animation step the plane was moved by exactly one pixel. The resulting images were concatenated and reshaped in order to create a 3D image stack representation of the light field. Each image plane is then shifted to the left in the following manner:

\forall x, y, i : \; I'_i(x, y) = I_i(x - 40i, y) \qquad (5)

where $i \in [0 \ldots 12]$ denotes the index of the image in the 3D light field structure, $I_i \in \{width \times height \times 3\}$ is the spatial image domain of the i-th view, and $I'_i$ denotes the new translated image. Since the disparity (i.e. the gap between active lines on the camera sensor) is 40 pixels, it was used as the shifting constant. The resulting overlap (at most 12 x 40 pixels in the last view) is then cropped. This is done so that the EPI lines are vertical with no slope, as they would be for an object with true 3D geometries (a numpy sketch of this shift-and-crop step is given after Sec. B below).

B. Network Parameters Evaluation

For the optimal performance of a neural network, some parameter evaluation and tuning is needed, such as changing the number of hidden neurons or using different activation or cost functions. In our evaluation we looked at three different activation functions, namely linear, Sigmoid and rectified linear unit (RELU). The input layer, which consists of 39 neurons, is fully connected with the hidden layer. We tried different numbers of neurons in the hidden layer for each evaluated activation function. The results of these experiments can be seen in Table I. For the read-out of the output layer, which consists of one neuron since we only regress the gradient in the transport direction, a linear activation function was used.

Given the problem we want to solve and our material properties, one can expect that a low number of hidden neurons will suffice and already give good performance, as the Lambertian reflectance function is low dimensional. The low dimensionality of Lambertian reflectance has been proven and explored in [20]. Having a less complex network architecture can be beneficial for both the runtime and the generalization of the network. Here 1, 3, 10 and 20 neurons in one hidden layer were used.

The cost function used to measure the quality of the regressed prediction (and therefore also the value used for the optimization of the network) was the mean square error (MSE):

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2 \qquad (6)

For optimization, the batch-based gradient descent algorithm with a learning rate of $\eta = 0.001$ was used. The dataset was split into an 80% training set and a 20% testing set, as proposed by the Pareto principle of J. M. Juran [21]. The network was trained for 100 epochs.

Table I shows that the Sigmoid has both the best overall average and the best performance in a single run, with 20 neurons in the hidden layer. As the aforementioned experiments were performed only to show the overall tendency and convergence of the network structure, a small learning rate $\eta$ was used for all experiments. However, [22] shows that exploring this parameter further is important for the overall network accuracy. For this task we found that a learning rate of $\eta = 0.2$ works best, which improved the overall accuracy of the network to $\mathrm{MSE}_{train} = 0.020464$ and $\mathrm{MSE}_{test} = 0.02052$ when trained for 100 epochs.
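To make the shift-and-crop of Eq. (5) concrete, the following is a minimal numpy sketch; the stack layout, the function name align_views and the use of np.roll are our own assumptions, not the authors' published code.

```python
import numpy as np

DISPARITY = 40   # gap between active sensor lines, in pixels (Sec. A)
NUM_VIEWS = 13   # views i = 0..12 of the 3D light-field stack

def align_views(stack: np.ndarray) -> np.ndarray:
    """Apply Eq. (5), I'_i(x, y) = I_i(x - 40*i, y), to every view and
    crop the overlap so that the EPI lines become vertical."""
    n, height, width, channels = stack.shape
    max_shift = DISPARITY * (n - 1)          # 12 * 40 px in the last view
    aligned = np.empty((n, height, width - max_shift, channels),
                       dtype=stack.dtype)
    for i in range(n):
        # np.roll with a positive shift realizes I'(x) = I(x - 40*i);
        # the wrapped-around columns all fall inside the cropped margin.
        shifted = np.roll(stack[i], shift=DISPARITY * i, axis=1)
        aligned[i] = shifted[:, max_shift:, :]
    return aligned

# placeholder stack standing in for the rendered Blender views
stack = np.random.rand(NUM_VIEWS, 64, 1024, 3)
epi_stack = align_views(stack)               # shape (13, 64, 544, 3)
```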
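The evaluated single-hidden-layer architectures could be instantiated, for instance, as in the following Keras sketch. The paper does not state which framework was used, so the API choice, the build_net helper and the grid loop are our own illustration.

```python
import tensorflow as tf

def build_net(hidden_neurons: int, activation: str) -> tf.keras.Model:
    """39 fully connected inputs, one hidden layer, and a single linear
    output neuron regressing the gradient in the transport direction."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(39,)),
        tf.keras.layers.Dense(hidden_neurons, activation=activation),
        tf.keras.layers.Dense(1, activation="linear"),
    ])

# the grid of configurations evaluated in Table I
models = {(act, n): build_net(n, act)
          for act in ("linear", "sigmoid", "relu")
          for n in (1, 3, 10, 20)}
```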
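Putting the pieces together, the training procedure (MSE loss of Eq. (6), batch gradient descent, 80/20 Pareto split [21], 100 epochs) might look as follows; the random placeholder data and the scikit-learn split are stand-ins for the rendered datasets, and the $\eta = 0.2$ setting is the best value reported above.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# placeholder features/targets standing in for the EPI data and
# ground-truth gradients of the six rendered datasets
X = np.random.rand(5000, 39).astype("float32")
y = np.random.rand(5000, 1).astype("float32")

# 80% training / 20% testing, as proposed by the Pareto principle [21]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(39,)),
    tf.keras.layers.Dense(20, activation="sigmoid"),  # best run in Table I
    tf.keras.layers.Dense(1),                         # linear read-out
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.2),
              loss="mse")                             # Eq. (6)
model.fit(X_train, y_train, epochs=100,
          validation_data=(X_test, y_test))
```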
TABLE I: Training and testing MSE with different numbers of neurons and activation functions.

                    # hidden neurons
  act. fct.      1         3         10        20        avg
  Training set MSE
  linear       0.05903   0.05988   0.05760   0.05857   0.05877
  Sigmoid      0.05429   0.05285   0.05263   0.04792   0.05192
  RELU         0.05605   0.05543   0.05283   0.05150   0.05395
  Testing set MSE
  linear       0.05902   0.05972   0.05777   0.05855   0.05877
  Sigmoid      0.05402   0.05276   0.05266   0.04768   0.05178
  RELU         0.05608   0.05571   0.05312   0.05147   0.054095

C. Network Performance

Fig. 5: Evolution of the mean square error over epochs with a learning rate of $\eta = 0.2$. Left: training set (80% randomly chosen from all sets); right: testing set (20% randomly chosen from all sets).

Fig. 5 shows the convergence of the overall accuracy on the training and test set, combining and shuffling all six created datasets. This was done in order to generalize the network as much as possible with regard to the material type (matte, semi-glossy or glossy). Once the network was trained, it was applied to each material type individually and the accuracy of the prediction on the whole set was reported. For brevity we use acronyms for each created dataset, as shown in Fig. 3. We also took the liberty of reporting the error on the whole dataset (data points used for training and testing combined). As the errors on the training and on the testing set are very close together and there is no sign of overfitting the network, this liberty can be taken without distorting the results. The best performance was achieved on the semi-glossy datasets. The larger error on the glossy dataset is due to the fact that the sign of the surface normal is sometimes predicted wrongly if the specular lobe is narrow and lies outside of the observed range. This can
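The per-material evaluation described above could be sketched like this; the trained model is carried over from the training sketch, and the per-material arrays are placeholders for the matte, semi-glossy and glossy sets.

```python
import numpy as np

# `model` is the network trained in the previous sketch; the arrays
# below are placeholders for the three material types.
material_sets = {
    "matte":       (np.random.rand(512, 39), np.random.rand(512, 1)),
    "semi-glossy": (np.random.rand(512, 39), np.random.rand(512, 1)),
    "glossy":      (np.random.rand(512, 39), np.random.rand(512, 1)),
}
for name, (X_m, y_m) in material_sets.items():
    mse = model.evaluate(X_m, y_m, verbose=0)   # MSE over the whole set
    print(f"{name}: MSE = {mse:.5f}")
```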
Title
Proceedings of the OAGM&ARW Joint Workshop
Subtitle
Vision, Automation and Robotics
Authors
Peter M. Roth
Markus Vincze
Wilfried Kubinger
Andreas Müller
Bernhard Blaschitz
Svorad Stolc
Publisher
Verlag der Technischen Universität Graz
Location
Vienna
Date
2017
Language
English
License
CC BY 4.0
ISBN
978-3-85125-524-9
Size
21.0 x 29.7 cm
Pages
188
Keywords
Conference proceedings (Tagungsband)
Categories
International
Conference proceedings (Tagungsbände)

Table of contents

  1. Preface v
  2. Workshop Organization vi
  3. Program Committee OAGM vii
  4. Program Committee ARW viii
  5. Awards 2016 ix
  6. Index of Authors x
  7. Keynote Talks
  8. Austrian Robotics Workshop 4
  9. OAGM Workshop 86