a material with a roughness of 0.2 already shows highly
specular behavior.
The multi-line scan camera setup as described in Sec. I
was recreated in Blender, where a plane with a random
color texture and a bump map (see Fig. 3) was moved
underneath the camera. During each animation step the plane
was moved by exactly one pixel. The resulting images were
concatenated and reshaped in order to create a 3D image
stack representation of the light field. Each image plane is
then shifted to the left in the following manner:
\[
\forall x, y, i:\quad I'_i(x, y) = I_i(x - 40\,i,\; y) \tag{5}
\]
where $i \in [0 \dots 12]$ denotes the index of the image in the
3D light field structure, $I_i \in \{\text{width} \times \text{height} \times 3\}$ is the
spatial image domain of the $i$-th view and $I'_i$ denotes the new
translated image. Since the disparity (i.e., the gap between
active lines on the camera sensor) is 40 pixels, it was used as
the shifting constant. The resulting overlap (at most 12×40
in the last view) is then cropped. This is done so that the
EPI-lines are vertical with no slope, as they would be with
an object with true 3D geometries.
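As a minimal sketch of this shift-and-crop step (assuming the concatenated light field is stored as a NumPy array of shape views × height × width × 3; the function and variable names are ours, and the shift direction follows the "to the left" convention of the text):

import numpy as np

def shift_and_crop(stack, disparity=40):
    """Shift view i of a light-field stack by i * disparity pixels and crop
    the overlap, so that the EPI-lines of the flat plane become vertical (Eq. 5).

    stack: array of shape (n_views, height, width, 3), e.g. 13 views here.
    """
    n_views, height, width, channels = stack.shape
    max_shift = (n_views - 1) * disparity            # at most 12 * 40 pixels
    cropped_width = width - max_shift
    shifted = np.empty((n_views, height, cropped_width, channels), dtype=stack.dtype)
    for i in range(n_views):
        # view i contributes the window starting at x = i * disparity
        shifted[i] = stack[i, :, i * disparity : i * disparity + cropped_width]
    return shifted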
B. Network Parameters Evaluation
For optimal performance of a neural network, some parameter
evaluation and tuning is needed, such as changing the number
of hidden neurons or using different activation or cost
functions. In our evaluation, we looked at three different
activation functions, namely linear, Sigmoid and the
rectified linear unit (RELU). The input layer, which consists
of 39 neurons, is fully connected to the hidden layer. We
tried different numbers of neurons in the hidden layer for
each evaluated activation function. The results
of these experiments can be seen in Table I. For the read-out
of the output layer, which consists of one neuron since we
only regress the gradient in the transport direction, a linear
activation function was used.
Given the problem we want to solve and our material
properties, one can expect that a low number of hidden
neurons will suffice and already give a good performance,
as the Lambertian reflectance function is low dimensional.
The low dimensionality of a Lambertian reflectance has been
proven and explored in [20]. A less complex network
architecture can be beneficial for both the runtime and the
generalization of the network. Here, a single hidden layer
with 1, 3, 10 or 20 neurons was used.
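As an illustrative sketch of the evaluated architecture (the paper does not name the framework used; PyTorch and the helper name below are our assumptions):

import torch.nn as nn

def build_regressor(n_hidden=20, activation="sigmoid"):
    """Fully connected network: 39 inputs -> one hidden layer -> 1 linear output.

    n_hidden: number of hidden neurons (1, 3, 10 or 20 in the evaluation)
    activation: 'linear', 'sigmoid' or 'relu' for the hidden layer
    """
    acts = {"linear": nn.Identity(), "sigmoid": nn.Sigmoid(), "relu": nn.ReLU()}
    return nn.Sequential(
        nn.Linear(39, n_hidden),     # 39 input neurons, fully connected
        acts[activation],            # hidden-layer activation under evaluation
        nn.Linear(n_hidden, 1),      # single output neuron, linear read-out
    )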
The cost function used to measure the quality of the
regressed prediction (therefore also the value used for the
optimization of the network) was the mean square error
(MSE):
\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)^2 \tag{6}
\]
For optimization, the batch-based gradient descent algorithm
with a learning rate of η = 0.001 was used. The dataset was
split into an 80% training set and a 20% testing set, following
the Pareto principle as proposed by J. M. Juran [21]. The
network was trained for 100 epochs.
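A minimal training sketch under the stated settings (MSE cost of Eq. (6), full-batch gradient descent with η = 0.001, an 80/20 split, 100 epochs); the data handling and function name are assumptions, not the authors' code:

import torch
import torch.nn as nn

def train(model, X, Y, lr=0.001, epochs=100):
    """Full-batch gradient descent with an MSE cost (Eq. 6).

    X: tensor of shape (n, 39), Y: tensor of shape (n, 1).
    """
    # 80/20 train/test split following the Pareto principle mentioned above
    n_train = int(0.8 * len(X))
    perm = torch.randperm(len(X))
    train_idx, test_idx = perm[:n_train], perm[n_train:]

    criterion = nn.MSELoss()                              # mean square error, Eq. (6)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X[train_idx]), Y[train_idx])  # one update per epoch
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        train_mse = criterion(model(X[train_idx]), Y[train_idx]).item()
        test_mse = criterion(model(X[test_idx]), Y[test_idx]).item()
    return train_mse, test_mse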
Table I shows that the Sigmoid achieves both the best average
performance as well as the best performance in a single run,
with 20 neurons in the hidden layer. As the aforementioned
experiments were performed only to show the overall tendency
and convergence of the network structure, a small learning rate
η was used for all experiments. However, [22] shows that
exploring this parameter further is important for the overall
network accuracy. For this task we found that a learning rate
of η = 0.2 works best, which improved the overall accuracy of
the network to $\mathrm{MSE}_{\mathrm{train}} = 0.020464$ and
$\mathrm{MSE}_{\mathrm{test}} = 0.02052$ when trained for 100 epochs.
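A hedged sketch of such a learning-rate exploration, reusing build_regressor() and train() from the sketches above; the placeholder data and the grid of values are ours, only η = 0.001 and η = 0.2 come from the text:

import torch

# Placeholder data standing in for the 39-dimensional intensity profiles and
# the regressed transport-direction gradients (purely synthetic, for illustration).
X, Y = torch.randn(1000, 39), torch.randn(1000, 1)

for lr in (0.001, 0.01, 0.1, 0.2):   # illustrative grid of learning rates
    model = build_regressor(n_hidden=20, activation="sigmoid")
    train_mse, test_mse = train(model, X, Y, lr=lr, epochs=100)
    print(f"lr={lr}: train MSE={train_mse:.5f}, test MSE={test_mse:.5f}")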
TABLE I: Training and testing MSE with different numbers of hidden neurons and activation functions.

Training set MSE
act. fct. \ # hidden neurons      1        3        10       20       avg
linear                         0.05903  0.05988  0.05760  0.05857  0.05877
Sigmoid                        0.05429  0.05285  0.05263  0.04792  0.05192
RELU                           0.05605  0.05543  0.05283  0.05150  0.05395

Testing set MSE
linear                         0.05902  0.05972  0.05777  0.05855  0.05877
Sigmoid                        0.05402  0.05276  0.05266  0.04768  0.05178
RELU                           0.05608  0.05571  0.05312  0.05147  0.054095
C. Network Performance
Fig. 5: Evolution of the mean square error over epochs with
a learning rate of η = 0.2. Left: training set (80% randomly
chosen from all sets), right: testing set (20% randomly
chosen from all sets).
Fig. 5 shows the convergence of the overall accuracy on
the training and test set, combining and shuffling all six
created datasets. This was done in order to generalize the
network as much as possible regarding the material type
(matte, semi-glossy or glossy). Once the network was trained
it was applied to each material type individually and the
accuracy of the prediction on the whole set was reported.
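A short sketch of this per-material evaluation (the dictionary layout and function name are assumptions):

import torch
import torch.nn as nn

def evaluate_per_material(model, datasets):
    """Report the MSE of a trained model on each material dataset separately.

    datasets: dict mapping a material name (e.g. the acronyms of Fig. 3)
              to a tuple (X, Y) holding all samples of that dataset.
    """
    criterion = nn.MSELoss()
    results = {}
    with torch.no_grad():
        for name, (X, Y) in datasets.items():
            results[name] = criterion(model(X), Y).item()   # MSE over the whole set
    return results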
For brevity we use the acronyms for each created dataset,
as shown in Fig. 3. For the sake of simplicity we took the
liberty of reporting the error on the whole dataset (data points
used for training and testing combined). As the errors on the
training and on the testing set are very close together and
there is no sign of overfitting the network, this liberty can
be taken without distorting the results. The best performance
was achieved on the semi-glossy datasets. The larger error
on the glossy dataset is due to the fact that the sign of the
surface normal is sometimes predicted wrong if the specular
lobe is narrow and outside of the observed range. This can