[Figure 2: training data $\xi^i = (x^i_1, \ldots, x^i_{N_f})$ passes through motion correction, yielding $\{\hat{x}^i_{zj}, m^i_{zj}\}$ and $\{\hat{x}^i_{kj}, m^i_{kj}\}$; the network $N_{\theta_D}$ produces $\hat{y}^i_{zj}$, which enters the loss $\ell$.]
Figure 2. Illustration of the proposed sampling process for applying N2N learning to video restoration using motion compensation. Here we choose $x^i_j$ as the reference frame and warp $x^i_z$ and $x^i_k$ onto it. Then, we calculate the estimate $\hat{y}^i_{zj}$ using the reference frame $x^i_j$ and $\hat{x}^i_{zj}$, and finally the loss using $\hat{x}^i_{kj}$.
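To make the sampling process concrete, the following is a minimal PyTorch sketch of the backward warping and loss construction in Figure 2. It is a sketch under assumptions, not the authors' implementation: the flow fields stand in for the output of the pre-trained PWC-Net used below, the channel-wise concatenation of reference and warped frame as network input is a guess, and the validity masks $m$ are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (B,C,H,W) onto the reference grid using a
    dense optical flow `flow` (B,2,H,W) given in pixel displacements."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype),
        torch.arange(w, dtype=frame.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # horizontal sampling coords
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # vertical sampling coords
    grid = torch.stack(                    # normalize to [-1, 1] for grid_sample
        (2 * grid_x / (w - 1) - 1, 2 * grid_y / (h - 1) - 1), dim=-1
    )
    return F.grid_sample(frame, grid, align_corners=True)

# One sampling step on a frame triple, with random stand-in data:
b, c, h, w = 1, 1, 64, 64
x_j, x_z, x_k = (torch.rand(b, c, h, w) for _ in range(3))
flow_zj = torch.zeros(b, 2, h, w)  # in practice: PWC-Net flow from x_z to x_j
flow_kj = torch.zeros(b, 2, h, w)  # in practice: PWC-Net flow from x_k to x_j
x_hat_zj = warp(x_z, flow_zj)      # neighbour warped onto the reference frame
x_hat_kj = warp(x_k, flow_kj)      # second warped neighbour, used as N2N target
# Assumed input format (not confirmed by the paper): reference and warped
# frame stacked along the channel axis, compared against x_hat_kj.
# y_hat_zj = denoiser(torch.cat((x_j, x_hat_zj), dim=1))
# loss = F.mse_loss(y_hat_zj, x_hat_kj)
```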
                 static                  dynamic
Error            SL         N2N         SL         N2N
$\ell_2$         0.002151   0.018161    0.000675   0.001648
$\ell_1$         0.002736   0.012005    0.000320   0.001910
= 0.1            —          —           0.000721   0.001630

Table 1. Evaluation of the average mean squared error to the manually edited target images of the test set.
grain was not changed. We used a pre-trained PWC-Net [26] for motion compensation and extended the DnCNN [30] to 20 layers with batch normalization and 64 convolution kernels of size 3×3. Using the ADAM [12] optimizer on a batch size of 128, we trained the models for 3000 iterations with a learning rate of $\alpha = 1\times10^{-4}$ and decay rates of $\beta_1 = 0.9$ and $\beta_2 = 0.999$. We sampled patches of size 64×64 from the frames and augmented the data by vertical and horizontal flipping.
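For reference, a minimal PyTorch sketch of this configuration follows. The single input channel and the plain sequential layout (without the residual prediction of the original DnCNN) are simplifying assumptions on our part.

```python
import torch
import torch.nn as nn

def dncnn(depth=20, width=64, in_channels=1):
    """DnCNN-style network as described above: `depth` convolution layers
    with 3x3 kernels, `width` feature channels and batch normalization."""
    layers = [nn.Conv2d(in_channels, width, 3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(depth - 2):
        layers += [
            nn.Conv2d(width, width, 3, padding=1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
        ]
    layers.append(nn.Conv2d(width, in_channels, 3, padding=1))
    return nn.Sequential(*layers)

model = dncnn()
# ADAM with the stated learning rate and decay rates; batches of 128
# patches of size 64x64 would be drawn with random flips for 3000 iterations.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
```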
Finally, we estimate $\hat{y}^i_2$ as
$$
\hat{y}^i_2 =
\begin{cases}
\hat{y}^i_{12} & \text{if } m^i_{12} \wedge (\neg m^i_{32}) \\
\hat{y}^i_{32} & \text{if } m^i_{32} \wedge (\neg m^i_{12}) \\
\tfrac{1}{2}\bigl(\hat{y}^i_{12} + \hat{y}^i_{32}\bigr) & \text{else.}
\end{cases}
\tag{13}
$$
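The case distinction in Eq. (13) is a per-pixel selection, which can be written compactly; the sketch below assumes the masks $m^i_{12}$ and $m^i_{32}$ are boolean tensors of the same shape as the estimates.

```python
import torch

def combine_estimates(y_12, y_32, m_12, m_32):
    """Per-pixel combination of the two motion-compensated estimates
    following Eq. (13): take the estimate whose mask is exclusively
    valid, otherwise average the two."""
    return torch.where(
        m_12 & ~m_32, y_12,
        torch.where(m_32 & ~m_12, y_32, 0.5 * (y_12 + y_32)),
    )
```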
4. Results
In this section we present results to highlight the benefits of N2N learning for removing single-frame defects in scanned historical video scenes. We perform quantitative and qualitative evaluation for the static and dynamic models and compare supervised learning to N2N. The qualitative results were also evaluated in a reader study with a focus on temporal coherence.
We show the Mean Squared Error (MSE) on the test set in Table 1 and some representative examples in Figure 3. Given the nature of the defects, their detection is easier if the model can use temporal information. This is confirmed by the results in Table 1, since they show that the dynamic model outperforms the static model.

                        Original   SL        N2N
Overall Best            3.13%      43.23%    53.65%
Least Flickering        0.52%      10.94%    88.54%
Significant Smoothing   0%         1.04%     56.77%

Table 2. Quantitative evaluation of the reader study. The results indicate that the majority of participants prefers the N2N method, where artifacts are significantly better removed at the cost of introducing some smoothing.
The numerical results indicate better performance for the models trained on SL targets. However, this is misleading since it does not necessarily correspond to better defect removal. In fact, Figure 3 suggests that N2N learning improves defect removal. The superior MSE of supervised models is explained by the preservation of film grain, which has not been removed in the targets. In contrast, since film grain differs between the frames, N2N models learn to remove it. Thus, even though they are qualitatively better at removing defects, they yield worse numerical errors.
Further, the visual quality of videos cannot be determined by considering individual frames only. The temporal context needs to be considered as well, where incoherences can lead to an unpleasant viewing experience. Quality measures could be improved by taking temporal coherency into account; however, objective evaluation would still be problematic. Thus, numerical error measures are not suited to fully determine the visual quality of the output.
In general, evaluation is best done by a human who can subjectively decide whether, e.g., removal of film grain is desired, and how pleasant the final video is to watch overall. We therefore conducted a reader study¹

¹Material available at https://github.com/zacmar/restoration-reader-study