Page 145 in Joint Austrian Computer Vision and Robotics Workshop 2020
Real-World Video Restoration using Noise2Noise
Martin Zach, Erich Kobler
Institute of Computer Graphics and Vision
{martin.zach@student, erich.kobler@icg}.tugraz.at
Abstract. Restoration of real-world analog video is a challenging task due to the presence of very heterogeneous defects. These defects are hard to model, such that creating training data synthetically is infeasible and instead time-consuming manual editing is required. In this work we explore whether reasonable restoration models can be learned from data without explicitly modeling the defects or manual editing. We adopt Noise2Noise techniques, which eliminate the need for ground truth targets by replacing them with corrupted instances. To compensate for temporal mismatches between the frames and ensure meaningful training, we apply motion correction. Our experiments show that video restoration can be learned using only corrupted frames, with performance exceeding that of conventional learning.
1. Introduction
Recently, the approach to signal reconstruction from corrupted measurements has shifted from explicitly modeling the statistics of the corruptions and image priors, e.g. Block-matching and 3D filtering (BM3D) [6] or Total Variation (TV) based methods [4, 24], to learning-based techniques such as Convolutional Neural Networks (CNNs) [11]. Since then, deep learning techniques [9, 18] have become very popular. Residual learning [9], batch normalization [10] and similar improvements, along with increasing computational power and high-quality datasets, made it possible to train such architectures efficiently. Deep architectures are now the state of the art for many image restoration tasks such as denoising, deblurring, and inpainting [8, 13, 19] as well as semantic segmentation [16, 23] and classification [27].
Figure 1. Sample from the dataset, corrupted by typical temporally incoherent and very local defects highlighted in orange.
Despite these advances, the generalization performance of such models is still largely limited by the size of the available dataset. The acquisition of clean targets is often very tedious or difficult, and it has been proposed that data collection is becoming the critical bottleneck in machine learning [22]. It is therefore interesting to investigate whether networks can learn meaningful mappings when only being presented with corrupted samples, both as input and as target. Lehtinen et al. [15] showed that clean targets are not required to learn meaningful reconstructions, provided that the corrupted samples are drawn from an arbitrary distribution, conditioned on the clean target, whose expected value is the clean target. This technique, now known as Noise2Noise (N2N), has been successfully applied to image restoration tasks [14].
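The expected-value condition can be illustrated with a minimal numerical sketch. It is not the authors' training setup: the 1-D signal, the Gaussian noise model, the sample count, and the helper name `l2_minimizer` are all assumptions chosen for illustration. The sketch relies only on the fact that the minimizer of a squared-error loss over a set of targets is their mean, so zero-mean corruption of the targets leaves the minimizer (in expectation) at the clean signal.

```python
import numpy as np


def l2_minimizer(targets):
    """Closed-form minimizer of sum_i ||z - y_i||^2 over z (hypothetical helper).

    Setting the gradient to zero gives z = mean_i(y_i), taken element-wise.
    """
    return np.mean(targets, axis=0)


rng = np.random.default_rng(0)

# Assumed clean signal; in the paper's setting this would be an unknown clean frame.
clean = np.linspace(0.0, 1.0, 16)

# Corrupted targets with zero-mean noise, so that E[y | clean] = clean.
noisy_targets = clean + rng.normal(0.0, 0.5, size=(20000, clean.size))

# The L2-optimal estimate is computed from corrupted targets only.
estimate = l2_minimizer(noisy_targets)
max_err = float(np.max(np.abs(estimate - clean)))
print(f"max abs deviation from clean signal: {max_err:.4f}")
```

With a finite number of corrupted targets the estimate only approximates the clean signal, and the deviation shrinks as more samples become available.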
In this work we explore the applicability of N2N for video denoising, especially concerning the real-world case of having finite data. Due to the nature of the defects, acquiring ground truth samples would require manual editing of the frames and is often not feasible. Further, the defects are very complex and diverse in nature, such that modeling them is difficult to impossible. Figure 1 displays such an example, where temporally incoherent defects with small spatial extent and high inter-pixel correlation can be seen.
The N2N setting imposes limitations that require special considerations. Since different frames show the scene at different points in time, they cannot directly be used as training pairs. We overcome this by separating temporal motion compensation and spatial denoising, allowing corrupted samples to be both in-
- Title: Joint Austrian Computer Vision and Robotics Workshop 2020
- Publisher: Graz University of Technology
- Place: Graz
- Date: 2020
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-85125-752-6
- Dimensions: 21.0 x 29.7 cm
- Pages: 188
- Categories: Computer Science, Technology