put and target for the model. With this architecture we were able to achieve satisfactory results, showing that video restoration can be done entirely without ground truth data. This significantly eases the task by avoiding the requirement for tedious manual labeling.
2. Related Work
Learning-based Image Restoration Convolutional Neural Networks (CNNs) were first used in 2008 [11], where they achieved similar performance to model-based approaches. Later, Burger et al. [2] showed that shallow plain Multi-Layer Perceptrons (MLPs) can achieve results comparable to BM3D. The DnCNN [30] combined recent advances such as the convolutional structure, global residual learning [27], batch normalization [10], and a ReLU activation [20] to achieve a significant performance increase over state-of-the-art explicit models. Later, the FFDNet [31] extended the DnCNN by the use of input noise maps to account for spatially varying noise intensity, in order to apply it to real-world photographs. CBDNet [8] builds on this idea and introduces a noise estimation subnetwork whose output is fed into the denoising network along with the image to achieve notably good results for real-world denoising.
Video Restoration Compared to image denoising, little work exists on video denoising. Patch-based approaches are still the most prominent, e.g. V-BM4D [17] and Video Non-Local Bayes (VNLB) [1]. The Deep Video Denoising Network (DVDNet) [28] was one of the first convolutional network approaches to outperform VNLB, whilst being computationally more efficient. In the DVDNet, two separate networks are used for spatial and temporal denoising, and adjacent frames are motion compensated using DeepFlow [29]. Similarly, ViDeNN [5] uses separated spatial and temporal denoising networks, but motion compensation is learned in the temporal network. Frame-to-frame Training [7] exploits N2N by fine-tuning a pretrained network on motion-compensated successive frames. However, the applicability to real-world data remains limited since only one frame is considered for restoration. Besides denoising, learning-based methods have been successfully applied to frame interpolation [21], super-resolution [3] and deblurring [25].

3. Methods
We consider video scenes $\xi^i = (x^i_j)_{j=1}^{N_f}$ consisting of $N_f$ frames $x^i_j \in \mathbb{R}^{n \cdot 3}$ with a resolution $n = n_1 \times n_2$ and RGB channels. Each frame of a scene $x^i_j$ is assumed to be corrupted by additive noise, i.e.

$x^i_j = y^i_j + n_g + n_d$ ,  (1)

where $y^i_j$ is the underlying clean true frame, $n_g$ models noise due to film grain and $n_d$ represents the spatially correlated single-frame defects highlighted in Figure 1. Both noise sources are uncorrelated across the temporal dimension due to the stochastic nature of film grain $n_g$ and the temporal incoherence of $n_d$. We note that the approach is not limited to this noise model.
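To make the noise model of Eq. (1) concrete, the sketch below synthesizes a corrupted frame from a clean one. The grain strength, defect density, and blotch shape are illustrative assumptions only, since the exact noise statistics are not specified here; drawing both terms independently per frame reproduces the temporal incoherence noted above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def corrupt_frame(y, grain_sigma=0.04, defect_prob=1e-4, rng=None):
    """Simulate Eq. (1): x = y + n_g + n_d for one clean frame y in [0, 1],
    shaped (H, W, 3). All parameter values are hypothetical."""
    rng = np.random.default_rng() if rng is None else rng
    # n_g: i.i.d. Gaussian noise as a simple stand-in for film grain
    n_g = rng.normal(0.0, grain_sigma, size=y.shape)
    # n_d: sparse seed pixels grown into spatially correlated blotches
    seeds = (rng.random(y.shape[:2]) < defect_prob).astype(float)
    blotch = maximum_filter(seeds, size=9)        # dilate seeds into patches
    blotch = gaussian_filter(blotch, sigma=2.0)   # soft edges -> spatial correlation
    n_d = blotch[..., None] * (1.0 - y)           # bright, scratch-like defects
    return np.clip(y + n_g + n_d, 0.0, 1.0)
```

Calling this independently for every frame of a scene yields noise that is uncorrelated across the temporal dimension, matching the model above.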
3.1. Models for Single-Frame Defect Restoration
The simplest approach to estimate the clean true frame $y^i_j$ is by means of single-frame denoising. For this setting we use the DnCNN [30] to generate a prediction $\hat{y}^i_j$ by

$\hat{y}^i_j = \mathcal{N}_{\theta_S}(x^i_j)$ ,  (2)

solely based on the single corresponding corrupted frame $x^i_j$. Here, $\theta_S$ are the parameters of the DnCNN. They are learned from data either by supervised learning (SL), provided that target frames are available, or by the N2N approach, which we describe later in this section. The major disadvantage of the single-frame denoising approach is that the model cannot exploit temporal information to detect and restore the single-frame defects.
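For reference, a minimal DnCNN-style denoiser realizing $\mathcal{N}_{\theta_S}$ in Eq. (2) can be sketched in PyTorch as follows. Depth and width follow the original DnCNN [30]; the exact configuration used in this work is an assumption.

```python
import torch.nn as nn

class DnCNN(nn.Module):
    """DnCNN-style residual denoiser: Conv+ReLU, (depth-2) blocks of
    Conv+BN+ReLU, and a final Conv, with global residual learning."""
    def __init__(self, channels=3, features=64, depth=17):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1),
                  nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1, bias=False),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # the network predicts the noise; subtracting it from the
        # input yields the clean estimate (global residual learning)
        return x - self.body(x)
```

Given a batch of corrupted frames x of shape (B, 3, H, W), the prediction in Eq. (2) is simply y_hat = DnCNN()(x).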
To overcome this issue and enable the extraction of temporal features, we propose to learn a variant of the DnCNN model operating on two consecutive frames. These two adjacent frames need to be aligned to compensate the motion in dynamic scenes and ease the denoising problem. In detail, we account for the motion by computing the optical flow

$f^i_{zj} = F(x^i_z, x^i_j)$  (3)

from frame $x^i_z$ to $x^i_j$, where $F : \mathbb{R}^{n \cdot 3} \times \mathbb{R}^{n \cdot 3} \to \mathbb{R}^{n \cdot 2}$ implements the pretrained PWC-Net [26]. Using the thereby estimated flow $f^i_{zj}$, we warp a frame $x^i_z$ of the scene onto the reference frame $x^i_j$ by

$\hat{x}^i_{zj} = W(x^i_z, f^i_{zj})$  (4)
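The warping operator $W$ in Eq. (4) can be realized by standard backward warping with bilinear sampling. The sketch below assumes a dense flow in pixel units with channel order (dx, dy); it is a generic implementation, not necessarily the exact one used with PWC-Net here.

```python
import torch
import torch.nn.functional as F

def warp(x_z, flow):
    """Backward-warp x_z (B, 3, H, W) onto the reference frame using a
    dense optical flow (B, 2, H, W), cf. Eq. (4)."""
    b, _, h, w = x_z.shape
    # base grid of pixel coordinates, channel 0 = x, channel 1 = y
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(x_z.device)
    coords = grid.unsqueeze(0) + flow          # displaced sampling positions
    # normalize coordinates to [-1, 1] as expected by grid_sample
    cx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    cy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(x_z, torch.stack((cx, cy), dim=-1),
                         mode="bilinear", align_corners=True)
```

Pixels whose displaced coordinates fall outside the frame are filled with zeros by grid_sample's default padding; how such regions are handled downstream is an implementation choice.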