Page - 147 - in Joint Austrian Computer Vision and Robotics Workshop 2020
to obtain the motion-compensated frame $\hat{x}^i_{zj}$, where $W : \mathbb{R}^{n3} \times \mathbb{R}^{n2} \to \mathbb{R}^{n3}$ is the bilinear warping operator.
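A bilinear warping operator of this kind can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name `bilinear_warp`, the `(H, W, C)` array layout, the flow convention (channel 0 horizontal, channel 1 vertical, in pixels) and the clamping at the image border are choices made here, not details given in the text.

```python
import numpy as np

def bilinear_warp(frame, flow):
    """Warp `frame` (H, W, C) with `flow` (H, W, 2).

    Assumed convention: flow[..., 0] is the horizontal and flow[..., 1]
    the vertical displacement in pixels; samples are clamped to the border.
    """
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Positions in the source frame that each output pixel reads from.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets act as interpolation weights.
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With a zero flow the frame is returned unchanged, and a constant half-pixel flow averages horizontally neighbouring pixels.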
In addition, we also compute the backward flow $f^i_{jz}$ and perform a forward-backward check to obtain a binary mask $m^i_{zj} \in \{0,1\}^n$ in the reference frame $x^i_j$, discarding occluded areas. To enable an effective detection of the single-frame defects using temporal information, we require the flow estimation to interpolate over the defects such that they are considered valid in the mask.
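A forward-backward consistency check can be sketched as below. The text does not specify the exact criterion, so the round-trip error test, the tolerance `tol` (in pixels) and the helper names `sample_bilinear` and `forward_backward_mask` are assumptions made for illustration.

```python
import numpy as np

def sample_bilinear(field, x, y):
    """Bilinearly sample `field` (H, W, C) at float positions x, y of shape (H, W)."""
    h, w = field.shape[:2]
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = (x - x0)[..., None], (y - y0)[..., None]
    top = (1 - wx) * field[y0, x0] + wx * field[y0, x1]
    bottom = (1 - wx) * field[y1, x0] + wx * field[y1, x1]
    return (1 - wy) * top + wy * bottom

def forward_backward_mask(flow_fwd, flow_bwd, tol=1.0):
    """Return a {0, 1} mask of shape (H, W).

    A pixel is marked valid (1) if following the forward flow and then the
    backward flow leads back to within `tol` pixels of the starting point.
    `tol` is an assumed threshold, not a value from the paper.
    """
    h, w = flow_fwd.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Backward flow looked up at the position the forward flow points to.
    bwd_at_target = sample_bilinear(
        flow_bwd, xs + flow_fwd[..., 0], ys + flow_fwd[..., 1]
    )
    round_trip_error = np.linalg.norm(flow_fwd + bwd_at_target, axis=-1)
    return (round_trip_error <= tol).astype(np.uint8)
```

Occluded pixels typically fail this test because no consistent backward correspondence exists for them, so they receive the value 0.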
Combining the motion-compensated frame and the mask with the reference frame $x^i_j$ yields the input to the dynamic model $N^D_\theta : \mathbb{R}^{n3} \times \mathbb{R}^{n3} \times \{0,1\}^n \to \mathbb{R}^{n3}$. Its output

$$ \hat{y}^i_{zj} = N^D_\theta(x^i_j, \hat{x}^i_{zj}, m^i_{zj}) \tag{5} $$

is the estimate of the clean true frame, combining spatial and temporal information from two adjacent frames. As before, $\theta$ denotes the trainable parameters of the DnCNN model, learned from data by an SL or N2N approach.
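One plausible way to combine the three inputs is to stack them along the channel axis. The text does not say how the inputs are merged, so the concatenation, the `(H, W, C)` layout and the name `build_dynamic_input` are assumptions in this sketch.

```python
import numpy as np

def build_dynamic_input(x_ref, x_warped, mask):
    """Stack reference frame (H, W, 3), warped frame (H, W, 3) and binary
    mask (H, W) into a single (H, W, 7) array (assumed layout)."""
    mask_channel = mask[..., None].astype(x_ref.dtype)
    return np.concatenate([x_ref, x_warped, mask_channel], axis=-1)
```

The resulting 7-channel array could then be fed to a convolutional network that outputs a 3-channel frame.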
3.2. Supervised and Noise2Noise Learning
Let us first consider supervised learning for reconstructing single-frame defects. Here one requires for every training sample frame $x^i_j$ a corresponding target frame $\bar{y}^i_j$, which can be created by tedious and time-consuming manual editing. Given a collection of corrupted video scenes $\{\xi_i = (x^i_1, \ldots, x^i_{N_f})\}_{i=1}^{N_s}$ and the corresponding manually edited target scenes $\{\psi_i = (\bar{y}^i_1, \ldots, \bar{y}^i_{N_f})\}_{i=1}^{N_s}$, we define the supervised training problem as

$$ \min_\theta \sum_{i=1}^{N_s} L^{SL}_{\{S,D\}}(\xi_i, \psi_i, \theta). \tag{6} $$

The scene-specific loss $L^{SL}_{\{S,D\}}$ depends on the considered model. For the static model $N^S_\theta$ we use

$$ L^{SL}_S(\xi_i, \psi_i, \theta) = \sum_{j=1}^{N_f} \ell\left( N^S_\theta(x^i_j) - \bar{y}^i_j \right), \tag{7} $$
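whereas the loss for the dynamic model $N^D_\theta$ is given by

$$ L^{SL}_D(\xi_i, \psi_i, \theta) = \sum_{j=1}^{N_f} \sum_{\substack{z=1 \\ z \neq j}}^{N_f} \ell\left( N^D_\theta(x^i_j, \hat{x}^i_{zj}, m^i_{zj}) - \bar{y}^i_j \right) \tag{8} $$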
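where $\ell \in \{\|\cdot\|_1, \|\cdot\|_2^2, \|\cdot\|_\delta\}$ and $\|x\|_\delta = \sum_i |x_i|_\delta$ is the Huber norm using

$$ |x|_\delta = \begin{cases} \frac{1}{2} x^2 & \text{if } |x| \le \delta \\ \delta \left( |x| - \frac{1}{2}\delta \right) & \text{else.} \end{cases} \tag{9} $$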
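The Huber norm can be sketched in a few lines of NumPy. The sketch assumes the common form of the Huber function (quadratic below a threshold, linear above it); the parameter name `delta` and its default value are assumptions made here, not values stated in the text.

```python
import numpy as np

def huber_norm(x, delta=1.0):
    """Sum of element-wise Huber penalties with threshold `delta` (assumed name)."""
    a = np.abs(np.asarray(x, dtype=float))
    quadratic = 0.5 * a**2                 # used where |x| <= delta
    linear = delta * (a - 0.5 * delta)     # used where |x| > delta
    return np.where(a <= delta, quadratic, linear).sum()

print(huber_norm([0.5, 2.0], delta=1.0))  # 0.125 + 1.5 = 1.625
```

Small residuals are penalized quadratically like the squared $\ell_2$ norm, while large residuals grow only linearly like the $\ell_1$ norm, which makes the loss less sensitive to outliers.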
Despite the constant number of training sample frames, we can use $N_s N_f (N_f - 1)$ pairs for training the dynamic model due to the possible permutations, a factor of $(N_f - 1)$ more than for the static model.
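The counting argument can be checked by enumerating the index tuples directly. The function names below are hypothetical, and the sketch assumes that a target exists for every frame $j$ of every scene.

```python
from itertools import permutations

def static_samples(num_scenes, num_frames):
    """All (scene, reference frame) indices used by the static model."""
    return [(i, j) for i in range(num_scenes) for j in range(num_frames)]

def dynamic_pairs(num_scenes, num_frames):
    """All (scene, reference frame j, adjacent frame z) indices with z != j."""
    return [
        (i, j, z)
        for i in range(num_scenes)
        for j, z in permutations(range(num_frames), 2)
    ]

n_s, n_f = 2, 3
print(len(static_samples(n_s, n_f)))  # 6  = Ns * Nf
print(len(dynamic_pairs(n_s, n_f)))   # 12 = Ns * Nf * (Nf - 1)
```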
To avoid the manual editing of target frames, we propose to adopt the N2N approach to remove single-frame defects. Thus, only the corrupted video scenes $\{\xi_i = (x^i_1, \ldots, x^i_{N_f})\}_{i=1}^{N_s}$ are used during training. For N2N, the training problem for estimating the learnable parameters $\theta$ of the models becomes

$$ \min_\theta \sum_{i=1}^{N_s} L^{N2N}_{\{S,D\}}(\xi_i, \theta), \tag{10} $$

using the scene-specific loss for the static model

$$ L^{N2N}_S(\xi_i, \theta) = \sum_{j=1}^{N_f} \sum_{\substack{k=1 \\ k \neq j}}^{N_f} \ell\left( m^i_{kj} \left( N^S_\theta(x^i_j) - \hat{x}^i_{kj} \right) \right) \tag{11} $$
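and for the dynamic model

$$ L^{N2N}_D(\xi_i, \theta) = \sum_{j=1}^{N_f} \sum_{\substack{z=1 \\ z \neq j}}^{N_f} \sum_{\substack{k=1 \\ k \neq j \\ k \neq z}}^{N_f} \ell\left( m^i_{kj} \left( N^D_\theta(x^i_j, \hat{x}^i_{zj}, m^i_{zj}) - \hat{x}^i_{kj} \right) \right). \tag{12} $$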
This is illustrated in Figure 2. In contrast to supervised learning, we choose a frame $x^i_k$ and compensate for the motion to the reference frame $x^i_j$ and get the warped frame $\hat{x}^i_{kj}$ as well as the binary mask $m^i_{kj}$. Then we only evaluate the loss function in the areas where the forward-backward check is consistent, in order to disregard motion estimation errors. A particular advantage of N2N learning is that a factor of $(N_f - 1)$ more training samples are available for the static model and $(N_f - 2)$ for the dynamic model, without the necessity to manually edit any frame.
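The masked N2N loss of the static model in (11) and the index set of the dynamic loss in (12) can be sketched as follows. Several things are assumptions made for illustration: the choice of the $\ell_1$ norm for $\ell$, the dictionary layout `warped[(k, j)]` and `masks[(k, j)]`, the callable `model` standing in for the network, and the reading that the target index $k$ differs from both $j$ and $z$, which matches the factor $(N_f - 2)$ stated in the text.

```python
import numpy as np
from itertools import permutations

def masked_l1(prediction, target, mask):
    """l1 loss evaluated only where mask == 1.
    Frames are (H, W, C) arrays, mask is an (H, W) array of 0/1."""
    return np.abs(mask[..., None] * (prediction - target)).sum()

def n2n_static_loss(model, frames, warped, masks):
    """Scene loss in the spirit of Eq. (11).

    frames[j]      : corrupted reference frame x_j
    warped[(k, j)] : frame k motion-compensated to reference j
    masks[(k, j)]  : matching forward-backward mask
    """
    n_f = len(frames)
    return sum(
        masked_l1(model(frames[j]), warped[(k, j)], masks[(k, j)])
        for j, k in permutations(range(n_f), 2)
    )

def dynamic_triples(num_frames):
    """Index triples (j, z, k) with z != j and k not in {j, z}."""
    return list(permutations(range(num_frames), 3))
```

For $N_f = 3$, `dynamic_triples(3)` yields six triples per scene, so every ordered choice of reference, adjacent and target frame is used once.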
In all our numerical experiments we optimize (6) and (10) using a dataset of $N_s = 368$ video sequences of $N_f = 3$ frames, which was divided into a training set (343) and a test set (25). For each of the 368 samples there is one manually edited target at $j = 2$, where only the single-frame defects $n_d$ were removed and the film
- Title
- Joint Austrian Computer Vision and Robotics Workshop 2020
- Editor
- Graz University of Technology
- Location
- Graz
- Date
- 2020
- Language
- English
- License
- CC BY 4.0
- ISBN
- 978-3-85125-752-6
- Size
- 21.0 x 29.7 cm
- Pages
- 188
- Categories
- Computer Science
- Technology