Page - 147 - in Joint Austrian Computer Vision and Robotics Workshop 2020

to obtain the motion-compensated frame $\hat{x}^i_{zj}$, where $W : \mathbb{R}^{3n} \times \mathbb{R}^{2n} \to \mathbb{R}^{3n}$ is the bilinear warping operator. In addition, we compute the backward flow $f^i_{jz}$ and perform a forward-backward check to obtain a binary mask $m^i_{zj} \in \{0,1\}^n$ in the reference frame $x^i_j$ that discards occluded areas. To enable an effective detection of the single-frame defects using temporal information, we require the flow estimation to interpolate over the defects such that they are considered valid in the mask. Combining the motion-compensated frame and the mask with the reference frame $x^i_j$ yields the input to the dynamic model $N^D_\theta : \mathbb{R}^{3n} \times \mathbb{R}^{3n} \times \{0,1\}^n \to \mathbb{R}^{3n}$. Its output

$$\hat{y}^i_{zj} = N^D_\theta(x^i_j, \hat{x}^i_{zj}, m^i_{zj}) \tag{5}$$

is the estimate of the clean true frame, combining spatial and temporal information from two adjacent frames. As before, $\theta$ denotes the trainable parameters of the DnCNN model, learned from data by an SL or N2N approach.

3.2. Supervised and Noise2Noise Learning

Let us first consider supervised learning for reconstructing single-frame defects. Here one requires, for every training sample frame $x^i_j$, a corresponding target frame $\bar{y}^i_j$, which can only be created by tedious and time-consuming manual editing. Given a collection of corrupted video scenes $\{\xi^i = (x^i_1, \dots, x^i_{N_f})\}_{i=1}^{N_s}$ and the corresponding manually edited target scenes $\{\psi^i = (\bar{y}^i_1, \dots, \bar{y}^i_{N_f})\}_{i=1}^{N_s}$, we define the supervised training problem as

$$\min_\theta \sum_{i=1}^{N_s} \mathcal{L}^{SL}_{\{S,D\}}(\xi^i, \psi^i, \theta). \tag{6}$$

The scene-specific loss $\mathcal{L}^{SL}_{\{S,D\}}$ depends on the considered model. For the static model $N^S_\theta$ we use

$$\mathcal{L}^{SL}_S(\xi^i, \psi^i, \theta) = \sum_{j=1}^{N_f} \ell\big(N^S_\theta(x^i_j) - \bar{y}^i_j\big), \tag{7}$$

whereas the loss for the dynamic model $N^D_\theta$ is given by

$$\mathcal{L}^{SL}_D(\xi^i, \psi^i, \theta) = \sum_{j=1}^{N_f} \sum_{\substack{z=1 \\ z \neq j}}^{N_f} \ell\big(N^D_\theta(x^i_j, \hat{x}^i_{zj}, m^i_{zj}) - \bar{y}^i_j\big), \tag{8}$$

where $\ell \in \{\|\cdot\|_1, \|\cdot\|_2^2, \|\cdot\|_\epsilon\}$ and $\|x\|_\epsilon = \sum_i |x_i|_\epsilon$ is the Huber norm using

$$|x|_\epsilon = \begin{cases} \frac{1}{2} x^2 & \text{if } |x| \le \epsilon, \\ \epsilon\left(|x| - \frac{1}{2}\epsilon\right) & \text{else}. \end{cases} \tag{9}$$

Although the number of training frames is fixed, the dynamic model can be trained on $N_s N_f (N_f - 1)$ pairs owing to the possible permutations of reference frame $j$ and adjacent frame $z$, a factor of $(N_f - 1)$ more than for the static model.
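The warping operator $W$ can be illustrated with a minimal sketch. It assumes a single-channel frame stored as a list of rows and a dense flow given per reference pixel as `(dx, dy)`; an RGB frame would be warped channel by channel. The function name `bilinear_warp` and the clamping at the image border are illustrative choices, not details taken from the paper.

```python
import math


def bilinear_warp(image, flow):
    """Backward-warp a single-channel image (list of rows) with a dense flow.

    Output pixel (x, y) samples `image` at (x + dx, y + dy) with bilinear
    interpolation. Sample positions outside the image are clamped to the
    border (an assumption; other border policies are equally possible).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling position to the valid image area.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = math.floor(sx), math.floor(sy)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0  # fractional offsets in [0, 1)
            # Interpolate horizontally on the two rows, then vertically.
            top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
            bot = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bot
    return out
```

For example, shifting the one-row ramp `[[0.0, 10.0, 20.0]]` by half a pixel with `bilinear_warp(img, [[(0.5, 0.0)] * 3])` returns `[[5.0, 15.0, 20.0]]`; the last pixel samples beyond the border and is clamped.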
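The forward-backward check can be sketched in the same list-of-rows setting. The text does not state the consistency criterion, so this sketch assumes a common one: a reference pixel is kept when the forward flow and the backward flow sampled at its corresponding pixel (nearest-neighbour lookup) cancel up to a hypothetical threshold `tau`, and pixels whose flow leaves the frame are discarded. The function and parameter names are illustrative only.

```python
def forward_backward_mask(flow_ref_to_other, flow_other_to_ref, tau=1.0):
    """Binary validity mask on the reference grid (1 = keep, 0 = discard).

    flow_ref_to_other[y][x] = (dx, dy) points from reference pixel (x, y)
    into the other frame; flow_other_to_ref is defined on the other frame.
    Assumed criterion: both flows must cancel up to the threshold tau.
    """
    h, w = len(flow_ref_to_other), len(flow_ref_to_other[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow_ref_to_other[y][x]
            # Nearest-neighbour lookup of the corresponding pixel (a simplification).
            tx, ty = round(x + dx), round(y + dy)
            if not (0 <= tx < w and 0 <= ty < h):
                continue  # flow points outside the frame: leave the pixel masked out
            bx, by = flow_other_to_ref[ty][tx]
            # For a visible, correctly matched pixel the round trip returns to (x, y).
            err = ((dx + bx) ** 2 + (dy + by) ** 2) ** 0.5
            mask[y][x] = 1 if err <= tau else 0
    return mask
```

With a constant forward flow of `(1, 0)` and backward flow of `(-1, 0)` on a 2 x 3 grid, the mask is `[[1, 1, 0], [1, 1, 0]]`: the right-most column maps outside the frame and is discarded.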
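The three candidate choices of $\ell$ in (7) and (8) can be written out directly. This is a sketch over flat residual vectors; `eps` plays the role of $\epsilon$ in (9), and its default of 1.0 is an arbitrary assumption rather than a value from the paper. The quadratic and linear pieces of the Huber norm meet with equal value and slope at $|x| = \epsilon$.

```python
def l1(residual):
    """||r||_1 of a flat residual vector."""
    return sum(abs(r) for r in residual)


def l2_squared(residual):
    """||r||_2^2 of a flat residual vector."""
    return sum(r * r for r in residual)


def huber(residual, eps=1.0):
    """Huber norm ||r||_eps = sum_i |r_i|_eps following Eq. (9)."""
    total = 0.0
    for r in residual:
        a = abs(r)
        if a <= eps:
            total += 0.5 * a * a            # quadratic near zero
        else:
            total += eps * (a - 0.5 * eps)  # linear tail, continuous at a == eps
    return total
```

For the residual `[0.5, -2.0, 0.0]` these give 2.5 (`l1`), 4.25 (`l2_squared`) and 1.625 (`huber` with `eps=1.0`); the large entry is penalized linearly by the Huber norm but quadratically by $\|\cdot\|_2^2$.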
To avoid the manual editing of target frames, we propose to adopt the N2N approach to remove single-frame defects. Thus, only the corrupted video scenes $\{\xi^i = (x^i_1, \dots, x^i_{N_f})\}_{i=1}^{N_s}$ are used during training. For N2N we modify the training problem and estimate the learnable parameters $\theta$ of the models by solving

$$\min_\theta \sum_{i=1}^{N_s} \mathcal{L}^{N2N}_{\{S,D\}}(\xi^i, \theta), \tag{10}$$

using the scene-specific loss for the static model

$$\mathcal{L}^{N2N}_S(\xi^i, \theta) = \sum_{j=1}^{N_f} \sum_{\substack{k=1 \\ k \neq j}}^{N_f} \ell\Big(m^i_{kj} \big(N^S_\theta(x^i_j) - \hat{x}^i_{kj}\big)\Big) \tag{11}$$

and for the dynamic model

$$\mathcal{L}^{N2N}_D(\xi^i, \theta) = \sum_{j=1}^{N_f} \sum_{\substack{z=1 \\ z \neq j}}^{N_f} \sum_{\substack{k=1 \\ k \neq j \\ k \neq z}}^{N_f} \ell\Big(m^i_{kj} \big(N^D_\theta(x^i_j, \hat{x}^i_{zj}, m^i_{zj}) - \hat{x}^i_{kj}\big)\Big). \tag{12}$$

This is illustrated in Figure 2. In contrast to supervised learning, the target is not a manually edited frame: we choose another frame $x^i_k$ of the scene, compensate its motion to the reference frame $x^i_j$, and obtain the warped frame $\hat{x}^i_{kj}$ together with the binary mask $m^i_{kj}$. We then evaluate the loss function only in the areas where the forward-backward check is consistent, thereby disregarding motion estimation errors. A particular advantage of N2N learning is that a factor of $(N_f - 1)$ more training samples is available for the static model and a factor of $(N_f - 2)$ more for the dynamic model, without the need to manually edit any frame.

In all our numerical experiments we optimize (6) and (10) using a dataset of $N_s = 368$ video sequences of $N_f = 3$ frames each, divided into a training set (343 sequences) and a test set (25 sequences). For each of the 368 samples there is one manually edited target at $j = 2$, in which only the single-frame defects nd were removed and the film
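The N2N losses (11) and (12) reduce to nested loops over frame indices. In this sketch, frames are flat lists of pixel values, the warped frames $\hat{x}^i_{kj}$ and masks $m^i_{kj}$ are assumed to be precomputed and stored in dictionaries keyed by `(k, j)`, `model` is any callable standing in for $N^S_\theta$ or $N^D_\theta$, and `ell` is one of the losses from (9). All of these names are hypothetical; no training loop or network is shown.

```python
def masked_residual(mask, prediction, target):
    """Pixel-wise m * (prediction - target); masked-out pixels contribute zero."""
    return [m * (p - t) for m, p, t in zip(mask, prediction, target)]


def n2n_static_loss(model, frames, warped, masks, ell):
    """Scene loss of Eq. (11): every other frame k, warped to j, is a target."""
    n_f = len(frames)
    loss = 0.0
    for j in range(n_f):
        pred = model(frames[j])  # static model sees only the reference frame
        for k in range(n_f):
            if k != j:
                loss += ell(masked_residual(masks[(k, j)], pred, warped[(k, j)]))
    return loss


def n2n_dynamic_loss(model, frames, warped, masks, ell):
    """Scene loss of Eq. (12): input frame z and target frame k must differ from j and each other."""
    n_f = len(frames)
    loss = 0.0
    for j in range(n_f):
        for z in range(n_f):
            if z == j:
                continue
            # Dynamic model input: reference frame, warped neighbour z, and its mask.
            pred = model(frames[j], warped[(z, j)], masks[(z, j)])
            for k in range(n_f):
                if k != j and k != z:
                    loss += ell(masked_residual(masks[(k, j)], pred, warped[(k, j)]))
    return loss
```

For a motionless scene with frames `[[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]`, all-ones masks, an identity model and an $\ell_1$ loss such as `lambda r: sum(abs(v) for v in r)`, both losses evaluate to 16.0. Setting `masks[(2, 0)] = [1, 0]` lowers the static loss to 14.0, because the masked pixel no longer contributes.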
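The sample-count factors stated above follow from counting the index tuples summed in (7), (8), (11) and (12). The sketch below counts them with `itertools.permutations`, assuming, as the formulas do, that a target exists for every frame; the helper name `count_training_tuples` is hypothetical.

```python
from itertools import permutations


def count_training_tuples(n_s, n_f):
    """Number of loss terms summed over n_s scenes of n_f frames each."""
    frames = range(n_f)
    per_scene = {
        "sl_static": n_f,                                   # j, Eq. (7)
        "sl_dynamic": len(list(permutations(frames, 2))),   # (j, z), z != j, Eq. (8)
        "n2n_static": len(list(permutations(frames, 2))),   # (j, k), k != j, Eq. (11)
        "n2n_dynamic": len(list(permutations(frames, 3))),  # (j, z, k) distinct, Eq. (12)
    }
    return {name: n_s * count for name, count in per_scene.items()}


print(count_training_tuples(n_s=343, n_f=3))
# {'sl_static': 1029, 'sl_dynamic': 2058, 'n2n_static': 2058, 'n2n_dynamic': 2058}
```

With $N_f = 3$, as in the experiments, the gain for the dynamic model is $N_f - 2 = 1$, so N2N yields exactly as many dynamic training tuples as supervised learning would with a target for every frame; the advantage grows with longer scenes.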

Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Editor: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Size: 21.0 x 29.7 cm
Pages: 188
Categories: Informatik; Technik