put and target for the model. With this architecture we were able to achieve satisfactory results, showing that video restoration can be done entirely without ground truth data. This significantly eases the task by avoiding the requirement for tedious manual labeling.
2. Related Work
Learning-based Image Restoration Convolutional Neural Networks (CNNs) were first used in 2008 [11], where they achieved similar performance to model-based approaches. Later, Burger et al. [2] showed that shallow plain Multi-Layer Perceptrons (MLPs) can achieve results comparable to BM3D. The DnCNN [30] combined recent advances such as the convolutional structure, global residual learning [27], batch normalization [10], and a ReLU activation [20] to achieve a significant performance increase over state-of-the-art explicit models. Later, the FFDNet [31] extended the DnCNN by the use of input noise maps to account for spatially varying noise intensity, in order to apply it to real-world photographs. CBDNet [8] builds on this idea and introduces a noise estimation subnetwork whose output is fed into the denoising network along with the image to achieve notably good results for real-world denoising.
Video Restoration Compared to image denoising, little work exists on video denoising. Patch-based approaches are still the most prominent, e.g. V-BM4D [17] and Video Non-Local Bayes (VNLB) [1]. The Deep Video Denoising Network (DVDNet) [28] was one of the first convolutional network approaches to outperform VNLB, whilst being computationally more efficient. In the DVDNet, two separate networks are used for spatial and temporal denoising, and adjacent frames are motion compensated using DeepFlow [29]. Similarly, ViDeNN [5] uses separated spatial and temporal denoising networks, but motion compensation is learned in the temporal network. Frame-to-frame Training [7] exploits N2N by fine-tuning a pretrained network on motion-compensated successive frames. However, the applicability to real-world data remains limited since only one frame is considered for restoration. Besides denoising, learning-based methods have been successfully applied to frame interpolation [21], super-resolution [3] and deblurring [25].

3. Methods
We consider video scenes $\xi^i = (x^i_j)_{j=1}^{N_f}$ consisting of $N_f$ frames $x^i_j \in \mathbb{R}^{n \cdot 3}$ with a resolution $n = n_1 \times n_2$ and RGB channels. Each frame of a scene $x^i_j$ is assumed to be corrupted by additive noise, i.e.

$x^i_j = y^i_j + n_g + n_d$ ,  (1)

where $y^i_j$ is the underlying clean true frame, $n_g$ models noise due to film grain and $n_d$ represents the spatially correlated single-frame defects highlighted in Figure 1. Both noise sources are uncorrelated across the temporal dimension due to the stochastic nature of film grain $n_g$ and the temporal incoherence of $n_d$. We note that the approach is not limited to this noise model.
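To make the noise model of Eq. (1) concrete, the sketch below synthesizes a corrupted frame from a clean one. The grain strength, defect density, and blotch shape are illustrative assumptions only, since the exact noise statistics are not specified here; drawing both terms independently per frame reproduces the temporal incoherence noted above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def corrupt_frame(y, grain_sigma=0.04, defect_prob=1e-4, rng=None):
    """Simulate Eq. (1): x = y + n_g + n_d for one clean frame y in [0, 1],
    shaped (H, W, 3). All parameter values are hypothetical."""
    rng = np.random.default_rng() if rng is None else rng
    # n_g: i.i.d. Gaussian noise as a simple stand-in for film grain
    n_g = rng.normal(0.0, grain_sigma, size=y.shape)
    # n_d: sparse seed pixels grown into spatially correlated blotches
    seeds = (rng.random(y.shape[:2]) < defect_prob).astype(float)
    blotch = maximum_filter(seeds, size=9)        # dilate seeds into patches
    blotch = gaussian_filter(blotch, sigma=2.0)   # soft edges -> spatial correlation
    n_d = blotch[..., None] * (1.0 - y)           # bright, scratch-like defects
    return np.clip(y + n_g + n_d, 0.0, 1.0)
```

Calling this independently for every frame of a scene yields noise that is uncorrelated across the temporal dimension, matching the model above.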
3.1. Models for Single-Frame Defect Restoration
The simplest approach to estimate the clean true frame $y^i_j$ is by means of single-frame denoising. For this setting we use the DnCNN [30] to generate a prediction $\hat{y}^i_j$ by

$\hat{y}^i_j = \mathcal{N}_{\theta_S}(x^i_j)$ ,  (2)

solely based on the single corresponding corrupted frame $x^i_j$. Here, $\theta_S$ are the parameters of the DnCNN. They are learned from data either by supervised learning (SL), provided that target frames are available, or by the N2N approach, which we describe later in this section. The major disadvantage of the single-frame denoising approach is that the model cannot exploit temporal information to detect and restore the single-frame defects.
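For reference, a minimal DnCNN-style denoiser realizing $\mathcal{N}_{\theta_S}$ in Eq. (2) can be sketched in PyTorch as follows. Depth and width follow the original DnCNN [30]; the exact configuration used in this work is an assumption.

```python
import torch.nn as nn

class DnCNN(nn.Module):
    """DnCNN-style residual denoiser: Conv+ReLU, (depth-2) blocks of
    Conv+BN+ReLU, and a final Conv, with global residual learning."""
    def __init__(self, channels=3, features=64, depth=17):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1),
                  nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1, bias=False),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # the network predicts the noise; subtracting it from the
        # input yields the clean estimate (global residual learning)
        return x - self.body(x)
```

Given a batch of corrupted frames x of shape (B, 3, H, W), the prediction in Eq. (2) is simply y_hat = DnCNN()(x).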
To overcome this issue and enable the extraction of temporal features, we propose to learn a variant of the DnCNN model operating on two consecutive frames. These two adjacent frames need to be aligned to compensate the motion in dynamic scenes and ease the denoising problem. In detail, we account for the motion by computing the optical flow

$f^i_{zj} = F(x^i_z, x^i_j)$  (3)

from frame $x^i_z$ to $x^i_j$, where $F : \mathbb{R}^{n \cdot 3} \times \mathbb{R}^{n \cdot 3} \to \mathbb{R}^{n \cdot 2}$ implements the pretrained PWC-Net [26]. Using the thereby estimated flow $f^i_{zj}$, we warp a frame $x^i_z$ of the scene onto the reference frame $x^i_j$ by

$\hat{x}^i_{zj} = W(x^i_z, f^i_{zj})$  (4)
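The warping operator $W$ in Eq. (4) can be realized by standard backward warping with bilinear sampling. The sketch below assumes a dense flow in pixel units with channel order (dx, dy); it is a generic implementation, not necessarily the exact one used with PWC-Net here.

```python
import torch
import torch.nn.functional as F

def warp(x_z, flow):
    """Backward-warp x_z (B, 3, H, W) onto the reference frame using a
    dense optical flow (B, 2, H, W), cf. Eq. (4)."""
    b, _, h, w = x_z.shape
    # base grid of pixel coordinates, channel 0 = x, channel 1 = y
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(x_z.device)
    coords = grid.unsqueeze(0) + flow          # displaced sampling positions
    # normalize coordinates to [-1, 1] as expected by grid_sample
    cx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    cy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(x_z, torch.stack((cx, cy), dim=-1),
                         mode="bilinear", align_corners=True)
```

Pixels whose displaced coordinates fall outside the frame are filled with zeros by grid_sample's default padding; how such regions are handled downstream is an implementation choice.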