Page 145 in Joint Austrian Computer Vision and Robotics Workshop 2020

Real-World Video Restoration using Noise2Noise

Martin Zach, Erich Kobler
Institute of Computer Graphics and Vision
{martin.zach@student, erich.kobler@icg}.tugraz.at

Abstract. Restoration of real-world analog video is a challenging task due to the presence of very heterogeneous defects. These defects are hard to model, such that creating training data synthetically is infeasible and time-consuming manual editing is required instead. In this work we explore whether reasonable restoration models can be learned from data without explicitly modeling the defects or manual editing. We adopt Noise2Noise techniques, which eliminate the need for ground truth targets by replacing them with corrupted instances. To compensate for temporal mismatches between the frames and ensure meaningful training, we apply motion correction. Our experiments show that video restoration can be learned using only corrupted frames, with performance exceeding that of conventional learning.

1. Introduction

Recently, the approach to signal reconstruction from corrupted measurements shifted from explicitly modeling the statistics of the corruptions and image priors, e.g. Block-matching and 3D filtering (BM3D) [6] or Total Variation (TV) based methods [4, 24], to learning-based techniques such as Convolutional Neural Networks (CNNs) [11]. Since then, deep learning techniques [9, 18] have become very popular. Residual learning [9], batch normalization [10] and similar improvements, along with increasing computational power and high-quality datasets, made it possible to train such architectures efficiently. Deep architectures are now the state of the art for many image restoration tasks such as denoising, deblurring, and inpainting [8, 13, 19], as well as semantic segmentation [16, 23] and classification [27].

Despite these advances, the generalization performance of such models is still largely limited by the size of the available dataset. The acquisition of clean targets is often very tedious or difficult, and it has been proposed that data collection is becoming the critical bottleneck in machine learning [22]. It is therefore interesting to investigate whether networks can learn meaningful mappings when they are presented only with corrupted samples, both as input and as target. Lehtinen et al. [15] showed that clean targets are not required to learn meaningful reconstructions, provided that the corrupted samples are drawn from an arbitrary distribution, conditioned on the clean target, whose expected value is the clean target. This technique, now known as Noise2Noise (N2N), has been successfully applied to image restoration tasks [14].

Figure 1. Sample from the dataset, corrupted by typical temporally incoherent and very local defects highlighted in orange.

In this work we explore the applicability of N2N for video denoising, especially concerning the real-world case of having finite data. Due to the nature of the defects, acquiring ground truth samples would require manual editing of the frames and is often not feasible. Further, the defects are very complex and diverse in nature, such that modeling them is difficult to impossible. Figure 1 displays such an example, where temporally incoherent defects with small spatial extent and high inter-pixel correlation can be seen.

The N2N setting imposes limitations that require special considerations. Since different frames show the scene at different points in time, they cannot directly be used as training pairs. We overcome this by separating temporal motion compensation and spatial denoising, allowing corrupted samples to be both in-
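The Noise2Noise property described in the introduction can be illustrated numerically. The following is a minimal sketch and not the authors' method: it assumes a synthetic 1-D signal, additive zero-mean Gaussian noise, and a linear least-squares filter standing in for a CNN. All names (`make_patches`, `fit_linear_denoiser`, `n2n_demo`) and parameter values are hypothetical choices made for this illustration.

```python
import numpy as np


def make_patches(signal, width):
    # Sliding windows of length `width` over a 1-D signal,
    # shape (len(signal) - width + 1, width).
    return np.lib.stride_tricks.sliding_window_view(signal, width)


def fit_linear_denoiser(inputs, targets):
    # Least-squares weights w minimizing ||inputs @ w - targets||^2.
    w, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return w


def n2n_demo(n=50_000, width=5, sigma=0.3, seed=0):
    # Assumption: `width` is odd so each patch has a well-defined center.
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 40.0 * np.pi, n)
    clean = np.sin(t) + 0.5 * np.sin(3.1 * t)

    # Two corrupted observations of the same clean signal with
    # independent zero-mean noise, so E[noisy | clean] = clean.
    noisy_in = clean + sigma * rng.standard_normal(n)
    noisy_tgt = clean + sigma * rng.standard_normal(n)

    half = width // 2
    center = slice(half, n - half)
    patches = make_patches(noisy_in, width)

    # Conventional training: noisy input, clean target.
    w_clean = fit_linear_denoiser(patches, clean[center])
    # Noise2Noise training: noisy input, independently corrupted target.
    w_noisy = fit_linear_denoiser(patches, noisy_tgt[center])

    # Error of the raw input and of the N2N-trained filter,
    # both measured against the clean signal.
    mse_in = np.mean((noisy_in[center] - clean[center]) ** 2)
    mse_n2n = np.mean((patches @ w_noisy - clean[center]) ** 2)
    return w_clean, w_noisy, mse_in, mse_n2n


if __name__ == "__main__":
    w_clean, w_noisy, mse_in, mse_n2n = n2n_demo()
    print("max weight difference:", np.max(np.abs(w_clean - w_noisy)))
    print("MSE of noisy input:   ", mse_in)
    print("MSE after N2N filter: ", mse_n2n)
```

Under these assumptions the two sets of filter weights should nearly coincide, because the noise in the target is zero-mean and independent of the input and therefore averages out of the least-squares solution. With finite data the match is only approximate, which is the regime the paper is concerned with.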

Title: Joint Austrian Computer Vision and Robotics Workshop 2020
Publisher: Graz University of Technology
Location: Graz
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-85125-752-6
Dimensions: 21.0 x 29.7 cm
Pages: 188
Categories: Computer Science; Technology