DS0705 - Foundations of digital technology

Missing Audio Data Inpainting – MAD

Inpainting of missing audio data (MAD)

Audio inpainting: reconstructing missing parts in sounds. The MAD Project (2014-2018) is dedicated to general problems where missing parts must be recovered in partially-observed audio data.

A general framework for various reconstruction applications using advanced models and techniques.

Audio inpainting is a recent generic framework for reconstructing missing parts in sounds or in their representations: restoration of old recordings, declipping, spectrogram retouching and other modifications are thus achieved using signal processing and machine learning tools. Following a proof of concept in 2012, the MAD project aims to carry out fundamental research on sound modeling and to develop new approaches for audio inpainting. In particular, it provides advanced techniques for settings in which holes in signals may be small or large, and it promotes and addresses time-frequency audio inpainting, where some time-frequency coefficients are missing. The MAD project also expands the recent audio inpainting concept by extending the range of its applications and by disseminating its results.

Exploiting partial observations to learn elementary components and reconstruct sounds.

Audio inpainting techniques rely on the observed parts to reconstruct the missing ones. The main approaches developed in the project model the data by decomposing them over dictionaries made of elementary patterns (sparse decompositions; nonnegative matrix factorisation, NMF) and reconstruct the sounds by paying attention to their intrinsic properties, such as the phases of the oscillatory components. Other sound structures were exploited to improve those models, such as the self-similarity in music and speech signals (interchannel similarity, slow variations of the contents, non-local repetition of elementary patterns). In addition, the inpainting problems were addressed jointly with source separation and compression problems. From a methodological and more abstract viewpoint, beyond its applications, audio inpainting also offers an appropriate framework for testing and assessing the relevance of models and estimation algorithms: it shows to what extent a model estimated from a partial observation remains valid on the missing parts that are predicted.
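
The idea of learning elementary patterns from the observed parts only, then using them to fill the holes, can be illustrated with a minimal sketch of NMF on a partially observed magnitude spectrogram. This is not the project's own algorithm nor the skmad-suite API: the function name `masked_nmf_inpaint`, the Euclidean cost, the multiplicative updates and all parameter values are assumptions chosen for illustration, and phases are ignored entirely.

```python
import numpy as np


def masked_nmf_inpaint(V, mask, rank=4, n_iter=500, eps=1e-9, seed=0):
    """Fill missing entries of a nonnegative matrix V with a masked NMF.

    V    : (n_freq, n_frames) nonnegative array, e.g. a magnitude spectrogram.
    mask : same shape, 1.0 where V is observed and 0.0 where it is missing.
    Hypothetical helper: Euclidean cost, multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random nonnegative initialisation of dictionary W and activations H.
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    MV = mask * V  # missing entries never contribute to the fit

    for _ in range(n_iter):
        # Update activations using observed entries only.
        WH = W @ H
        H *= (W.T @ MV) / (W.T @ (mask * WH) + eps)
        # Update dictionary atoms using observed entries only.
        WH = W @ H
        W *= (MV @ H.T) / ((mask * WH) @ H.T + eps)

    estimate = W @ H
    # Keep observed values untouched; use the model only in the holes.
    return mask * V + (1.0 - mask) * estimate
```

Comparing the values predicted in the holes with held-out ground truth gives a simple check of the point made above, namely whether a model estimated from the partial observation remains valid on the missing parts.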

Results

Within the MAD project, the recent audio inpainting concept has been developed along several axes: a scientific axis with a large diversity of research works (advances in audio declipping, multichannel and structured models, spectrogram inpainting, phase inpainting, characterisation of complex-valued time-frequency matrices); the fostering of collective research dynamics (new national and international collaborations, scientific meetings); and the development of a set of Python packages for audio inpainting called skmad-suite.

Prospects

From our viewpoint, the main prospect of the project is the question of time-frequency inpainting, for which some works have been proposed (spectrogram inpainting, phase inpainting). The main challenge is to jointly model the amplitudes and phases in the time-frequency plane, and to develop approaches that combine signal processing, machine learning and optimization.

Scientific productions and patents

About twenty publications have been generated from the MAD project, half of them resulting from national and international collaborations. All the publications and the available code and data are freely released on the open access website HAL and on the project's website mad.lis-lab.fr.

Submission summary

The audio inpainting concept, recently proposed by the coordinator and colleagues, is a conceptual breakthrough that unifies in a single framework all the audio processing problems where data is partially missing or highly degraded. Instances of such problems are click removal, CD scratch restoration, declipping, packet loss concealment, source reconstruction in the time-frequency domain and bandwidth extension. While these tasks had been addressed separately in the past, the unified formulation of audio inpainting as an inverse problem is a promising abstraction for factoring out the main difficulties shared among tasks, for providing methods that outperform state-of-the-art techniques on existing tasks, and for addressing new problems where missing data reconstruction has so far been too difficult. The MAD proposal develops audio inpainting for any task involving missing audio data.
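
One common way to write such an inverse problem is sketched below. The notation is an assumption made for illustration and does not come from the proposal: x is the unknown full signal, M a binary mask that keeps only the reliable samples, y the observation, D a dictionary of elementary patterns, α a sparse coefficient vector and ε a tolerance.

```latex
% Assumed notation (not from the proposal):
% x = full signal, M = binary mask selecting reliable samples,
% y = observed samples, D = dictionary, \alpha = sparse coefficients.
\begin{align}
  y &= M x, \\
  \hat{\alpha} &= \arg\min_{\alpha} \|\alpha\|_0
     \quad \text{subject to} \quad \|y - M D \alpha\|_2 \le \epsilon, \\
  \hat{x} &= D \hat{\alpha}.
\end{align}
```

Under this reading, the listed tasks differ mainly in the choice of M: isolated samples for click removal, whole frames for packet loss concealment, and the samples below the clipping level for declipping.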

The main objectives of this proposal are: a) to deploy the concept of audio inpainting within the research community by proposing new approaches, by addressing new applications and by creating and leading a dedicated research network; b) to initiate work on time-frequency inpainting, i.e. on the reconstruction of missing coefficients in a transform domain; c) to expand the concept of and the techniques for audio inpainting by developing connections with machine learning.

The project establishes strong relations between signal processing and machine learning. It does not merely apply machine learning techniques to signals: it also deals with a machine learning formulation of signal processing problems and with the integration of computational trade-offs in algorithms. The project also draws connections between audio and image processing. The proposal implies close interactions between theory and applications, with top-down and bottom-up relations. All those original aspects are reflected in the composition of the team and are expected to result in powerful approaches to real applications.

The MAD proposal is submitted to the ANR JCJC program under the leadership of Valentin Emiya, this proposal being the largest project he coordinates. To address its ambitious and diverse objectives, MAD involves a large team of 11 members, with research experience in both theory and applications, in academia and industry, and in signal processing and machine learning. Seven team members are located at the same site, the remaining four members being spread over three separate remote sites, including two members at Technicolor.

Valentin Emiya (Laboratoire d'Informatique Fondamentale de Marseille)

The author of this summary is the project coordinator, who is responsible for the content of this summary. The ANR declines any responsibility for its contents.

LIF Laboratoire d'Informatique Fondamentale de Marseille

ANR funding: 198,938 euros
Beginning and duration of the scientific project: September 2014 - 36 Months
