Personalized Dereverberation of Speech

Supplementary Material

Video Demonstration

In this video, we demonstrate the one-time personalization process introduced in the paper. We first measure the representative room impulse response (rRIR) and then capture a few minutes of speech. Following personalization, we show an online conversation between a male and a female speaker in two different environments. Our processing achieves both dereverberation and denoising, resulting in better speech clarity.

Dereverberation Comparison

Methods Blind ML Male 1 Female 1 Male 2 Female 2 Comments
Recorded - -

Real recordings of speech by a male and a female in different environments.

Wiener

The Wiener filter struggles with dereverberation and creates strong artifacts because the RIR is imprecise.

WPE

WPE (weighted prediction error) produces fewer artifacts but achieves only minimal dereverberation.

Demucs

Demucs is designed primarily for audio denoising. When trained on reverberant data, it fails to handle complex RIRs.

Demucs (pretrained)

Pre-trained Demucs denoises the speech but achieves minimal dereverberation.

HiFi-GAN

HiFi-GAN, trained on our data, produces inadequate dereverberation and clarity of speech.

HiFi-GAN (pretrained)

Pre-trained HiFi-GAN produces inadequate dereverberation and clarity of speech as well.

Audition (100% Dereverb)

Adobe Audition produces some dereverberation but distorts the audio.

Ours

Our method produces results almost indistinguishable from clean recordings.

Clean - -

Reference speech recorded using a lavalier microphone worn by the user.