Personalized Dereverberation of Speech

Video Demonstration

In this video, we demonstrate the one-time personalization process introduced in the paper: we measure the representative room impulse response (rRIR) and then capture a few minutes of speech. Following personalization, we show an online conversation between a male and a female speaker in two different environments. Our processing achieves both dereverberation and denoising, resulting in better speech clarity.

Dereverberation Comparison

Method	Blind	ML	Comments
`Recorded`	-	-	Real recordings of speech by a male and a female speaker in different environments.
`Wiener`	✘	✘	`Wiener` struggles at dereverberation and creates strong artifacts due to imprecise RIR.
`WPE`	✔	✘	`WPE` produces fewer artifacts, but achieves minimal dereverberation.
`Demucs`	✔	✔	`Demucs` is designed primarily for audio denoising. When trained with reverberation data, it fails to handle complex RIRs.
`Demucs (pretrained)`	✔	✔	Pre-trained `Demucs` results in denoising but minimal dereverberation.
`HiFi-GAN`	✔	✔	`HiFi-GAN`, trained on our data, produces inadequate dereverberation and clarity of speech.
`HiFi-GAN (pretrained)`	✔	✔	Pre-trained `HiFi-GAN` produces inadequate dereverberation and clarity of speech as well.
`Audition (100% Dereverb)`	✔	✔	`Adobe Audition` produces some dereverberation but distorts the audio.
`Ours`	✘	✔	Our method produces results almost indistinguishable from the `Clean` recordings.
`Clean`	-	-	Reference speech recorded using a lavalier microphone worn by the user.
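
To illustrate why a non-blind baseline such as `Wiener` depends on an accurate RIR, here is a minimal sketch of frequency-domain Wiener deconvolution in Python. It is not the implementation used for the comparison above: the synthetic RIR, the white-noise stand-in for speech, the `snr_db` regularization parameter, and the function name `wiener_deconvolve` are all assumptions made for illustration.

```python
import numpy as np

def wiener_deconvolve(reverberant, rir, snr_db=30.0):
    """Estimate the dry signal from a reverberant one, given an RIR estimate.

    snr_db is an assumed signal-to-noise ratio used only as a regularizer.
    """
    n = len(reverberant)
    n_fft = 1 << (n - 1).bit_length()            # next power of two >= n
    H = np.fft.rfft(rir, n_fft)                  # RIR spectrum
    Y = np.fft.rfft(reverberant, n_fft)          # reverberant spectrum
    noise_to_signal = 10.0 ** (-snr_db / 10.0)
    # Wiener inverse filter: conj(H) / (|H|^2 + noise-to-signal ratio)
    G = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)
    return np.fft.irfft(G * Y, n_fft)[:n]

rng = np.random.default_rng(0)
dry = rng.standard_normal(4000)                  # white noise standing in for speech

# Synthetic RIR (assumption): direct path plus an exponentially decaying tail.
decay = np.exp(-np.arange(800) / 100.0)
rir = 0.3 * rng.standard_normal(800) * decay
rir[0] = 1.0
wet = np.convolve(dry, rir)                      # simulated reverberant recording

# Deconvolve with the exact RIR and with a perturbed (imprecise) RIR.
est_exact = wiener_deconvolve(wet, rir)[:len(dry)]
rir_bad = rir * (1.0 + 0.2 * rng.standard_normal(800))
est_bad = wiener_deconvolve(wet, rir_bad)[:len(dry)]

err_wet = np.mean((wet[:len(dry)] - dry) ** 2)
err_exact = np.mean((est_exact - dry) ** 2)
err_bad = np.mean((est_bad - dry) ** 2)
print(f"reverberant: {err_wet:.4f}  exact RIR: {err_exact:.6f}  imprecise RIR: {err_bad:.4f}")
```

In this toy setting, the exact RIR recovers the dry signal almost perfectly, while the perturbed RIR leaves a noticeably larger residual. That is consistent with the comment in the table that an imprecise RIR limits what `Wiener` can achieve on real recordings.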
