In this video, we demonstrate the one-time personalization process introduced in the paper: we measure the representative room impulse response (rRIR) and then record a few minutes of speech. After personalization, we show an online conversation between a male and a female speaker in two different environments. Our processing performs both dereverberation and denoising, which improves speech clarity.
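The processing described above removes two degradations at once. Here is a minimal sketch, assuming the common additive model in which the microphone signal is the clean speech convolved with a room impulse response plus noise (the paper's exact formulation may differ). The stdlib-only Python below simulates such an observation. The names `convolve` and `observe`, the toy signals, and the white Gaussian noise model are illustrative assumptions, not part of the described system.

```python
import math
import random


def convolve(x, h):
    """Full linear convolution of a dry signal x with an impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def observe(dry, rir, snr_db, seed=0):
    """Simulate y = rir * dry + noise (assumed model), returning (noisy, reverberant).

    White Gaussian noise is scaled so the SNR, measured against the
    reverberant (not the dry) signal, equals snr_db.
    """
    reverberant = convolve(dry, rir)
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in reverberant]
    signal_power = sum(s * s for s in reverberant) / len(reverberant)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose scale so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    noisy = [s + scale * n for s, n in zip(reverberant, noise)]
    return noisy, reverberant


if __name__ == "__main__":
    dry = [1.0, -0.5, 0.25, 0.0, 0.8]  # toy "clean speech" samples
    rir = [1.0, 0.6, 0.3]              # toy RIR: direct path plus two reflections
    noisy, reverberant = observe(dry, rir, snr_db=20.0)
    print(len(noisy))  # 7 samples: len(dry) + len(rir) - 1
```

In this model, dereverberation targets the convolution with `rir` and denoising targets the additive noise term.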
| Methods | Blind | ML | Male 1 | Female 1 | Male 2 | Female 2 | Comments |
|---|---|---|---|---|---|---|---|
| Recorded | - | - | | | | | Real recordings of speech by a male and a female speaker in different environments. |
| Wiener | ✘ | ✘ | | | | | Wiener filtering struggles with dereverberation and creates strong artifacts because the RIR is imprecise. |
| WPE | ✔ | ✘ | | | | | WPE produces fewer artifacts but achieves minimal dereverberation. |
| Demucs | ✔ | ✔ | | | | | Demucs is designed primarily for denoising; when trained on reverberant data, it fails to handle complex RIRs. |
| Demucs (pretrained) | ✔ | ✔ | | | | | Pretrained Demucs denoises but achieves minimal dereverberation. |
| HiFi-GAN | ✔ | ✔ | | | | | HiFi-GAN trained on our data yields inadequate dereverberation and poor speech clarity. |
| HiFi-GAN (pretrained) | ✔ | ✔ | | | | | Pretrained HiFi-GAN likewise yields inadequate dereverberation and poor speech clarity. |
| Audition (100% Dereverb) | ✔ | ✔ | | | | | Adobe Audition achieves some dereverberation but distorts the audio. |
| Ours | ✘ | ✔ | | | | | Our method produces results that are almost indistinguishable from the clean recordings. |
| Clean | - | - | | | | | Reference speech recorded with a lavalier microphone worn by the user. |
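The Wiener row attributes its artifacts to an imprecise RIR. A minimal sketch of classical frequency-domain Wiener deconvolution illustrates that sensitivity. It assumes the RIR estimate is supplied directly and that the noise-to-signal power ratio is a single constant (`noise_to_signal`, a hypothetical parameter). This is not the configuration of the Wiener baseline in the comparison. A naive DFT keeps the example dependency-free; in practice an FFT such as `numpy.fft` would be used.

```python
import cmath


def convolve(x, h):
    """Full linear convolution of x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def dft(x, n):
    """Naive O(n^2) DFT of x zero-padded to length n (fine for toy signals)."""
    x = list(x) + [0.0] * (n - len(x))
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def wiener_deconvolve(y, rir, noise_to_signal=1e-6):
    """Estimate x from y = rir * x via X = conj(H) Y / (|H|^2 + noise_to_signal).

    Assumption: a constant noise_to_signal regularizer. The DFT length equals
    len(y), so circular and linear convolution coincide when rir is exact.
    """
    n = len(y)
    Y = dft(y, n)
    H = dft(rir, n)
    X = [Hk.conjugate() * Yk / (abs(Hk) ** 2 + noise_to_signal) for Hk, Yk in zip(H, Y)]
    return idft(X)


def max_error(estimate, reference):
    """Largest absolute sample difference, zero-padding reference to len(estimate)."""
    padded = list(reference) + [0.0] * (len(estimate) - len(reference))
    return max(abs(a - b) for a, b in zip(estimate, padded))


if __name__ == "__main__":
    clean = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]  # toy clean signal
    true_rir = [1.0, 0.5, 0.25]              # toy "true" RIR
    observed = convolve(clean, true_rir)     # 8 samples, noise-free
    exact = wiener_deconvolve(observed, true_rir)
    mismatched = wiener_deconvolve(observed, [1.0, 0.3])  # imprecise RIR estimate
    print("exact RIR error:", max_error(exact, clean))
    print("mismatched RIR error:", max_error(mismatched, clean))
```

With the exact RIR, the clean samples are recovered up to a small regularization error. The mismatched estimate leaves a much larger residual. On real recordings, where an RIR estimate never matches the acoustic path exactly, such residuals can surface as audible artifacts.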