Unsupervised Speech Enhancement
Speech enhancement systems remove background noise and reverberation from speech signals. They can be applied in video conferencing systems, virtual assistants, hearing aids, mobile devices, smart home devices, and more. Conventional speech enhancement systems are trained with supervised learning methods, which require pairs of studio-quality clean target speech and synthetic noisy mixtures. The requirement of a ground-truth clean speech dataset is a disadvantage: such data is hard to scale and not diverse enough, which makes the trained model less robust in real-world scenarios. Moreover, it is expensive to record clean speech and noise in the same domain as the inference data. Additionally, conventional speech enhancement systems can degrade automatic speech recognition (ASR) performance.
Our project goals are to improve the perceptual quality of enhanced speech, to use abundant real noisy recordings instead of relying on expensive studio-quality data, and to mitigate the ASR performance degradation caused by speech enhancement. To meet these goals, we propose a weighted loss function that combines speech recognition embedding and disentanglement-related losses with the MixIT loss, a recently introduced unsupervised loss for speech separation. In addition, we evaluate how the amount of added noise affects the performance of the unsupervised speech enhancement system. We also investigate how to enforce disentanglement between speech and noise to get the best performance.
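As a rough illustration of the idea, the sketch below shows a mixture-invariant (MixIT-style) loss combined with two extra weighted terms. It is not the project's implementation. Mean squared error stands in for the signal-level loss, the function names are hypothetical, and the ASR embedding and disentanglement terms are passed in as already-computed numbers because the description does not specify how they are calculated. The weights are placeholder values.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def remix(sources, assignment, target):
    """Sum the estimated sources that are assigned to mixture `target`."""
    out = [0.0] * len(sources[0])
    for src, owner in zip(sources, assignment):
        if owner == target:
            out = [o + s for o, s in zip(out, src)]
    return out


def mixit_loss(mix1, mix2, sources):
    """Smallest remix error over all ways of assigning sources to two mixtures.

    The model is assumed to have separated mix1 + mix2 into `sources`.
    MSE is an assumption made here for simplicity.
    """
    best = float("inf")
    # Each estimated source belongs to exactly one of the two mixtures.
    for assignment in product((0, 1), repeat=len(sources)):
        loss = (mse(remix(sources, assignment, 0), mix1)
                + mse(remix(sources, assignment, 1), mix2))
        best = min(best, loss)
    return best


def total_loss(mixit, asr_embedding, disentanglement, w_asr=0.1, w_dis=0.1):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return mixit + w_asr * asr_embedding + w_dis * disentanglement


# Toy example: the estimated sources reproduce both mixtures exactly.
mix1 = [1.0, 2.0, 3.0]
mix2 = [0.5, -0.5, 0.0]
sources = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.0], [0.0, 0.0, 0.0]]
print(mixit_loss(mix1, mix2, sources))
```

Because the loss only needs the two input mixtures, no clean reference speech is required. The exhaustive search over assignments grows as 2^M for M estimated sources, which is manageable for the small source counts typical of this setting.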
Our results show that the proposed loss function improves the perceptual quality of speech and reduces the speech recognition error rate on the noisy VoxCeleb dataset. We also find that enforcing disentanglement of speech and noise at the ASR embedding level achieves better results than enforcing it at the spectrogram level.
Video "Unsupervised Speech Enhancement" from the Microsoft Research channel