arXiv 2110.13900

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

By Sanyuan Chen, Chengyi Wang, et al.

Published 2021-10-26

Self-supervised learning (SSL) has achieved great success in speech recognition, but it has seen limited exploration for other speech processing tasks. Because the speech signal carries multi-faceted information, including speaker identity, paralinguistics, and spoken content, learning universal representations for all speech tasks is challenging. To tackle the problem, we propose a new pre-trained model, WavLM, to so…
