VALL-E

VALL-E is a neural codec language model trained on discrete codes derived from an off-the-shelf neural audio codec model. It treats text-to-speech (TTS) as a conditional language modeling task rather than as continuous signal regression. VALL-E exhibits emergent in-context learning capabilities and can synthesize high-quality personalized speech using only a 3-second enrolled recording of an unseen speaker as a prompt.
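To illustrate what "TTS as conditional language modeling" means, here is a minimal sketch, not VALL-E's implementation. Text (phoneme) tokens and the enrolled speaker's codec tokens are concatenated into one context, and new codec tokens are predicted one at a time from everything that came before. The shared-vocabulary offsets, the `EOS` token, the function names and the toy `toy_model` predictor are all hypothetical. A real system would use a trained Transformer in place of `next_token`. The VALL-E paper also adds a non-autoregressive stage for the remaining codebooks, which this sketch omits.

```python
from typing import Callable, List, Sequence

# Hypothetical shared vocabulary: phoneme IDs first, then codec-code IDs.
PHONEME_OFFSET = 0
CODE_OFFSET = 1000
EOS = 2024  # hypothetical end-of-speech token


def build_context(phonemes: Sequence[int], prompt_codes: Sequence[int]) -> List[int]:
    """Concatenate phoneme tokens (assumed: prompt transcript + target text)
    with the codec tokens of the 3-second enrolled recording."""
    return [PHONEME_OFFSET + p for p in phonemes] + [CODE_OFFSET + c for c in prompt_codes]


def generate_codes(
    next_token: Callable[[List[int]], int],
    phonemes: Sequence[int],
    prompt_codes: Sequence[int],
    max_new: int = 50,
) -> List[int]:
    """Greedy autoregressive continuation: each new codec token is predicted
    from the full context (text + prompt audio + tokens generated so far)."""
    context = build_context(phonemes, prompt_codes)
    generated: List[int] = []
    for _ in range(max_new):
        token = next_token(context)
        if token == EOS:
            break
        context.append(token)
        generated.append(token - CODE_OFFSET)  # map back to a raw codec code
    return generated


def toy_model(context: List[int]) -> int:
    """Hypothetical stand-in for a trained model: stops once the context
    holds 10 tokens, otherwise emits the previous token plus one."""
    if len(context) >= 10:
        return EOS
    return context[-1] + 1


if __name__ == "__main__":
    print(generate_codes(toy_model, [1, 2, 3], [5, 6, 7]))  # prints [8, 9, 10, 11]
```

The generated codes would then be decoded back to a waveform by the same neural audio codec. Because the speaker prompt is simply part of the context, no speaker-specific fine-tuning is needed. That is the sense in which the model learns "in context".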

Website domain: www.microsoft.com   Updated: 2024-05-28   Short name: VALL-E   Category: AI voice cloning   Popularity index: 424
