Microsoft's speech generation model: VALL-E (X)

We extend VALL-E to a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis, and train a multi-lingual conditional codec language model to predict the acoustic token sequences of the target language speech by using both the source language speech and the target language text as prompts. VALL-E X inherits strong in-context learning capabilities and can be applied for zero-shot cross-lingual text-to-speech synthesis and zero-shot speech-to-speech translation tasks. Experimental results show that it can generate high-quality speech in the target language via just one speech utterance in the source language as a prompt while preserving the unseen speaker’s voice, emotion, and acoustic environment. Moreover, VALL-E X effectively alleviates foreign accent problems, which can be controlled by a language ID.

This page is for research demonstration purposes only.

Source: <https://www.microsoft.com/en-us/research/project/vall-e-x/vall-e-x/>

Model Overview

VALL-E X can synthesize personalized speech in another language for a monolingual speaker. Taking the phoneme sequences derived from the source and target text, and the source acoustic tokens derived from an audio codec model as prompts, VALL-E X is able to produce the acoustic tokens in the target language, which can then be decompressed to the target speech waveform. Thanks to its powerful in-context learning capabilities, VALL-E X does not require cross-lingual speech data of the same speakers for training and can perform various zero-shot cross-lingual speech generation tasks, such as cross-lingual text-to-speech synthesis and speech-to-speech translation.

Source: <https://www.microsoft.com/en-us/research/project/vall-e-x/vall-e-x/>
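
The prompting scheme described in the overview can be sketched in a few lines of Python. This is only a toy illustration of the data flow (phoneme prompts plus source acoustic tokens in, target acoustic tokens out), not Microsoft's implementation. The names `CodecLMPrompt`, `build_prompt`, `generate_target_tokens`, and the `predict_next` callback are all hypothetical, and the real model predicts multi-layer codec tokens rather than a single flat sequence.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CodecLMPrompt:
    """Hypothetical container for the conditioning inputs described above."""
    phonemes: List[str]         # source phonemes followed by target phonemes
    acoustic_prefix: List[int]  # codec tokens of the source utterance
    language_id: str            # target-language tag (assumed to steer accent)


def build_prompt(
    source_phonemes: Sequence[str],
    target_phonemes: Sequence[str],
    source_acoustic_tokens: Sequence[int],
    language_id: str,
) -> CodecLMPrompt:
    """Concatenate the text-side prompts and keep the source audio tokens as a prefix."""
    return CodecLMPrompt(
        phonemes=list(source_phonemes) + list(target_phonemes),
        acoustic_prefix=list(source_acoustic_tokens),
        language_id=language_id,
    )


def generate_target_tokens(
    prompt: CodecLMPrompt,
    predict_next: Callable[[List[str], str, List[int]], int],
    max_new_tokens: int,
    eos_token: int,
) -> List[int]:
    """Autoregressively extend the acoustic prefix and return only the new tokens.

    `predict_next` stands in for the trained codec language model (an assumption
    made for this sketch); it receives the phonemes, the language ID, and all
    acoustic tokens so far, and returns the next acoustic token.
    """
    tokens = list(prompt.acoustic_prefix)
    for _ in range(max_new_tokens):
        next_token = predict_next(prompt.phonemes, prompt.language_id, tokens)
        if next_token == eos_token:
            break
        tokens.append(next_token)
    # Drop the source prefix: what remains corresponds to the target-language speech.
    return tokens[len(prompt.acoustic_prefix):]
```

In a real system the returned tokens would be passed to the audio codec's decoder to obtain the waveform; that step is omitted here because it depends on the specific codec used.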

Try the model (demo page):

https://www.microsoft.com/en-us/research/project/vall-e-x/vall-e-x/

Microsoft has not open-sourced the trained model, but someone has trained a model based on Microsoft's paper:

https://github.com/Plachtaa/VALL-E-X/blob/master/README-ZH.md

This model can be used as a reference for reproducing the results:

https://plachtaa.github.io/

Other references:

Survey and evaluation of open-source speech synthesis (TTS)

Source: <https://zhuanlan.zhihu.com/p/687094556>

[All-in-one package] Clone your voice from just a three-second recording

Source: <https://zhuanlan.zhihu.com/p/669184674>

GitHub open-source project picks: Top 15 for the second week of September

Source: <https://zhuanlan.zhihu.com/p/655091814>