Microsoft’s VALL-E AI can learn your speech patterns in 3 seconds
Published on January 11, 2023
AI and text-to-speech are drawing a lot of attention in early 2023. Microsoft researchers have announced a new text-to-speech AI model called VALL-E that can simulate a person’s voice from just a three-second audio sample. Once VALL-E learns a specific voice, it can synthesize audio of that person speaking and preserve their emotional tone.
VALL-E could be used for high-quality text-to-speech applications, including speech editing, where changing the text transcript of a recording would make the person say something they originally didn’t. Microsoft calls VALL-E a “neural codec language model” that builds on a technology called EnCodec. VALL-E differs from other text-to-speech methods: instead of synthesizing speech by manipulating waveforms, it generates discrete audio codec codes from text and acoustic prompts. It uses EnCodec to break that information down into discrete components called tokens, then draws on its training data to match what it “knows” about a person’s voice and determine how that voice might sound speaking other phrases.
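To give a rough feel for what “discrete tokens” means here, the following is a toy sketch in Python. It is not EnCodec and not how VALL-E works internally: EnCodec is a learned neural codec, while this example simply snaps each audio sample to the nearest value in a small, fixed codebook. The function names (`make_codebook`, `encode`, `decode`) and the 8-entry codebook are assumptions made up for illustration.

```python
import math

def make_codebook(levels):
    # Hypothetical codebook: evenly spaced amplitudes between -1.0 and 1.0.
    return [-1.0 + 2.0 * i / (levels - 1) for i in range(levels)]

def encode(samples, codebook):
    # Replace each sample with the index (token) of its nearest codebook entry.
    return [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - s))
        for s in samples
    ]

def decode(tokens, codebook):
    # Turn tokens back into approximate amplitudes.
    return [codebook[t] for t in tokens]

# One cycle of a sine wave stands in for a tiny piece of audio.
samples = [math.sin(2 * math.pi * n / 16) for n in range(16)]
codebook = make_codebook(8)

tokens = encode(samples, codebook)          # a list of small integers
reconstructed = decode(tokens, codebook)    # a coarse copy of the waveform

print(tokens)
```

The point of the sketch is only that continuous audio can be represented as a sequence of integers, which is the kind of sequence a language model can be trained to predict.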
VALL-E was trained on LibriLight, an audio library assembled by Meta that contains 60,000 hours of English-language speech from more than 7,000 speakers. Most of the recordings were pulled from LibriVox public domain audiobooks. That breadth of training data allows a good result with just a three-second sample.
Microsoft has set up a VALL-E example website so you can get a taste of the technology using dozens of audio examples of the AI model in action.
Via Arstechnica.com
David Allen