Microsoft’s VALL-E AI can learn your speech patterns in 3 seconds

Published on January 11, 2023

AI text-to-speech is shaping up to be one of the hot topics of early 2023. Microsoft researchers have announced VALL-E, a new text-to-speech AI model that can simulate a person's voice from just a three-second audio sample. Once VALL-E has learned a specific voice, it can synthesize speech in that voice while preserving the speaker's emotional tone.

VALL-E could be used for high-quality text-to-speech applications, and for speech editing, where changing a recording's text transcript would make the person in it say something they never originally said. Microsoft calls VALL-E a "neural codec language model" that builds on a technology called EnCodec. VALL-E differs from other text-to-speech methods in that, instead of synthesizing speech by manipulating waveforms, it generates discrete audio codec codes from text and acoustic prompts. EnCodec breaks the audio down into these discrete components, called tokens, and VALL-E draws on its training data and what it "knows" about a person's voice to work out how that voice would sound speaking other phrases.
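To make the idea of "discrete codec codes" more concrete, here is a toy Python sketch of residual vector quantization, the general scheme EnCodec uses to turn audio into tokens. It is not Microsoft's or Meta's code: the two tiny codebooks are hypothetical, hand-picked values, whereas a real codec learns its codebooks and quantizes learned feature frames rather than a pair of numbers. The token-count helper assumes the configuration commonly described for VALL-E (EnCodec at 24 kHz, 75 frames per second, 8 codebooks); those figures are not stated in this article.

```python
# Toy residual vector quantization (RVQ): a simplified stand-in for how a
# neural codec such as EnCodec turns continuous audio features into discrete
# tokens. The codebooks below are tiny and purely hypothetical; a real codec
# learns them during training.
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical codebooks: a coarse stage followed by a finer stage.
EXAMPLE_CODEBOOKS: List[List[Vector]] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],
]

# Assumed settings (not from this article): 24 kHz EnCodec, 8 codebooks.
FRAMES_PER_SECOND = 75
NUM_CODEBOOKS = 8


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(frame: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Quantize one feature frame into one token index per codebook.

    Each stage picks the entry closest to the current residual, then
    subtracts it so the next stage only has to model what is left over.
    """
    residual = list(frame)
    tokens = []
    for codebook in codebooks:
        best = min(range(len(codebook)),
                   key=lambda i: squared_distance(residual, codebook[i]))
        tokens.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return tokens


def rvq_decode(tokens: List[int], codebooks: List[List[Vector]]) -> List[float]:
    """Rebuild an approximate frame by summing the selected codebook entries."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def prompt_token_count(seconds: float) -> int:
    """Tokens needed for an audio prompt under the assumed codec settings."""
    return round(seconds * FRAMES_PER_SECOND) * NUM_CODEBOOKS


if __name__ == "__main__":
    tokens = rvq_encode([1.2, 1.0], EXAMPLE_CODEBOOKS)
    print(tokens)                                   # [1, 1]
    print(rvq_decode(tokens, EXAMPLE_CODEBOOKS))    # [1.25, 1.0]
    print(prompt_token_count(3))                    # 1800
```

Under those assumed settings, a three-second voice prompt works out to 3 × 75 × 8 = 1,800 tokens, which gives a sense of how little audio the model has to go on.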

VALL-E was trained on LibriLight, an audio library assembled by Meta that contains 60,000 hours of English-language speech from more than 7,000 speakers, most of it pulled from LibriVox public domain audiobooks. That breadth of training data is what allows VALL-E to produce good results from just a three-second sample.

Microsoft has set up a VALL-E example website where you can get a taste of the technology through dozens of audio samples of the model in action.

Via Arstechnica.com

David Allen
