Microsoft’s new AI voice model could be a deepfake game changer



The Azure AI Speech Personal Voice feature has been upgraded to a new zero-shot TTS model called DragonV2.1Neural. Because it is a zero-shot model, voices can be created from minimal data. Microsoft says the new model delivers more natural-sounding and expressive voices, with improved pronunciation accuracy and better controllability.

The new model can synthesize speech in over 100 languages from just a few seconds of a voice sample. The earlier DragonV1 model had pronunciation problems, particularly with named entities.

The new model can be used for a number of different applications, including customizing chatbot voices and dubbing video content in an actor’s original voice across multiple languages.

According to Microsoft, DragonV2.1 improves how natural the voices sound, offering more realistic and stable prosody while maintaining better pronunciation accuracy. The model also shows an average 12.8% relative Word Error Rate (WER) reduction compared to DragonV1. When using this model, you have fine-grained control over pronunciation and accent using SSML phoneme tags and custom lexicons.
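To make the relative WER figure concrete, here is a minimal Python sketch of the arithmetic. The article gives no absolute error rates, so the 5.0% baseline below is a purely hypothetical number chosen for illustration, and the function names are made up for this example.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, e.g. 0.128 for 12.8%."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def apply_relative_reduction(baseline_wer: float, reduction: float) -> float:
    """Return the WER that results from a given relative reduction."""
    return baseline_wer * (1.0 - reduction)


# Hypothetical baseline: assume the older model had a 5.0% WER.
# This value is NOT from Microsoft; only the 12.8% figure is from the article.
hypothetical_baseline = 0.050
reported_reduction = 0.128

new_wer = apply_relative_reduction(hypothetical_baseline, reported_reduction)
print(f"Baseline WER: {hypothetical_baseline:.2%}")
print(f"New WER:      {new_wer:.2%}")
print(f"Check:        {relative_reduction(hypothetical_baseline, new_wer):.1%}")
```

Under that assumed baseline, a 12.8% relative reduction moves the error rate from 5.0% to about 4.36%. It is a proportional improvement, not a drop of 12.8 percentage points.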

With this model, Microsoft gives you control over the accent, which is important for speech and video translation, as well as for mimicking specific individuals. To help users get started, it has built several voice profiles, such as Andrew, Ava, and Brian, for testing.
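As a rough illustration of the pronunciation control offered by SSML phoneme tags, the sketch below builds a small SSML document in Python using only the standard library. The `speak`, `voice`, and `phoneme` elements come from the W3C SSML specification. The voice name is a placeholder, the helper names are invented for this example, and the exact markup Azure expects for a personal voice may differ, so treat this as an assumption-laden sketch and not as Microsoft's documented usage.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def phoneme(text: str, ipa: str) -> str:
    """Wrap text in an SSML <phoneme> element carrying an IPA pronunciation."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


def build_ssml(voice_name: str, body: str, lang: str = "en-US") -> str:
    """Assemble a minimal SSML document; `body` must already be valid SSML."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice_name)}>{body}</voice>"
        "</speak>"
    )


# PLACEHOLDER_VOICE_NAME is hypothetical; substitute a voice your service exposes.
ssml = build_ssml(
    voice_name="PLACEHOLDER_VOICE_NAME",
    body="I say " + phoneme("tomato", "təˈmeɪtoʊ") + ".",
)

# Parsing the result is a cheap well-formedness check before sending it anywhere.
ET.fromstring(ssml)
print(ssml)
```

Escaping the text and attribute values keeps the markup well-formed even when the spoken text contains characters such as `&` or `<`.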

Microsoft’s new model increases the risk of deepfakes produced by malicious actors. To try to prevent misuse, the company is asking users to agree to usage policies that require explicit consent from the original speaker and disclosure of synthetic content, and that prohibit impersonation or deception.

The Redmond giant will also automatically add watermarks to speech output. This technology reaches 99.7% detection accuracy in various audio editing scenarios, which could help to reduce misuse of the AI voices.

You can try the personal voice feature in Speech Studio as a test, or apply for full access to the API for business use.

Image via Depositphotos.com

roosho Senior Engineer (Technical Services)
I am Rakib Raihan RooSho, Jack of all IT Trades. You got it right. Good for nothing. I try a lot of things and fail more than that. That's how I learn. Whenever I succeed, I note that in my cookbook. Eventually, that became my blog. 
