Microsoft’s new AI voice model could be a deepfake game changer

The Azure AI Speech Personal Voice feature has been upgraded to a new zero-shot TTS model called DragonV2.1Neural. Because it is a zero-shot model, voices can be created from minimal data. The new model promises “more natural-sounding and expressive voice” with “improved pronunciation accuracy and greater controllability.”

The new model can synthesize speech in over 100 languages from just a few seconds of a voice sample. The earlier DragonV1 model had pronunciation challenges, particularly with named entities.

The new model can be used for a number of different applications, including customizing chatbot voices and dubbing video content in an actor’s original voice across multiple languages.

According to Microsoft, DragonV2.1 brings improvements to how natural the voices sound, “offering more realistic and stable prosody while maintaining better pronunciation accuracy.” The model also shows an average 12.8% relative Word Error Rate (WER) reduction compared to DragonV1. When using this model, you have fine-grained control over pronunciation and accent using SSML phoneme tags and custom lexicons.
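The 12.8% figure is a relative reduction, not a drop of 12.8 percentage points. As a rough illustration, the short Python sketch below works through the arithmetic. The 5% baseline WER is a made-up number chosen only to show the calculation; Microsoft's absolute WER values are not given here.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical baseline, NOT a published figure: assume the older model had 5% WER.
hypothetical_baseline_wer = 0.05

# Applying a 12.8% relative reduction to that baseline.
implied_new_wer = hypothetical_baseline_wer * (1 - 0.128)

print(f"Implied new WER: {implied_new_wer:.4f}")
print(f"Relative reduction: {relative_wer_reduction(hypothetical_baseline_wer, implied_new_wer):.3f}")
```

Under that assumed baseline, the new model would make roughly 4.4 errors per 100 words instead of 5, which is a noticeable but incremental gain.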

With this model, Microsoft gives you control over the accent, which is crucial for speech and video translation, as well as for mimicking specific individuals. To help users get started, it has built several voice profiles, such as Andrew, Ava, and Brian, that you can use for testing.
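For readers who have not used SSML before, here is a minimal Python sketch of what the pronunciation and accent controls could look like in a request body. It uses only the standard SSML `voice`, `lang`, and `phoneme` elements. The voice name string, the use of `lang` to request a British accent, and the example word are assumptions made for illustration; the exact identifiers and any speaker-profile markup the service requires should be taken from Microsoft's documentation.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # W3C SSML namespace


def build_ssml(voice_name: str, accent_locale: str,
               before: str, word: str, ipa: str, after: str) -> str:
    """Return an SSML string that pins the pronunciation of one word."""
    # The namespace is written as a plain attribute to keep the sketch short.
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": "en-US",
    })
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    # Assumption: the standard <lang> element is how an accent is requested.
    lang = ET.SubElement(voice, "lang", {"xml:lang": accent_locale})
    lang.text = before
    # <phoneme> overrides how the enclosed text is spoken, using IPA here.
    phoneme = ET.SubElement(lang, "phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = word
    phoneme.tail = after
    return ET.tostring(speak, encoding="unicode")


# "DragonV2.1Neural" is the model name from the article, used as a placeholder
# voice identifier; the real value to send may differ.
ssml = build_ssml("DragonV2.1Neural", "en-GB",
                  "I would like a ", "tomato", "təˈmeɪtoʊ", " salad.")
print(ssml)
```

A custom lexicon serves the same purpose for words that recur across many requests, so the phoneme does not have to be repeated inline each time.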

Microsoft’s new model increases the risk of deepfakes produced by malicious actors. To try to prevent misuse, the company is asking users to comply with usage policies that require explicit consent from the original speaker and disclosure of synthetic content, and that prohibit impersonation or deception.

The Redmond giant will also automatically add watermarks to speech output. This technology reaches 99.7% detection accuracy in various audio editing scenarios, which could help to reduce misuse of the AI voices.

You can try the personal voice feature in Speech Studio as a test, or apply for full access to the API for business use.

Image via Depositphotos.com
