
OpenAI’s voice cloning AI model only needs a 15-second sample to work

OpenAI is offering limited access to Voice Engine, a text-to-speech platform it developed that can create a synthetic voice from a 15-second sample of a person’s speech. The AI-generated voice can read text prompts on command, either in the speaker’s own language or in a number of other languages. “These small-scale deployments help inform our approach, safeguards, and thinking about how Voice Engine could be put to good use across various industries,” OpenAI said in its blog post.

Companies with access include edtech company Age of Learning, visual storytelling platform HeyGen, frontline healthcare software maker Dimagi, AI communications app maker Livox and health system Lifespan.

In examples published by OpenAI, you can hear how Age of Learning has used the technology to generate pre-scripted voiceover content and to read out “real-time personalized responses” to students, written by GPT-4.

[Audio: the English reference clip, followed by three AI-generated clips based on that sample.]

OpenAI said it began developing Voice Engine in late 2022 and that the technology already powers the preset voices in its text-to-speech API and in ChatGPT’s Read Aloud feature. In an interview with TechCrunch, Jeff Harris, a member of OpenAI’s product team for Voice Engine, said the model was trained on “a mix of licensed and publicly available data.” OpenAI told the publication that the model would be available to only around 10 developers.

AI text-to-audio generation is an area of generative AI that continues to evolve. While most tools focus on instrumental or natural sounds, fewer have focused on voice generation, in part because of the issues cited by OpenAI. Names in the field include Podcastle and ElevenLabs, which provide AI voice cloning technology and tools that the Vergecast explored last year.

According to OpenAI, its partners have agreed to abide by its usage policies, which state that they will not use Voice Engine to impersonate people or organizations without their consent. The policies also require partners to obtain “explicit and informed consent” from the original speaker, to refrain from building ways for individual users to create their own voices, and to disclose to listeners that the voices are generated by AI. OpenAI has also added a watermark to audio clips to trace their origin, and it actively monitors how the audio is used.

OpenAI suggested several measures that it said could limit the risks associated with tools like these, including phasing out voice authentication for access to bank accounts, policies to protect the use of people’s voices in AI, better education about AI deepfakes, and the development of tracking systems for AI content.

News Source: www.theverge.com

Sara Adm

A lover of words, Sara Smith began writing at an early age. As editor-in-chief of the school newspaper, Smith honed a talent for telling impactful stories. Smith went on to study journalism at Columbia University, graduating at the top of the class. After training at the New York Times, Smith landed a job as a news reporter. Over the past ten years, Smith has covered major events such as presidential elections and natural disasters, and has been acclaimed for crafting compelling stories that capture the human experience.