Technology

The Mona Lisa rapping? New Microsoft AI animates faces from photos

A Microsoft sign is seen at the company’s headquarters on March 19, 2023 in Seattle, Washington. (I Ryu/Visual China Group/Getty Images)


New York
CNN

The Mona Lisa can now do more than smile, thanks to new artificial intelligence technology from Microsoft.

Last week, Microsoft researchers detailed a new AI model they developed that can take a still image of a face and an audio clip of a person speaking and automatically create a realistic video of that speaker. The videos – which can be made from photorealistic faces, as well as cartoons or artwork – are complemented by convincing lip-syncing and natural facial and head movements.

In a demonstration video, the researchers showed how they animated the Mona Lisa to recite a comedic rap by actress Anne Hathaway.

The results of the AI model, called VASA-1, are both entertaining and somewhat jarring in how realistic they are. Microsoft said the technology could be used for educational purposes or “to improve accessibility for people with communication challenges,” or potentially to create virtual companions for humans. But it’s also easy to see how the tool could be misused to impersonate real people.

It’s a concern that extends beyond Microsoft: As new tools for creating compelling AI-generated images, videos, and audio emerge, experts worry that their misuse could lead to new forms of misinformation. Some also worry that the technology will further disrupt creative industries, from film to advertising.

For now, Microsoft said it does not plan to release the VASA-1 model to the public. The move is similar to how Microsoft partner OpenAI is handling concerns about its AI-generated video tool, Sora: OpenAI teased Sora in February but has so far made it available only to some professional users and cybersecurity educators for testing purposes.

“We oppose any behavior that creates content that is misleading or harmful to real people,” Microsoft researchers said in a blog post. But, they added, the company has “no plans” to release the product publicly “until we are certain that the technology will be used responsibly and in accordance with appropriate regulations.”

Microsoft’s new AI model was trained on numerous videos of people’s faces as they spoke, and it is designed to recognize natural facial and head movements, including “lip movement, (non-lip) expression, gaze and blinks, among others,” the researchers said. The result is a more realistic video when VASA-1 animates a still photo.

For example, in one demo video set to a clip of a person sounding agitated, apparently while playing video games, the speaking face has furrowed brows and pursed lips.

The AI tool can also be directed to produce a video in which the subject looks in a certain direction or expresses a specific emotion.

Upon closer inspection, there are still signs that the videos are machine-generated, such as infrequent blinking and exaggerated eyebrow movements. But Microsoft said it believes its model “significantly outperforms” other similar tools and “paves the way for real-time engagements with realistic avatars that mimic human conversational behaviors.”
