Microsoft’s AI tool can turn photos into realistic videos of people talking and singing

remon, 4 weeks ago


Microsoft Research Asia has unveiled a new experimental AI tool called VASA-1 that can take a still image of a person – or a drawing of a person – and an existing audio file to create a realistic talking face in real time. It has the ability to generate facial expressions and head movements for an existing still image as well as the appropriate lip movements to match a speech or song. The researchers uploaded a ton of examples to the project page, and the results look good enough that they could fool people into thinking they’re real.

While the lip and head movements in the examples may still look a bit robotic and out of sync upon closer inspection, it's clear that the technology could be misused to quickly and easily create deepfake videos of real people. The researchers themselves are aware of this potential and have decided not to release “an online demo, API, product, additional implementation details, or any related offerings” until they are confident that their technology “will be used responsibly and in accordance with appropriate regulations.” They did not, however, say whether they plan to implement safeguards to prevent bad actors from using it for nefarious purposes, such as creating deepfake pornography or running disinformation campaigns.

The researchers believe their technology has many benefits despite its potential for misuse. They said it could be used to improve educational equity and accessibility for people with communication difficulties, perhaps by giving them access to an avatar that can communicate on their behalf. It could also provide companionship and therapeutic support to those who need it, they said, hinting that VASA-1 could be used in programs that offer access to AI characters people can talk to.

According to the paper released with the announcement, VASA-1 was trained on the VoxCeleb2 dataset, which contains “over a million utterances for 6,112 celebrities” taken from YouTube videos. Although the tool was trained on real faces, it also works on artistic images like the Mona Lisa, which the researchers playfully combined with an audio file of Anne Hathaway’s viral rendition of Lil Wayne’s Paparazzi. It’s so delightful that it’s worth a watch, even if you doubt the usefulness of such technology.

This post contains affiliate links; if you click on such a link and make a purchase, we may earn a commission.

News Source : www.engadget.com