
New AI model can hallucinate a game of 1993’s Doom in real time



On Tuesday, researchers from Google and Tel Aviv University unveiled GameNGen, a new AI model capable of interactively simulating the classic 1993 first-person shooter Doom in real time using AI image generation techniques borrowed from Stable Diffusion. It is a neural network system that can function as a limited game engine, potentially opening up new possibilities for real-time video game synthesis in the future.

For example, instead of drawing graphical video frames using traditional rendering techniques, future games could potentially use an AI engine to “imagine” or hallucinate graphics in real time as a prediction task.

“The potential here is absurd,” wrote app developer Nick Dobos in response to the news. “Why handwrite complex rules for software when AI can just think of every pixel for you?”

GameNGen can generate new frames of Doom gameplay at over 20 frames per second using a single tensor processing unit (TPU), a type of specialized processor similar to a GPU that is optimized for machine learning tasks.

In tests, the researchers report that ten human raters sometimes failed to distinguish between short clips (1.6 seconds and 3.2 seconds) of real Doom gameplay and outputs generated by GameNGen, identifying the real gameplay sequences in only 58 or 60 percent of cases.

An example of GameNGen in action, interactively simulating Doom using an image synthesis model.

Real-time video game synthesis using what might be called “neural rendering” isn’t a completely new idea. Nvidia CEO Jensen Huang predicted in an interview in March, perhaps a bit boldly, that most video game graphics could be generated by AI in real time within five to ten years.

GameNGen also builds on previous work in the field, cited in the GameNGen paper, which includes World Models in 2018, GameGAN in 2020, and Google’s Genie in March. And a group of university researchers trained an AI model (called “DIAMOND”) to simulate vintage Atari video games using a diffusion model earlier this year.

Additionally, ongoing research into “world models” or “world simulators,” commonly associated with AI video synthesis models like Runway’s Gen-3 Alpha and OpenAI’s Sora, is leaning in a similar direction. For example, when Sora was launched, OpenAI showed demo videos of the AI generator simulating Minecraft.

Diffusion is the key

In a preprint research paper titled “Diffusion Models Are Real-Time Game Engines,” authors Dani Valevski, Yaniv Leviathan, Moab Arar, and Shlomi Fruchter explain how GameNGen works. Their system uses a modified version of Stable Diffusion 1.4, an image synthesis diffusion model released in 2022 that people use to produce AI-generated images.
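Stable Diffusion 1.4 is openly available, so the unmodified starting point is easy to inspect. Below is a minimal sketch of loading the components GameNGen builds on, assuming the Hugging Face diffusers library (an assumption; the paper does not specify tooling), with GameNGen's own modifications such as action conditioning not shown.

```python
# Minimal sketch: loading the unmodified Stable Diffusion 1.4 components that
# GameNGen starts from. Uses the Hugging Face diffusers library (an assumption;
# the paper does not specify tooling).
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# The pieces GameNGen reuses or modifies: the VAE (latent compressor), the U-Net
# denoiser, and the noise scheduler. The text-prompt conditioning is what the
# paper replaces with conditioning on player actions and past frames.
vae, unet, scheduler = pipe.vae, pipe.unet, pipe.scheduler
print(type(vae).__name__, type(unet).__name__, type(scheduler).__name__)
```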

“It turns out the answer to the question ‘Can it run Doom?’ is yes for diffusion models,” wrote Stability AI research director Tanishq Mathew Abraham, who was not involved in the research project.

A diagram provided by Google of the GameNGen architecture.

While being driven by player inputs, the diffusion model predicts the next game state from previous ones after being trained on extended sequences of Doom gameplay.
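In rough outline, a single step of that loop looks like the toy sketch below. Everything here is a hypothetical stand-in (a tiny convolutional “denoiser,” an assumed 240×320 frame, an assumed action vocabulary), not the authors’ code; it only illustrates the idea of conditioning next-frame generation on past latents and player actions.

```python
# Toy sketch of action-conditioned next-frame prediction in the spirit of GameNGen.
# All components here are hypothetical stand-ins, not the paper's model or API.
import torch
import torch.nn as nn

NUM_ACTIONS = 8             # assumed size of the Doom action vocabulary
LATENT_SHAPE = (4, 30, 40)  # 4 latent channels for an assumed 240x320 frame, downscaled 8x

class TinyDenoiser(nn.Module):
    """Stand-in for the modified Stable Diffusion U-Net (vastly simplified)."""
    def __init__(self):
        super().__init__()
        self.action_emb = nn.Embedding(NUM_ACTIONS, 4)
        self.net = nn.Conv2d(8, 4, kernel_size=3, padding=1)

    def forward(self, noisy_latents, prev_latents, actions):
        h = self.net(torch.cat([noisy_latents, prev_latents], dim=1))
        # Inject the player's action as a per-channel bias (the real model conditions
        # on an embedded action history instead of a text prompt).
        return h + self.action_emb(actions)[:, :, None, None]

def predict_next_frame(model, prev_latents, actions, steps=4):
    """One autoregressive step: start from noise and iteratively refine it into the
    next frame's latents, conditioned on the previous frame and the player's action."""
    latents = torch.randn(1, *LATENT_SHAPE)
    for _ in range(steps):                 # a handful of steps, for real-time speed
        residual = model(latents, prev_latents, actions)
        latents = latents - residual       # toy update rule, not a real diffusion sampler
    return latents                         # the VAE would decode this into pixels

model = TinyDenoiser()
prev = torch.randn(1, *LATENT_SHAPE)       # latents of the previously generated frame
action = torch.tensor([3])                 # hypothetical "move forward" action id
print(predict_next_frame(model, prev, action).shape)  # torch.Size([1, 4, 30, 40])
```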

The development of GameNGen involved a two-phase training process. First, the researchers trained a reinforcement learning agent to play Doom, recording its gameplay sessions to create an automatically generated training dataset, the footage mentioned above. They then used this data to train the custom Stable Diffusion model.
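A toy version of that two-phase pipeline is sketched below. The environment, agent policy, and generative model are all trivial stand-ins (the “model” here is just a regressor rather than a diffusion model), so this only shows the shape of the pipeline, not the paper’s implementation.

```python
# Hypothetical, toy-scale outline of the two-phase training pipeline described above.
# The environment, agent, and model are stand-ins, not GameNGen's components.
import random
import torch
import torch.nn as nn

# --- Phase 1: an agent plays the game; its frames and actions become the dataset ---
def collect_gameplay(num_steps, frame_dim=64, num_actions=8):
    dataset = []
    frame = torch.randn(frame_dim)                    # stand-in for a rendered Doom frame
    for _ in range(num_steps):
        action = random.randrange(num_actions)        # stand-in for the RL agent's policy
        next_frame = torch.randn(frame_dim)           # stand-in for the engine's next frame
        dataset.append((frame, action, next_frame))   # (state, action, next state)
        frame = next_frame
    return dataset

# --- Phase 2: train a generative model to predict the next frame from frame + action ---
class NextFramePredictor(nn.Module):
    """Stand-in for the action-conditioned diffusion model (here: a plain regressor)."""
    def __init__(self, frame_dim=64, num_actions=8):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, 16)
        self.net = nn.Linear(frame_dim + 16, frame_dim)

    def forward(self, frame, action):
        return self.net(torch.cat([frame, self.action_emb(action)], dim=-1))

data = collect_gameplay(1000)
model = NextFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for frame, action, target in data:
    pred = model(frame, torch.tensor(action))
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```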

However, using Stable Diffusion introduces some graphical issues, as the researchers note in their abstract: “The pre-trained autoencoder of Stable Diffusion v1.4, which compresses 8×8 pixel patches into 4 latent channels, generates significant artifacts when predicting game frames, affecting small details and especially the bottom bar HUD.”
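To make that compression concrete, here is a back-of-the-envelope calculation; the 240×320 frame size is an assumed example for illustration, not a figure taken from the paper.

```python
# Quick arithmetic for the autoencoder compression described in the quote above.
# The 240x320 frame size is an assumed example, not a figure quoted from the paper.
height, width, rgb_channels = 240, 320, 3
latent_channels, patch = 4, 8

pixels_in = height * width * rgb_channels                              # 230,400 values
latents_out = (height // patch) * (width // patch) * latent_channels   # 4,800 values

print(f"compression: {pixels_in / latents_out:.0f}x fewer values")     # 48x
# Small details, such as the HUD bar at the bottom of the screen, have to survive
# this roughly 48x compression, which is where the reported artifacts come from.
```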


And that’s not the only challenge. Keeping frames visually clear and consistent over time (often called “temporal consistency” in the AI video space) is another one. “Interactive world simulation is much more than just very fast video generation,” the GameNGen researchers write in their paper. “The requirement to condition on an input action stream that is only available throughout the generation breaks some assumptions of existing diffusion model architectures,” including repeatedly generating new frames based on previous ones (called “autoregression”), which can lead to instability and a rapid decline in the quality of the generated world over time.
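The autoregressive failure mode is easy to picture: each generated frame becomes part of the conditioning for the next one, so small errors feed back into the model and compound. The toy illustration below (a cartoon of the failure mode, not GameNGen’s architecture or numbers) shows how feeding a model’s own outputs back in lets error grow.

```python
# Toy illustration of autoregressive drift: feeding a model's own slightly-wrong
# outputs back in as conditioning lets small errors compound over many frames.
# This is a cartoon of the failure mode, not GameNGen's architecture or numbers.
import torch

torch.manual_seed(0)
true_frame = torch.zeros(100)            # pretend the "true" frame never changes

def imperfect_generator(prev_frame):
    """Stand-in generator: reproduces its input plus a little prediction error."""
    return prev_frame + 0.05 * torch.randn_like(prev_frame)

frame = true_frame.clone()
for t in range(1, 201):
    frame = imperfect_generator(frame)   # each frame is conditioned on our own output
    if t in (1, 10, 50, 200):
        drift = (frame - true_frame).abs().mean().item()
        print(f"frame {t:3d}: mean drift {drift:.3f}")
# The per-frame error is tiny, but it accumulates, which is why naive autoregressive
# generation tends to degrade the rendered world over time.
```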
