Google’s new AI turns text into music
Google researchers have created an AI that can generate minutes-long pieces of music from text prompts, and can even transform a whistled or hummed melody into other instruments, similar to how systems like DALL-E generate images from written prompts (via TechCrunch). The model is called MusicLM, and while you can’t play with it yourself, the company has uploaded a bunch of samples it produced using the model.
The examples are impressive. There are 30-second snippets of what sound like real songs created from paragraph-long descriptions that prescribe a specific genre, mood, and even instruments, as well as five-minute tracks generated from one or two words like “melodic techno.” Perhaps my favorite is the “story mode” demo, where the model is basically given a script to morph between prompts. For example, this prompt:
electronic song played in a video game (0:00-0:15)
meditation song played beside a river (0:15-0:30)
Resulted in audio which you can listen to here.
It might not be for everyone, but I could totally see this being composed by a human (I also listened to it on repeat dozens of times while writing this article). The demo site also features examples of what the model produces when asked to generate 10-second clips of instruments like the cello or maracas (the latter is a case where the system does a relatively poor job), eight-second clips of a certain genre, music that would be appropriate for a prison break, and even what a beginner pianist would sound like compared to an advanced one. It also includes renditions of phrases such as “futuristic club” and “death metal accordion.”
MusicLM can even simulate human vocals, and while it seems to get the overall tone and sound of voices right, there is a quality to them that is definitely off. The best way I can describe it is that they sound grainy or staticky. That quality isn’t as clear in the example above, but I think this one illustrates it quite well.
That, by the way, is the result of asking it to make music that would play in a gym. You might also have noticed that the lyrics are nonsense, but in a way that you won’t necessarily catch if you’re not paying attention, kind of like listening to someone sing in Simlish or that song that’s supposed to sound like English but isn’t.
I won’t pretend to know how Google achieved these results, but it has published a research paper explaining it in detail, if you’re the kind of person who would understand the figures in it.
AI-generated music has a long history stretching back decades; there are systems that have been credited with composing pop songs, copying Bach better than a human could in the ’90s, and accompanying live performances. One recent system uses the AI image generation engine Stable Diffusion to turn text prompts into spectrograms, which are then turned into music. The paper says that MusicLM can outperform other systems in terms of its “quality and adherence to the caption,” as well as in the fact that it can take in audio and copy the melody.
That last part is perhaps one of the coolest demos from the researchers. The site lets you play the input audio, where someone hums or whistles a tune, then lets you hear how the model reproduces it as an electronic synth lead, string quartet, guitar solo, etc. From the examples I’ve listened to, it handles the task very well.
As with its other forays into this type of AI, Google is being far more cautious with MusicLM than some of its peers are with similar technology. “We have no plans to release models at this point,” the paper concludes, citing the risks of “potential misappropriation of creative content” (read: plagiarism) and potential cultural appropriation or misrepresentation.
It’s still possible the tech will show up in one of Google’s fun musical experiments at some point, but for now, the only people who will be able to make use of the research are other people building musical AI systems. Google says it is publicly releasing a dataset with around 5,500 music-text pairs, which could help when training and evaluating other musical AIs.