AI’s creative abilities are outstripping its driving skills. While self-driving car technology is going nowhere, there’s been a remarkable explosion in research around generative models, or artificial intelligence systems that can create images from simple text. In just the past week, AI researchers from Meta Platforms Inc. and Alphabet Inc.’s Google have taken an extraordinary leap forward, developing systems that can generate videos with just about any text prompt one can imagine.
The videos from Facebook-parent Meta look like trippy dream sequences, showing a teddy bear painting flowers or a horse with distended legs galloping over a field. They last about one or two seconds and have a glitchy quality that betrays their source, but they’re still remarkable. The videos generated by Google, of coffee being poured into a cup or a flight over a snowy mountain, look especially realistic.
Google has also built an even more impressive second system called Phenaki that can create longer videos, lasting two minutes or more. Here’s an example of the prompt Google used for one:
“Lots of traffic in futuristic city. An alien spaceship arrives to the futuristic city. The camera gets inside the alien spaceship. The camera moves forward until showing an astronaut in the blue room. The astronaut is typing in the keyboard. The camera moves away from the astronaut. The astronaut leaves the keyboard and walks to the left…”
That’s less than a third of the entire prompt, which reads almost like a movie script with commands such as “camera zooms in.” And here’s the resulting clip, posted on Twitter by Dumitru Erhan, one of Phenaki’s creators at Google Brain:
You may be thinking this is the end of Hollywood as we know it or that anyone with a few brain cells and a computer will soon be able to produce feature-length films. That’s actually along the lines of what the researchers are hoping for. Erhan tweeted that he and his team wanted to empower people to “create their own visual stories… [to] make creativity for people easier.”
It’s hard to see AI-generated videos coming to your local movie theater any time soon. But we’ll almost certainly see them being posted in our social media feeds, particularly on platforms like ByteDance Ltd.’s TikTok, Instagram’s Reels or YouTube.
TikTok didn’t respond to a question about whether it is building its own AI video-generation tool, but it would make sense for the platform to do so. TikTok’s users love adding stickers, text and green screens to their posts, and the platform accommodates the demand with new tech. In August, it added an AI image generator to its app for creating stylized green screens. Type in a prompt like “Boris Johnson” and TikTok will bring up an abstract image vaguely reminiscent of the former British prime minister.
What happens when machines not only recommend the videos that keep us scrolling, but have a greater hand in creating them too? Many of us love to watch footage of cute cats and people tripping over themselves, so an algorithm that could churn out fake montages of awkward stumbles or frisky kittens would rack up viral hits with little effort, so long as the clips appear real.
Content creators on TikTok, and the platforms themselves, have every incentive to exploit a tool that can generate videos at scale, especially when it’s cheap and easy. For the rest of us, the result would be social media feeds that are more machine-driven than ever. Those feeds are already curated by recommendation algorithms; AI-generated videos would add to the self-reinforcing feedback loops that scratch our cognitive itches.
The other looming consequence is a flood of misinformation, but there may be less need for alarm on that front in the short term. Social media platforms have been stepping up their efforts to weed out fake content, and both Google and Facebook are refusing to release their video-making tools to the public because of the potential for misuse (and, presumably, bad public relations). Google said that its own system generated videos that were biased against women even when its researchers tried filtering out stereotyped results. The model and its source code won’t be released until the problem is fixed, the researchers said.
Of course, soon enough you’ll be able to use these tools with few restrictions anyway, thanks to organizations like Stability AI. The British startup released an image-generation tool last August that allowed anyone to generate cool art, as well as fake photos of celebrities, politicians and war zones, something larger AI companies have banned. I tried the tool and, in seconds, was able to cook up photos of former President Donald Trump playing golf with North Korean leader Kim Jong Un. Stability is working on a video-generation tool that it plans to release publicly when it’s ready.
But while greater access to such tools will lead to more fake content, it will also mean more people are aware that the tools exist. They’re more likely to suspect that “photo” of President Joe Biden punching an old lady is AI-generated. That’s the hope, anyway.
Just as concerning is what these tools will do to people’s daily diets of content. Google’s researchers contend that their tools will augment human creativity. But when it becomes so easy to make video that you barely have to think about it, is that really harnessing our imagination? Maybe not in every case.
Coupled with the click-hungry recommendation engines that drive so much of what we see online, that makes our future look much more machine-directed, and arguably not very creative.
Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for the Wall Street Journal and Forbes, she is author of “We Are Anonymous.”