OpenAI’s New Product Makes Incredibly Realistic Fake Videos
17:07 minutes
OpenAI, the company behind the chatbot ChatGPT and the image generator DALL-E, unveiled its newest generative AI product last week, called Sora, which can produce extremely realistic video from just a text prompt. In one example released by the company, viewers follow a drone’s-eye view of a couple walking hand-in-hand through snowy Tokyo streets. In another, a woman tosses and turns in bed as her cat paws at her. Unless you’re an eagle-eyed AI expert, it’s nearly impossible to distinguish these artificial videos from those shot by a drone or a smartphone.
Unlike previous OpenAI products, Sora won’t be released right away. The company says that for now, its latest AI will only be available to researchers, and that it will gather input from artists and videographers before it releases Sora to the wider public.
But the fidelity of the videos prompted a polarizing response on social media. Some marveled at how far the technology had come while others expressed alarm at the unintended consequences of releasing such a powerful product to the public—especially during an election year.
Rachel Tobac, an ethical hacker and CEO of SocialProof Security, joins guest host Sophie Bushwick to talk about Sora and what it could mean for the rest of us.
Rachel Tobac is an ethical hacker and the CEO of SocialProof Security in San Francisco, California.
KATHLEEN DAVIS: This is Science Friday. I’m Kathleen Davis.
SOPHIE BUSHWICK: And I’m Sophie Bushwick. OpenAI, that’s the company behind the chatbot ChatGPT and the image generator DALL-E, has unveiled its newest generative AI product called Sora. This model can produce extremely realistic-looking video from just a text prompt.
In one example, we follow a drone’s eye view of a couple walking hand in hand through snowy Tokyo streets. In another, a woman tosses and turns in bed as her cat paws at her. Unless you are an eagle-eyed AI expert, it’s nearly impossible to distinguish these artificial videos from real ones shot on a drone or a smartphone.
Unlike previous OpenAI products, Sora won’t be released right away. The company says that for now its latest AI will only be available to researchers, and that it will gather input from artists and videographers before it releases Sora to the wider public. But the fidelity of the videos prompted a polarizing response on social media. Some are marveling at how far the technology has come while others are expressing alarm at the unintended consequences of releasing such a powerful product to the public, especially during an election year.
Here to talk about Sora and what it could mean for the rest of us is Rachel Tobac, an ethical hacker and CEO of SocialProof Security. Welcome to Science Friday.
RACHEL TOBAC: Thanks for having me.
SOPHIE BUSHWICK: Rachel, what was your reaction when you first saw these videos?
RACHEL TOBAC: My first thought was, whoa, I am super impressed with how realistic they are, followed very closely by my second thought of, oh, this is going to impact a lot. I mean, just last year we saw Will Smith eating spaghetti, and it looked so far from real life it was hard to imagine how long it would take to get.
SOPHIE BUSHWICK: It was terrible.
RACHEL TOBAC: Yeah. [LAUGHS] Yeah.
SOPHIE BUSHWICK: It was very uncanny valley creepy. Yes.
RACHEL TOBAC: So creepy. And people would ask me, how long do you think it’s going to take to get from Will Smith eating spaghetti to an actual good realistic video? And I said maybe two years. And I was wrong. It took one.
SOPHIE BUSHWICK: Wow. Yeah. And what are some of the potential risks of giving a lot of people access to this kind of technology?
RACHEL TOBAC: So the issue with AI-generated video is that there are, unfortunately, many ways that adversaries can use that content to manipulate us, confuse us, phish us, and just harm people in general. Now, we’ve seen OpenAI already talk about this. They said we have rules in place with things like, there’s no violence allowed, no celebrity likeness, et cetera. But adversaries can quickly find ways to use a tool like Sora within the rules of the tool to still trick or harm people.
I’ll give you a few examples here. So imagine an adversary uses an AI video generation tool to show unimaginably long lines– we’re talking hundreds upon hundreds of people– in terrible weather to convince people it’s not worth it to head out to vote that day. You could prompt the tool to make a video that doesn’t break the rules, but it’s all in how it’s used on social media.
Another example is imagine an adversary creates a video of a large group of people wearing suits, waiting for an ATM outside of a bank. This could induce panic for markets or cause a bank run, like what we saw last year with some of the photos that introduced some chaos as it related to banking.
Another example is an adversary uses an AI video generation tool to show a human arm that has turned dark green after a medical procedure at a clinic. They could use this video on social media to trick others into avoiding care during something like a public health emergency. So the prompts themselves might be innocuous, but these, quote unquote, “rule-following AI videos” can still trick people, destabilize public health, elections, banking confidence and more.
And it’s great that OpenAI is red-teaming Sora with experts. That’s a great thing. But without addressing how the videos will be used, labeled, managed on social media, the team won’t be able to actually impact where the true risk starts, which is how they’re used to trick people on social media.
SOPHIE BUSHWICK: And when you say red-teaming, what do you mean by that?
RACHEL TOBAC: Sure. Red-teaming is when a group of experts act as adversaries to test out how malicious folks will attempt to manipulate a tool. It’s fantastic. I do red-teaming all the time as an ethical hacker. And emulating a real threat on a platform is essential to understanding how it will be abused.
But we have to also address how the adversaries will actually leverage the AI-generated videos on social media because that’s where the true harm and confusion takes place, not just within the platform itself.
SOPHIE BUSHWICK: And we’ve actually seen some of those harms from much less powerful technology used for misinformation or hacks. So bad actors might share a clip from a video game and claim that that’s actually a real video showing real violence. And you yourself have demonstrated how deepfake voices can be used to scam people.
So what I’m wondering is, how does this AI-generated video change things? What’s something that an attacker could do with a model like Sora that wouldn’t have been possible with other technology?
RACHEL TOBAC: Oh, 100%. A question that I keep getting asked is, OK, but how is this different from Photoshop, because people have been getting tricked with Photoshop for years. The thing that I want people to understand is that video feels a lot harder to fake for our brains. The way that our brain interprets video information is, hmm, this is probably more likely true than not because it’s so hard to, quote, “fake a video,” end quote. Now we know that’s not really the case anymore.
But when Photoshop first came out, people were like, I believe that this is true. I believe that this is real. And then we slowly over time saw people get course corrected on Facebook or other social media platforms where now they’re a little bit more likely to guess that it might be Photoshop. But they still get tricked with Photoshop, even a decade down the line, right?
With video, that allows us to create something much more believable, much more realistic. And the general public will take many years– in many cases, three years– to catch up to understand what is possible with technology. So they’re just now understanding a year or so in that voice cloning is possible. It’s going to take us a long time to be able to communicate to the general public, people that do not consider themselves techies, that it’s possible to create completely realistic manipulated video that looks, to the naked eye, completely real.
SOPHIE BUSHWICK: Even though it’s really hard to do, are there any giveaways that you can think of that people could look for to figure out whether a video is AI-generated?
RACHEL TOBAC: Well, it’s super hard to do. OpenAI has actually claimed that they have some level of watermarking. Now, I have a pretty good eye when it comes to spotting that watermark, and it’s still really hard for me to see in many cases. If you look in the bottom right-hand corner of videos generated by OpenAI, within their demo that they released, you’ll see little lines which then turn into the OpenAI logo after a few seconds.
That is a logo that they’re trying to say is a good watermark. But we have to remember that the general public is not going to be looking for a few tiny gray lines turning potentially into a logo that they may or may not recognize. That doesn’t necessarily mean that they can understand it.
A couple of other artifacts that you probably spot– I know I definitely spot them in the cat on the bed video– is we have arms that turn into blankets that turn back into arms. Things that just don’t 100% seem real. And then in the video of the woman walking down the streets of Tokyo, there’s some interesting artifacts in the background where we can see people kind of morph in and out of each other. It doesn’t seem 100% seamless. But that does not mean that it’s easy for people to spot, though.
SOPHIE BUSHWICK: Mhm. Well, what about some of the telltale signs that you can use to identify AI-generated still images? For instance, in a still image, often the hands will look a little funky. Or if there’s writing in the background, it won’t actually spell real words.
RACHEL TOBAC: [LAUGHS] Yeah.
SOPHIE BUSHWICK: Are any of those little hints also– do those also exist in these videos?
RACHEL TOBAC: I was looking for that, because we know in a lot of AI-generated pictures that the fingers don’t always look right. Maybe there’s an extra finger. Or the ears look a little off, like the earrings don’t match. Sometimes little fashion elements are easy to spot, like buttons don’t reflect the light correctly. But again, those are still kind of hard to spot with the naked eye. But there are little strange artifacts that are left behind.
In the Sora video that they released of the woman walking down the street, I did not see the same idiosyncrasies that I was expecting to see. So I’m not exactly sure if we’re going to see the strange fingers or the strange ears. It’s possible. But I think we have progressed past it, in many cases.
SOPHIE BUSHWICK: And I want to go back to something you mentioned, which is this idea of watermarks, these markings on AI-generated content, whether it’s images or videos or even text, that provides this telltale sign that what you’re looking at is artificial. And some of the bigger tech companies have been coordinating around using watermarks with their AI-generated content. So can you talk about watermarks in general, and whether you think this is an effective strategy?
RACHEL TOBAC: Yeah. So in the last few days, we saw 20 or so companies including OpenAI, Meta, TikTok, X, formerly known as Twitter, come together and pledge to collaborate to reduce the risk of AI-generated content impacting people and elections across the globe. This pledge did not include any specific actions that they would agree to.
So it’s a bit vague right now. But their collaborations could include working together to develop tools for detecting disinformation using AI-generated images, video, audio; creating campaigns to educate users on the likelihood of encountering AI-generated misleading content; and detecting AI-generated campaigns on their platforms, including with things like watermarking, or embedding metadata, labeling content as AI-generated.
So we’re seeing vague talk about these types of watermarks. But we have to remember without things like embedded metadata or a joint framework to actually remove the content that’s affecting elections, public health, et cetera, we’re just going to have something that could be cropped out to still trick people.
SOPHIE BUSHWICK: Right. And going beyond hacks and scams, I know that a lot of people worry about AI taking their jobs. So in the case of Sora, in particular, what kinds of jobs or services do you think it’s going to have an impact on?
RACHEL TOBAC: Well, we’re definitely going to see a major impact in photography, videography. I know stock photography and videographers saw Sora and were like, oh, goodness gracious, this is going to impact our job significantly. Anybody who’s in video production, training, we are going to see some major impacts around anywhere that photography or videography touches. And that’s everywhere.
So I think we’re going to see a lot of AI-generated content on things like YouTube or on other social media platforms. It’s going to be pretty much everywhere.
SOPHIE BUSHWICK: And we’ve said some negative things about this technology, some problems that it might present. But there’s also a bright side. So can you talk a little bit about if there’s benefits to making AI video like this widely accessible?
RACHEL TOBAC: Sure. I could see AI-generated videos helping educators in a classroom, for instance. I used to be a teacher back in the day. So for hard to teach concepts, I could imagine generating very quickly a brilliant, beautiful, eye-catching and exciting video about a topic that I’m trying to teach.
Or let’s say we’re trying to support an underserved population who doesn’t have the resources for their creativity. They might be able to unleash their creativity while using a tool that doesn’t cost them very much. That is pretty much the extent of it, though. I could see a lot more challenges–
[LAUGHTER]
–and harm impacting general society. And, of course, there will be positive use cases. But it’s my job as an ethical hacker to find the risk.
SOPHIE BUSHWICK: And we often end up talking about legislation to rein in AI. But a lot of this hasn’t, so far, come to fruition. So do you think that the tone of those conversations might change now that we and lawmakers know what kinds of videos this tech can produce?
RACHEL TOBAC: Yes. Because we’re seeing so much buzz in both the positive and negative direction in relation to AI-generated video, audio, voice cloning, photos, my guess is we’re going to see this backed up with legislation over time. But we have to remember that adversaries break the law frequently, especially laws related to seemingly anonymous behavior on the internet.
Just because there’s a law about how you can or can’t use an AI tool, it doesn’t mean that it will be easy to enforce in the world wide web. It’s a little different than a police interaction face to face, right? These people think that they can hide behind the anonymity of the internet to do whatever they want.
SOPHIE BUSHWICK: And I’ve got a question that goes back to the Will Smith spaghetti video.
RACHEL TOBAC: Sure.
SOPHIE BUSHWICK: So you could imagine Sora making a much better version of that video, except for the fact that OpenAI has said that its technology is not going to be used to make likenesses of politicians or celebrities. But you’ve also talked about how it’s easy to get around some of these restrictions. So what’s a good way that hackers could attempt to get around this restriction on likenesses of famous people?
RACHEL TOBAC: I’m sure we’re going to see people attempt to circumvent the celebrity likeness rule. My question for OpenAI is, what do they mean by celebrity? So it might be harder to impersonate, say, Joe Biden or Will Smith. And that’s great. I really want to make it very challenging to be able to do something like that.
[LAUGHTER]
But what about TikTok stars? What about stars from reality TV shows? What about people who are C or D-list celebrities? Are those people also in the database? We really don’t know exactly what they mean by, quote unquote, “celebrity.” I am guessing, at this point, they’re not going to be able to put a database together of hundreds of thousands, if not millions, of people, which I’m not sure how they’re going to protect folks like TikTok stars or Instagram stars or models.
SOPHIE BUSHWICK: And what about the larger AI landscape? Is there any particular technology that you think presents a big security risk or a big boon to hackers that more people should be aware of?
RACHEL TOBAC: Oh, absolutely. The use of AI technology, it makes it so much easier for us to hack. As a hacker, I can clone somebody’s voice in a few steps with a few clicks. It takes me less than 30 seconds. I can have a dynamic conversation with somebody, tricking them to send or wire money to somebody. I’ve done this in a 60 Minutes demo, where I hack somebody named Elizabeth to get to her boss, effectively stealing the passport number because Elizabeth thinks she’s talking to her boss.
SOPHIE BUSHWICK: Oh, wow.
RACHEL TOBAC: It’s scary, right? And when we have something like Sora, a tool that could potentially be used in conjunction with our AI voice cloning tools, well, now we could potentially create a video of someone who looks pretty similar to a VP at your company, and potentially use that to wire money or steal access, gain access to your passwords, et cetera.
We’ve even seen large companies get hit with these types of scams where the attacker is pretending to be a CFO in a finance team in a live dynamic call, which we haven’t seen be super believable yet, but apparently it cost a Hong Kong company $25.6 million when they ended up falling for that deepfake technology in a live dynamic call. So we’re going to see some major impacts to the way that we do business and the way that we authenticate people are who they say they are.
SOPHIE BUSHWICK: Yes, I am definitely going to doubt everything I see on a video call from now on out. So thanks for that. [LAUGHS]
RACHEL TOBAC: You’re welcome.
SOPHIE BUSHWICK: And thank you for taking the time to talk about this, Rachel.
RACHEL TOBAC: Of course. Thanks for having me.
SOPHIE BUSHWICK: Rachel Tobac is an ethical hacker and CEO of SocialProof Security.
Copyright © 2023 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/
D. Peterschmidt is a producer, host of the podcast Universe of Art, and composes music for Science Friday’s podcasts. Their D&D character is a clumsy bard named Chip Chap Chopman.
Sophie Bushwick is senior news editor at New Scientist in New York, New York. Previously, she was a senior editor at Popular Science and technology editor at Scientific American.