Lies, Fakes, and Deep Fakes

On May 22nd, 2019, House Speaker Nancy Pelosi was addressing a conference at the Center for American Progress, a liberal think tank, when she laid out her position on a potential Trump impeachment. Not a highly articulate speaker at the best of times, Pelosi stumbled over a few syllables and paused between some of her words.

Following the speech, three things happened. First, a video was almost immediately made and posted online in which Pelosi’s delivery was slowed down and her voice altered to give the impression that she was either drunk or affected by a health disorder. Second, the next day President Trump tweeted a separate video of Pelosi’s speech, making fun of her. And third, Fox Business broadcast a faked video that spliced together moments from her news conference to emphasize the stumbles in her speech.

Never one to miss a chance to savage an opponent, the President’s lawyer, Rudy Giuliani, retweeted the doctored video, asking: “What is wrong with Nancy Pelosi? Her speech pattern is bizarre.” Giuliani eventually deleted the tweet, but not before it made the rounds of right-wing websites. Neither the President nor his lawyer apologized for their collusion in spreading a false image of Speaker Pelosi.

This sordid story went well beyond the kind of “fake news” to which we have become accustomed in the era of Donald Trump. This version was unique in that it combined a “deception” and a “scam.” The former term, deception, I define as the manipulation of visual images. The latter, scam, is the diffusion and amplification of the manipulated image. Deception emerges from the dark side of the web, from state actors and movement activists who are difficult to identify (and who, even when identified, may be protected by the First Amendment). Scamming, however, mostly depends on open-access platforms that claim to have no way of weeding out manipulated videos, as Facebook originally claimed when the Cambridge Analytica scandal broke.

But the Pelosi deception only manipulated the Speaker’s image; it did not have her saying anything that she did not actually say. That is not the case for a different, much more frightening phenomenon, one that presents a growing danger to truth, reliability, and democracy: the use of artificial intelligence to manipulate physical images — deep fakes.

***

The term “deep fakes” is a portmanteau of “deep learning,” the computerized creation of neural nets with many layers, and “fake news,” the use of a manufactured reality to attack, demean, or disguise real people or events. While such deceptions are at least annoying (and at most devastating), the real danger is that, through their diffusion and amplification, deep fakes feed into the psychological tendency human beings have to attend to negative images and thereby erode belief in the reliability of public discourse. As Tiffany Li wrote in commenting on the circulation of the faked Pelosi video: “We risk a future where no one can really know what is real — a threat to the foundation of global democracy.”

The law, artificial intelligence theory, and social psychology will need to combine their insights in order to understand and combat the spread of deep fake technology. My interest in this phenomenon grows out of my work on the language and diffusion of social movement messages. For although state actors are the most visible users of fake news, movements were among the first to grasp the implications of digital communication for weak actors like themselves. By using artificial intelligence, the mediatized movement repertoire has moved to a new technological level.

I begin with a definition: in the words of legal scholars Bobby Chesney and Danielle Citron, “Deepfake technology leverages machine learning algorithms to insert faces and voices into video and audio recordings of actual people and enables the creation of realistic impersonations out of digital whole-cloth.” The result is a realistic-looking video or audio making it appear as if someone said or did something they never said or did.

For example, in their article Chesney and Citron presented two versions of a video of anti-gun activist Emma González, a doctored one and a real one. In the real footage, González dramatizes her opposition to gun culture by tearing up a copy of a bullseye target. In the faked version (since debunked), the bullseye target has been replaced with the U.S. Constitution.

Although the most dangerous area for deep fake technology is politics, its original application was in pornography. There the technique consisted in using thousands of images to attach the head of a celebrity to the body of a nude person to make it appear that the celebrity had been filmed engaging in sexual acts. “Frighteningly,” writes legal scholar Rebecca Delfino, “now anyone who has appeared in a digital image may star in pornography against their will, and currently, the law provides no clear or direct recourse to stop it.”

***

So rapidly has the deep fake phenomenon spread that the Washington Post has started a “fact-checker” interactive website with a classification of six forms of fake news. Each successive form requires a greater degree of technological proficiency to accomplish. Reviewing these forms and providing an example of each can be helpful in understanding the consequences of deep fakes for our political discourse. I will take them in the order in which the Post lists them.

1. Misrepresentation: presenting an unaltered photo or video in an inaccurate manner, one that misrepresents the footage and misleads the viewer.

Here is a typical example: in October of 2018, Florida Representative Matt Gaetz tweeted a video of unidentified men handing out money to women on a street. The street was in Honduras, Gaetz asserted, and the cash was given so that these men and women could join the infamous “caravan” of migrants heading toward the American border. But this is not, in fact, where the footage was from or what was happening. The scene actually came from Guatemala, and there is as yet no evidence regarding either what the handouts were for or where the money had come from. The day after the Gaetz tweet went up, President Trump retweeted the same video. And hours later, at a rally in Montana, Trump floated the theory that the caravan was financed by his political opponents. A few days later, the Open Society Foundations felt they had to issue a statement debunking Gaetz’s Twitter speculation. But by that time, Gaetz’s tweet had been viewed 2.2 million times.

2. Isolation: extracting a brief clip from a longer video to create a false narrative that does not reflect the event as it occurred.

Here Senator Kamala Harris’s condemnation of Justice Brett Kavanaugh can serve as an example. During his confirmation hearings, Senator Harris criticized him for what she called “a dog whistle for going after birth control.” But the words she was criticizing were not, in fact, Kavanaugh’s own; they were a statement by another group, which he had quoted. Justice Kavanaugh could not have been dog-whistling because he was citing a case brought by an antiabortion religious group challenging Obamacare rules on providing employees health coverage for contraception. His words were isolated to create a false narrative.

3. Omission: editing out large portions from a video and presenting it as if it were a complete narrative, despite missing key elements.

On July 25, 2019, Congresswoman Ilhan Omar made a statement about white nationalist terrorism that was manipulated by simply omitting part of what she said. Here is the full quote: “I would say our country should be more fearful of white men across our country because they are actually causing most of the deaths within this country, and so if fear was the driving force of policies to keep America safe, Americans safe inside of this country, we should be profiling, monitoring and creating policies to fight the radicalization of white men.”

But here is the doctored version, which leaves out the critical center section of the quote: “I would say our country should be more fearful of white men across our country because they are actually causing most of the deaths within this country. We should be profiling, monitoring and creating policies to fight the radicalization of white men.” The change in meaning is clear. A simple omission can be as devastating to our public discourse as the commission of deep fakes, provided the editing is carefully done.

4. Splicing: editing together disparate videos to fundamentally alter the story that is being told.

When CNN host Chris Cuomo confronted a man who called him “Fredo,” which he interpreted as an ethnic slur against Italian-Americans, the New York Post superimposed the heads of Mario, Andrew, and Chris Cuomo onto the bodies of the infamous Corleone family from the film “The Godfather.” The technology the Post used to fabricate the image was simple, a version of the photoshopping that many of our grandkids can do with cellphone photos, but the image contributed to an ethnic slur while pretending merely to make fun of Cuomo’s outburst.

5. Doctoring: touching up or distorting a real video so that it looks as if the speaker is saying or doing something different from what they actually said or did.

In April 2017, when Oregon police were searching for the bank robber dubbed the “Foul Mouth Bandit,” they digitally altered the suspect’s mugshot to remove his tattoos before showing it to witnesses. In the photo comparison published by the Washington Post, the differences are striking. Reflecting on the effects of such doctoring, the head of the Oregon ACLU, which is appealing the case, warned that “the revelations [of doctoring the photo] raise big questions about how many people may have been falsely identified by eyewitnesses in recent years based on [such] changes.”

6. Fabrication: using artificial intelligence to create high-quality fake media whether in the form of images, video, or audio.

This is the “Full Monty” of fake news technology. For example, in June 2019, two artists and an advertising agency fabricated a video of Mark Zuckerberg to demonstrate the dangers of deep fake technology. In the made-up video, the founder of Facebook describes himself as “one man, with total control over billions of people’s stolen data.”

As the sequential progression of these examples shows, each form of disinformation is more dangerous than the one before it. The first two forms don’t change the media itself; instead, they exclude key context in order to give a different meaning to the message. The third and fourth forms, however, do manipulate existing videos and in so doing give them a deceptive meaning. And in the fifth and sixth forms, part or all of the image is manipulated or, in the case of fabrication, created out of whole cloth.

***

What are the dangers in these deceptive manipulations of physical images? In the field of national security, history can be influenced by the production and diffusion of faked information. In an ingenious experiment, Sarah Kreps and Miles McCain, using simple AI software, invented a passage about North Korea to show how dangerous artificially-produced language could be. In an effort to show whether synthetic information can generate convincing news stories about complex foreign policy issues, they produced the following paragraph:

North Korean industry is critical to Pyongyang’s economy as international sanctions have already put a chill on its interaction with foreign investors who are traded in the market. Liberty Global Customs, which occasionally ships cargo to North Korea, stopped trading operations earlier this year because of pressure from the Justice Department, according to Rep. Ted Lieu (D-Calif.), chairman of the Congressional Foreign Trade Committee.

“This paragraph,” write Kreps and McCain, “has no basis in reality. It is complete and utter garbage, intended not to be correct but to sound correct.” Had it appeared in the press, both the firm and the Congressman would have been deeply embarrassed.

A second danger is what I call the “negative technological fix.” I think we have been paying far more attention to the threat of the “deception” (the technical wizardry that permits the production of deep fake videos) than to its spread. Following a congressional hearing chaired by Representative Adam Schiff, chairman of the House Intelligence Committee, Facebook and Microsoft, working with a group of academic experts, announced a plan to investigate ways of identifying and countering deep fake technology. At the same time, NYU published a report calling for a similar program.

These efforts are well-timed, but they mainly focus on the production of deep fakes; the bigger danger is not the deception but the “scam” — its amplification on social media. As Danielle Citron pointed out in her important study of cyber harassment, it is the character of the Internet itself that fuels the vices of the deceivers by broadcasting and amplifying their cruel messages.

This judgment stems partly from the recognition that trying to stop the production of deep fakes (the deception) is unlikely to succeed. There are two reasons for this. The first is that producing deep fakes is getting progressively easier: as the technology becomes more widely available, the capacity to make such media only grows. The second is that, even if the maker of a deceptive video can be identified in real time, current laws are unlikely to inhibit the production of deep fakes. As Professor Richard Hasen writes, “any law that would purport to regulate the content of media being used for political communications would be subject to heightened First Amendment scrutiny.”

This takes us to the greatest danger of all: the potential impact of deep fake technology on democracy. This would be a serious problem at any time but, as Hasen points out, it is particularly dangerous in this “moment of polarization.” He writes: “We are experiencing rapid technological change in which social media amplifies and reinforces existing ideas, and where people get exposed to information from increasingly siloed sources.” We saw the extent of the damage that Russian interference in the 2016 election did to American politics using a relatively modest level of technology. Think of what they could have accomplished using the full panoply of deep fake artificial intelligence. Sad as it is to say, we may yet see the results of such broader efforts in the upcoming 2020 presidential election — especially given the presence of a president who has only a nodding acquaintance with the truth and has shown no inclination to go after Russian interference. In such a situation, Congress, the intelligence agencies, and the traditional media must work hard to educate the public about the dangers of believing everything that appears on the Internet.

But the deeper danger is not the immediate deception produced by deep fake images; it is their effect on “real” news. As Chesney and Citron point out, “as the capacity to produce deep fakes spreads, journalists increasingly will encounter a dilemma: when someone provides video or audio evidence of a newsworthy event, can its authenticity be trusted?” This leads to what Chesney and Citron call “the liar’s dividend”: the advantage gained by deceivers from a climate of “truthiness,” “post-truth,” and general distrust of the media. “That distrust,” they write, “has been stoked relentlessly by President Trump and like-minded sources in television and radio; the mantra ‘fake news’ has thereby become an instantly recognized shorthand for a host of propositions about the supposed corruption and bias of a wide array of journalists, and a useful substitute for argument when confronted with damaging factual assertions.”

What can be done? We are not ready for a protracted technological battle against largely unseen antagonists who are determined to undermine the fabric of democracy. And governmental efforts to close down hate-inspired deep fakes have already been stymied by First Amendment and other legitimate concerns. So if production of these deep fakes is unlikely to be curtailed, we ought instead to attack the distribution and amplification of these deceptions by putting political and moral pressure on the intermediaries that broadcast them: the Internet platforms. Unlike those who produce deep fakes, who are mainly invisible, these platforms are highly visible, depend on the public’s willingness to continue to use them, and have proven susceptible to political and social pressure.

We may be forgiven for being skeptical that either Congress or the Executive will show great energy in insisting on responsible curating from these deep-pocketed firms — especially as we approach the most expensive election campaign in American history. But where these branches of government have withdrawn, civil society must engage. As ACLU legal director David Cole concluded based on his analysis of the campaigns for marriage equality and for gun rights, these campaigns were “as much about molding public sentiment as shaping laws, as much about working outside the courts as pressing a case within them.” If it is social movement activists working undercover who produce the most notorious deep fakes, only a countermovement operating in the public sphere can defeat them.

Sidney Tarrow is Emeritus Maxwell Upson Professor of Government and Adjunct Professor, Cornell Law School.