“A people that no longer can believe anything, cannot make up its own mind. It is deprived not only of its capacity to act but also of its capacity to think and to judge. And with such a people, you can then do what you please.”
This quote from the German-American political theorist Hannah Arendt was invoked in a recent TED Talk by one of the world’s strongest voices on artificial intelligence.
The context is that distinguishing AI-generated content from authentic human-generated material is becoming increasingly difficult. With the progression of generative AI and other advances in deepfake technology, creating a convincing imitation of a voice or face takes only a few seconds of recorded audio or a handful of images.
Sam Gregory is a leading expert in technology and human rights advocacy, known for his perspectives on the rising threat of hyper-realistic deepfakes and his work on addressing the challenges posed by these deceptive technologies. Gregory’s impact extends to the frontline experiences of journalists and human rights defenders through his involvement in the global ‘Prepare, Don’t Panic’ initiative on deepfakes and generative AI. The Journal followed a talk that Gregory gave at TED: a series of short presentations delivered by speakers from various fields, including science, technology, education, design, and entertainment.
Gregory believes that deceptive and malicious audio-visual AI may not be the root cause of our societal problems, but it is poised to contribute to them significantly. Audio clones, for instance, are spreading across various electoral contexts, prompting people to approach any form of content with a constant “Is it? Isn’t it?” dilemma. These uncertainties have the power to obscure human rights evidence from war zones and accounts of sexual trauma, because people no longer simply believe what they are presented with as evidence. Nor can they be blamed, in a world where synthetic avatars impersonate news anchors.
During his speech, Sam Gregory drew on examples that his Rapid Response Task Force has handled. This task force is dedicated to countering deepfakes and is made up of media forensics experts and companies donating their time and expertise. Recently, they were presented with three audio clips from Sudan, West Africa, and India, all subject to assertions that they were deepfakes rather than authentic recordings. In the case of Sudan, experts employed a machine-learning algorithm, trained on over a million instances of synthetic speech, to reach a near-definitive conclusion that the audio was genuine. The challenge was greater with the third clip, which contained leaked audio of a politician from India. Despite the politician’s adamant claims of AI manipulation, experts spent nearly an hour analysing samples to build a personalised model of the politician’s authentic voice, and concluded that at least a portion of the audio was real and not AI-generated. These examples highlight the difficulty of rapidly and conclusively separating truth from false claims. Meanwhile, it is becoming ever easier to dismiss genuine material by labelling it a deepfake.
The future presents enormous challenges in both safeguarding authenticity and detecting deception. Politicians, major leaders in the EU, Turkey, and Mexico, and US mayoral candidates have already been targeted by audio and video deepfakes. Political advertisements now incorporate footage of events that never occurred, and individuals are circulating AI-generated imagery from crisis zones while falsely asserting its authenticity. This predicament is not entirely new: videos and images have long been lifted from one context, time, or place and presented as if they belonged to another, contributing to confusion and the spread of disinformation. In a world marked by partisanship, the risk of not knowing where to turn for shared, trustworthy information is real, and it undermines democracies. In a scenario where AI is deployed convincingly, and where we simultaneously desire and reject inconvenient truths, the risk of surrendering to a future of manipulation is evident.
Let’s be prepared
To avert this impending reality, proactive measures are essential, and Gregory believes that by preparing rather than succumbing to panic, we can navigate these challenges. He further emphasises that surrendering to uncertainty won’t serve us well, as it may lead us into the hands of governments, corporations, and individuals who exploit our fears, fostering a cloud of confusion with AI as their convenient excuse. Even those confident in their ability to identify a deepfake should acknowledge that well-known tips are fast becoming outdated. Deepfakes, once marked by subtle imperfections, have evolved to blink and feature lifelike details, erasing the visible and audible cues we rely on to distinguish reality from fabrication. A crucial point that Sam Gregory makes is that the responsibility for such determinations shouldn’t rest solely on individuals. Rather, comprehensive solutions are imperative, and people require robust foundations and tools capable of differentiating between the authentic and the simulated.
The first crucial step towards addressing the challenges posed by deepfakes is to ensure that detection skills and tools are accessible to those who need them most. Gregory explains that extensive conversations with journalists, community leaders, and human rights defenders have made it evident that they face the same dilemmas as the public. They too find themselves scrutinising audio content, examining images, and often resorting to online detectors. However, the detectors they use leave them unsure whether they are dealing with a false positive, a false negative, or a dependable result at all.
There are many challenges in deepfake detection. Many detection tools are specialised and effective only against a specific method of creating deepfakes, necessitating the use of multiple tools. Additionally, these tools struggle with the low-quality content found on social media platforms. A low confidence score further complicates matters, leaving users unsure of the reliability of the detection, especially concerning the underlying technology and its applicability to the manipulation at hand.
It’s crucial to recognise that tools designed to identify AI manipulation may not detect manually edited content, limiting their accuracy. Another issue is striking a balance between security and access. Making these tools universally available risks rendering them ineffective, since they would be examined in detail and eventually overcome by those designing new deception techniques. Nonetheless, Sam Gregory believes it remains essential to make these tools accessible to journalists, community leaders, and election officials worldwide, whom he refers to as our primary defence against deepfakes.
In the second crucial step, we must recognise that AI will pervade all aspects of our communication, participating in the creation, alteration, and editing of content. It won’t be a simple binary of “yes, it’s AI” or “no, it’s not AI”. Instead, AI will be an integral part of our communication landscape. To navigate this, we need to comprehend the composition of what we consume, a concept often referred to as content provenance and disclosure.
Experts in technology have been developing methods to incorporate invisible watermarking into AI-generated media and to embed cryptographically signed metadata into files. This data provides details about the content and serves as a record of how AI contributed to its creation or editing. Essentially, it acts as a recipe and serving instructions for the blend of AI and human elements present in what you see and hear. Sam Gregory insists that this component is pivotal for fostering a new era of AI-infused media literacy.
This transition is already observable in our communication patterns. Platforms like TikTok feature videos that combine diverse elements such as audio sources, AI filters, green screens, backgrounds, and stitched edits. This can be considered an initial step towards transparency on major platforms, although it is not yet universal across the internet, and the existing framework lacks reliability, updatability, and security.
Challenges that persist
Considerable challenges persist. For example, citizen journalists filming in restrictive contexts and satirical creators using innovative AI tools must be protected. Some would argue that such individuals should not have to reveal their identity or personally identifiable information in order to use their camera or ChatGPT, and should be able to maintain anonymity. This emphasises the importance of focusing on the “how” of AI-human media-making rather than the “who”.
The final step that Gregory mentions highlights the need for a comprehensive pipeline of responsibility, extending from foundation models and open-source projects to their deployment in systems, Application Programming Interfaces (APIs), apps, and the platforms where we consume media and communicate. Essentially, governments need to play a pivotal role in ensuring transparency, accountability, and liability within the AI responsibility pipeline.
“I’ve spent much of the last 15 years fighting a rear-guard action, like so many of my colleagues in the human rights world, against the failures of social media. We can’t make those mistakes again in this next generation of technology,” he warns.
These three elements — detection tailored for those who need it most, rights-respecting provenance, and a meticulously managed pipeline of responsibility — are crucial. Without them, warns the expert, we risk being stuck in a futile loop, searching for tell-tale signs that something has been created artificially, such as six-fingered hands or unblinking eyes.
The absence of these foundational steps could lead us into a world described by Hannah Arendt in our introduction: a world where people no longer trust information, lose the capacity to think independently, and lack the ability to judge. In such a scenario, those in control can wield power without constraint.