June 2024 | This Month in Generative AI: Moving Through the Uncanny Valley (Pt. 1 of 2)

By Hany Farid, UC Berkeley Professor, CAI Advisor

News and trends shaping our understanding of generative AI technology and its applications.

First coined by Japanese roboticist Masahiro Mori in the 1970s, the term uncanny valley describes a phenomenon that occurs as a humanoid robot, or an image or video of a computer-generated human, becomes more human-like. At a certain point, the depiction becomes eerily similar to a real human yet remains distinguishable from one, causing a sharp drop in our emotional comfort and acceptance.

This transition is known as the uncanny valley. A humanoid depiction is said to exit the uncanny valley when it becomes so realistic that it is indistinguishable from a real person.

I have previously discussed an earlier study which found that, even in 2022, GAN-generated faces had passed through the uncanny valley. In that study, 315 paid online participants were shown — one at a time — 128 faces, half of which were real. The participants were asked to classify each as either real or synthetic. Average accuracy on this task was 48.2%, close to the chance performance of 50%, and participants were equally likely to say that a real face was synthetic as vice versa.

Unlike GANs, today's generative AI tools (e.g., Adobe Firefly, Midjourney, and DALL-E) deploy a generative approach known as text-to-image or diffusion-based synthesis.

Trained on billions of images, each with an accompanying descriptive caption, a diffusion model progressively corrupts each training image until only visual noise remains. The model then learns to denoise each image by reversing this corruption. This model can then be conditioned to generate an image that is semantically consistent with a text prompt such as “a professional photo of a middle-aged executive.”
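As a rough illustration, the forward "corruption" half of this process can be sketched in a few lines. The blending schedule, function names, and tiny stand-in image below are my own simplifications for exposition, not any real model's code; an actual diffusion model corrupts millions of pixels over a carefully tuned noise schedule and learns a neural network to reverse each step.

```python
import math
import random

T = 1000  # number of corruption steps (an arbitrary choice for illustration)

def corrupt(pixels, t, rng):
    """Blend the signal with Gaussian noise: t=0 returns the image unchanged,
    t=T returns pure noise. A diffusion model learns to reverse these steps."""
    alpha = 1.0 - t / T                      # fraction of original signal kept
    noise_scale = math.sqrt(1.0 - alpha**2)  # keeps overall variance comparable
    return [alpha * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
image = [0.2, 0.5, 0.9, 0.1]           # a tiny stand-in for real pixel values
fully_noised = corrupt(image, T, rng)  # at t=T, no trace of the signal remains
```

Note that at t=T the output depends only on the noise, which is why a model that can walk these steps backward from pure noise can synthesize an entirely new image.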

Unlike GANs, which can only generate images of a single category like faces, cats, or landscapes, diffusion models afford much more rendering control and are limited only by the imagination of the prompter.

In collaboration with Professor Sophie Nightingale and Lex McGuire of Lancaster University, we have launched a new perceptual study to determine whether diffusion-based faces, like GAN images, have passed through the uncanny valley.

In this study, participants are shown, one at a time, a GAN-generated, diffusion-generated, or real face. Participants are asked to classify each face as either real or synthetic — they are not explicitly told about the difference between GAN-generated and diffusion-generated faces. 

Although we are still collecting data, we have completed a pilot study with 20 participants. Average accuracy across all three categories is 62%, only slightly better than the chance performance of 50%. We also find little difference in accuracy between GAN- and diffusion-generated images. You can test yourself on a set of 24 images.
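For concreteness, here is how per-trial responses in a study like this might be scored. The trial data and helper names below are hypothetical, made up purely to show the computation, and are not the study's actual code or results:

```python
# Each trial records the true image category and the participant's response.
# This tiny dataset is invented solely to illustrate the scoring.
trials = [
    ("real", "real"), ("gan", "real"), ("diffusion", "synthetic"),
    ("real", "synthetic"), ("diffusion", "synthetic"), ("gan", "synthetic"),
]

def accuracy(trials):
    """A response is correct when it says 'real' iff the image truly is real."""
    correct = sum((truth == "real") == (resp == "real") for truth, resp in trials)
    return correct / len(trials)

def accuracy_by_category(trials):
    """Break accuracy down by image category (real, gan, diffusion)."""
    per_cat = {}
    for truth, resp in trials:
        per_cat.setdefault(truth, []).append((truth, resp))
    return {cat: accuracy(sub) for cat, sub in per_cat.items()}
```

Breaking accuracy down by category is what lets one compare GAN- versus diffusion-generated images, and comparing the overall number against 50% is what "chance performance" refers to.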

Since the focus of this series is on generative AI, I should confess that I used ChatGPT to write the HTML and JavaScript code for the "AI or not" quiz. 

Interestingly, one of our participants performed much better than the rest. With an accuracy of 85% across all three image classes, they significantly outperformed the other participants. It could be that they got lucky, or it could be that they have a particularly well-trained eye.

In the world of facial recognition, there are so-called super-recognizers who have a seeming superpower for recognizing people even after seeing only a single photo of them. These super-recognizers have been employed by law enforcement agencies to spot fugitives in CCTV footage. We plan to investigate whether similar super-recognizers exist for detecting AI-generated faces in images. Get in touch with me (hfarid@berkeley.edu) if you find yourself performing particularly well on the quiz linked above.

These preliminary results suggest that, like GAN-generated faces, diffusion-generated faces have passed through or are rapidly passing through the uncanny valley. This does not mean that all diffusion-generated images are perceptually indistinguishable from real images, because in our study participants only view a single face from the neck up. Generally speaking, the more complex the depicted scene, the more likely it is that the image will contain some structural or gravity-defying artifact that a keen eye will detect. If, however, generative AI continues along its current trajectory, it seems likely that sooner or later it is going to be very difficult to perceptually distinguish the real from the fake.

We will soon be launching a similar perceptual study to examine our ability to distinguish between real and AI-generated voices. If my performance on the pilot study is any indication, I predict that AI-generated voices have already passed through the uncanny valley. At the same time, I think that AI-generated videos and face-swap and lip-sync deepfake videos are still on the other side of the uncanny valley, but I don't expect that to be the case for very long.

So what can be done?

As AI-generated content becomes indistinguishable from “real” content, the work of the Content Authenticity Initiative (CAI), where I’m an advisor, becomes increasingly important. Using the underlying C2PA open technical standard, the CAI seeks to accelerate adoption of provenance labeling, whereby viewers can quickly inspect digital assets using Content Credentials.

With over 3,000 members, the CAI is working with generative AI companies, hardware and software providers, news and social media companies, and many others to establish Content Credentials as a digital industry standard.

Michael DUFOUR

Tech Explorer - Generative AI, LLMs, AI Design and workflows, 2D/3D AR and VR experience, app design


Hello Hany Farid, I sent you an email as suggested in this article :) I scored 85% overall in the survey, and 8/8 on the AIorNotQuiz, and would be very happy if we could connect 😊

Gerald Rusche

More trust in your brand and performance … with gruvie videos 🎥


Faking still images is no longer a big deal for AI. But the real truth lies in the emergence and fading of a smile, and I think it will take some extra time for AI to manage that. Authenticity for just one moment is not enough; it has to be honest and trustworthy to build trust. I just signed up for a seminar on micro expressions by Paul Ekman, so I can deliver more trusted videos for my clients, as Apple, Pixar, or Disney have done.

Aamir Kadri

Startup Growth | Generative AI | Process | CS/X | Strategy 🏴☠️


The study was very interesting, and its images seem to fool a lot of people; the fact is that AI models keep getting better. Newer genAI models will no doubt be well past the uncanny valley.

