Earley AI Podcast - Episode 6: Synthetic Media with Henrik de Gyor

Deepfakes, Haptics, and the Coming Crisis of Authenticity - What Every Organization Needs to Know About AI-Generated Media

Guest: Henrik de Gyor, Consultant, Podcaster, and Author of "Synthetic Media: The Next Reality"

Hosts: Seth Earley, CEO at Earley Information Science; Chris Featherstone, Sr. Director of AI/Data Product/Program Management at Salesforce

Published on: August 29, 2022

 

 

 

 

In this episode, Seth Earley and Chris Featherstone speak with Henrik de Gyor, a digital asset management expert, podcaster, and author whose career began as a news photographer managing hundreds of thousands of images that needed to be tagged and catalogued. That hands-on reckoning with metadata, rights management, and findability led Henrik to become a sought-after consultant and speaker - and eventually to write a book on synthetic media just as the technology was crossing the threshold from niche curiosity to mainstream disruption. He explains what synthetic media actually is, maps the spectrum from consensual avatar licensing to non-consensual deepfakes, introduces the Coalition for Content Provenance and Authenticity's emerging standards for tracking how media was created and manipulated, and looks ahead to haptic, taste, and smell dimensions that will make the question of what is "real" more complex than most people are ready for.

 

Key Takeaways:

  • Synthetic media is any media artificially generated by a machine. Today it ranges from AI-assisted tools that turn a typed script into a photorealistic avatar video in minutes, to text-to-image generators that produce hundreds of stock photo variations on demand. These capabilities have already passed the Turing test in practice: the glitches and artifacts that once gave them away have essentially vanished.
  • Synthetic media does not threaten to eliminate actors, models, or executives - it threatens to radically change their revenue model, enabling celebrities and public figures to license their likeness, voice, and body for unlimited personalized content without spending time in a studio, though the rights management infrastructure to support that model is not yet in place.
  • The most prevalent nefarious use of synthetic media today is non-consensual intimate imagery, but the deeper societal danger is politically motivated deepfakes - synthetic video of world leaders or officials can be believed and shared before correction reaches the same audience, and existing legal frameworks do not extend across borders in a way that addresses this.
  • Provenance is the only viable long-term solution to the authenticity problem - the Coalition for Content Provenance and Authenticity (C2PA, at c2pa.org) is developing standards to embed the full creation and editing history of a media file from its origin device forward, with consumer-facing provenance icons expected to appear on media within three to five years.
  • The top experts in synthetic media today have only been working in the field for one to two years - the technology is so new that any organization that begins exploring it now is genuinely on the ground floor, and the best way to understand it is to use it, not just read about it.
  • Synthetic media is already transforming commercial production - approximately 70 percent of catalog images in retail are now virtual rather than photo shoot-based, and organizations can synthesize product demos, executive presentations, and multilingual corporate communications at a fraction of the cost and time of traditional production.
  • The next frontier of synthetic media extends beyond sight and sound into haptics, taste, and smell - researchers have already produced working examples of electronically synthesized taste and smell sensations, and the convergence of these additional senses with AR and VR will create experiences of things that no longer exist or have never existed yet.

 

Insightful Quotes:

"Synthetic media is basically media that's artificially generated by a machine. Currently the end result is mostly defined by a human, but that line is blurring more and more toward the machine - where eventually it'll be completely created by machine because there's a need for it, or an ask for it." - Henrik de Gyor

"You can't declare intent and actually believe it, because people lie. Since people lie, you can't say this is only truthful information - there's no way to declare that officially in an authoritative way, regardless of what the source is, because sources can be hacked. So you figure out the provenance: how it was done, by whom, on what device, and what happened to it after." - Henrik de Gyor

"The technology is neutral - think about it. It's the intent of how you decide to use it that matters. We can't declare intent and we can't figure out whether it's trustworthy or not, but do we want to be entertained with it, do we want to experience it? That's a choice. And the more positive content you create with it, the more you drown out the negative." - Henrik de Gyor

Tune in to hear Henrik de Gyor explain why rights management professionals are simply not ready for the synthetic media licensing models that are coming, how the deep Tom Cruise TikTok account was built by scraping every angle of his face from every film he ever appeared in, why NFTs are not actually decentralized in the way most people assume, what it means that a Michelin chef commissioned a synthetic taste experience that lets people experience his food through a probe and a phone app, and why our current imagination of what synthetic media can do is the equivalent of thinking the best use of the early Internet was putting the phone book online.


Links

Book: Synthetic Media: The Next Reality
https://www.amazon.com/gp/product/B09MJW7BX1/

Podcast: Synthetic Media
https://open.spotify.com/show/5N7Qnx1qI1QOo6Q6T5jziJ

Synthetic Futures


Podcast Transcript: Synthetic Media - Deepfakes, Provenance, and the Limits of What We Can Imagine

Transcript introduction

This transcript captures a conversation between Seth Earley, Chris Featherstone, and Henrik de Gyor about synthetic media - what it is, how it is being used for good and ill today, how provenance standards are being developed to address the authenticity crisis, and where the technology is heading as it expands beyond sight and sound into the remaining human senses.

Transcript

Seth Earley: Good morning, good afternoon, good evening, depending upon your time zone - welcome to today's podcast. My name is Seth Earley.

Chris Featherstone: And as always I'm Chris Featherstone. Good to be with you.

Seth Earley: Before I continue, I do want to thank our sponsors - CMSWire and Simpler Media, Earley Information Science, and the Marketing AI Institute. Our guest today is a man of many interests. He started his career as a news photographer and found himself with hundreds of thousands of photos that needed to be tagged and catalogued. That brought him to the need to become an expert in digital asset management and areas related to that. His ability to crack the code on those topics and make them comprehensible has made him a sought-after speaker and consultant. He's also an author and a podcaster. His latest book is "Synthetic Media: The Next Reality," slated for release next month. Please welcome Henrik de Gyor.

Henrik de Gyor: Hi everyone, thanks for having me.

Seth Earley: I was close! Henrik, great to see you. We've known each other for many, many years going back to the Henry Stewart events. So maybe you could tell us a little bit about how you got to where you are.

Henrik de Gyor: Sure. Basically I work on niche topics - digital asset management is one of them, otherwise known as DAM. I try to look into other things that are related to those, whether it's metadata, whether it's rights management. One of the most recent topics I've looked at is how media is created, and how it will be created going forward - and it won't be just by us humans. It'll be by machines, by AIs, through deep learning and a variety of other things. It will also be different kinds of media that we're not used to - not just things we hear and see. That brings a whole set of different questions and possibilities, good and bad, some of which we're already seeing. Like many technologies, it gets used for nefarious purposes first, and then eventually people see the good in what you can do with it, and that's how it thrives.

Someone asked me to write a book about blockchain a few years ago, so I did, and I looked into that very deeply - those were digital assets as well. Then, interestingly enough, there are alignments there because people are using blockchain with synthetic media. There's a large mixture of these layers and facets. It's really fascinating and there are a lot of dots to connect. I wanted to make sure I connect those dots and simplify things that are not understood, because for the most part, synthetic media is not understood. Deepfakes are a part of it and it's certainly leading to this whole idea of synthetic media.

Seth Earley: Maybe you can give us a definition of what it is and the different flavors, and then some of the implications. I've seen some demos - I think it was a Key and Peele one at an Adobe conference where they changed the transcription and were able to synthesize both the voice and vocal intonation, and it sounded exactly like him. Mind boggling.

Henrik de Gyor: Yeah, and that can be done within a few minutes nowadays with the right training models and things like that. What used to take weeks or months or hours is now down to minutes, depending on what you're asking for. The more custom it is, the more time it takes to build.

Chris Featherstone: Henrik, I'd love to get a definition from you, because we've got a diverse audience. We've got folks who need to focus on AI-related topics germane to media and content today, and then we've got all of this emerging stuff hitting us that we have to figure out how to blend. You know, this text-to-speech which is now using neural nets - I can take Seth with a training model in a few minutes to a few hours and now use his voice in any context I need to. I'd love to get your take on where the asset management piece is and then definitely want to jump into all the other pieces, because you're right - the nefarious topics are super interesting, scary as hell, and yet we need to understand them in order to pull out the good.

Henrik de Gyor: Absolutely. One of my interests was: all these files are going to be created, this media is going to be made - how are we going to store and find it again? Which goes back to what you guys do at Earley. How are we going to categorize it all? But more importantly, how was it created in the first place?

Right now, most of it is generated with the assistance of AI, whether it's GANs, autoencoder-based tools, or a variety of tools made by well-known companies like NVIDIA, Samsung, Adobe, and many others - and a lot of startups popping up throughout the world that realize there's a need to create this stuff and simplify and democratize it. There are super specialists creating deepfakes, and there are tools where you can literally type in a script and an avatar will say that script perfectly in whatever language you want - up to 50 or 60 languages - translated, in a few minutes, with no video camera, no video shoots, no sitting in a studio, none of that.

So what synthetic media is - let me define it. Synthetic media is basically media that's artificially generated by a machine. Currently the end result is mostly defined by a human, but that line is blurring more and more toward the machine, where eventually it'll be completely created by machine because there's a need for it.

There's image - text to image, where you literally type in a few words and it creates the image you describe in 100 different varieties, displacing stock photography. Same thing with video. You can type in a description of what you want to see, edit it, refine it, and it'll continue to generate. Infinite possibilities.

Seth Earley: Is this the future of actors? Are their careers going to be in jeopardy?

Henrik de Gyor: No - it's actually going to scale, if you think about it. If you want to pay for a personalized birthday message from your favorite celebrity, that's very easy to do and they don't have to spend a minute of time in a studio.

Chris Featherstone: So it's basically like me franchising my voice and my face. If I'm Samuel L. Jackson, instead of licensing my voice for a voiceover, I license my voice and my person - my IP - and instead of showing up to do anything, I just stay home and collect royalties because I've licensed my stuff out. Whether it's a song or a speech or a presentation.

Henrik de Gyor: Yes - whether it's a song or a speech. And people will be creating things with their own avatars. You'll probably have multiple - one that looks like the realistic you at this point in time, and then an idealized one to use in a game or another environment. The licensing model will have to evolve to accommodate any reality - augmented reality, virtual reality, the metaverse. Blockchain will likely be part of that, along with rights management, authentication, and identity management.

And I guarantee you that the people I spoke to in the rights management field are not ready for this - that's the short story. The short story is they'll have to evolve their model to license in any reality.

Seth Earley: Rights management, authentication, identity management, blockchain - the metadata structures for retrieval and versioning and applications. And of course there's so much hype around the metaverse.

Henrik de Gyor: And we'll get more acclimatized to that in the coming years as headwear becomes more popular, because well-known manufacturers that make our phones - and social media networks that are changing their names - are creating different mediums and hardware for us to consume this stuff. Whether it's augmenting what we see today, so that when you walk into a networking meeting and look at someone's face, it does facial recognition and says: oh, you remember last time you saw Chris - this is who he works for, this is what you talked about last time you spoke.

Seth Earley: That is so interesting. We can't even really imagine where the next five years is going to take us.

Chris Featherstone: And digital natives on YouTube have changed a lot of the semantics around this. Now the scope of search results and interesting information can be completely generated by somebody and you don't have to do it through a traditional video medium. The creativity is going to skyrocket. Sky's the limit.

Henrik de Gyor: It's melting those creative barriers we had back in the physical world. Now you can create anything out of anything, in any medium. A lot of that hasn't been commercialized and democratized yet - you don't need to know an infinite amount of code. It's a matter of being able to use those tools very simply, or having services where you pay someone to create that avatar or the thing that would be useful to you or your company. Those services are coming in the very near future.

Our imagination is actually what's limiting us. Just like Web 1.0 - unfortunately, the military and nefarious things came first, and then eventually people saw the possibilities. The most exciting thing we could think of back then was to put the phone book online.

Chris Featherstone: Let's go into the nefarious side of this for a minute, because that's going to push the envelope and help us understand our limitations and where the lines need to be drawn.

Henrik de Gyor: Sure. When people don't understand it, they fear it - especially if they're watching too many movies or reading too many fiction books. And sometimes when you understand it even more, you fear it even more.

To the point around trust - there's actually no way to measure trust, let's be honest. However, the market - specifically the Content Authenticity Initiative and the Coalition for Content Provenance and Authenticity, called C2PA, which you can find at c2pa.org - figured out that you can track the provenance of something. What does that mean? It means the history of how it was created, edited, and displayed to you. From inception, from whatever device created it or whatever software or machine made it, you can figure out how it was edited, how much of it was manipulated, et cetera.

There are betas right now and a white paper that was recently unveiled. It's an evolving model - not a standard yet. That standard will take several years to be implemented across all our technology, not only software but hardware as well. The devices we have now will need to be upgraded. Once you get that newer device, it will have those other factors built in. So eventually, when you see media in the coming years - three to five years out - you'll see a little icon. You click on that icon and you'll be able to see the provenance of that file: who created it, how it was created, where it was created, what happened to it, at different levels of detail. If you're a forensic investigator trying to figure out if this is real or fake, if multiple images were combined, if something was removed or enhanced - you can figure out what was done to whatever degree you care to dig in. That's the only way, because you can't declare intent and actually believe it. People lie.

So since people lie, you can't just say: oh, this is only truthful information. There's no way to declare that in an authoritative way, regardless of what the source is, because sources can be hacked. That's a cybersecurity challenge we already know exists and is still very prevalent. So how do you do that? You figure out the provenance.

For a while - starting this year, literally - you will not see the difference between what's real and what's not. And that's a little bit scary.

Seth Earley: So technology is outpacing our society's ability to really understand whether these things have been tampered with or synthesized. A few years is a really critical time.

Henrik de Gyor: Yes. And the best thing to do right now is to understand it by using it. If you use it - whether you start as a consumer and eventually figure out how it can be helpful to you or entertaining - use the democratized tools, learn it, try it out, hire contractors or employees who do this on a regular basis. Most of the top experts in this field have literally done it for one to two years. That's how new it is. The top experts in the world have done this for one to two years tops. So you're on the ground floor right now.

And the technology has already passed the Turing test - meaning you can't tell whether it was made by a human or a machine. Most of it has passed that test, and it will continue to pass it. The glitches in the video or images will basically vanish. They essentially have already. Same thing with audio.

Chris Featherstone: Let me ask about nefarious applications specifically. If I have ten minutes of Henrik's face and voice, I can create something that might open up his biometric security. And in a court of law, video and audio can no longer be accepted as authentic because I can't tell the authenticity anymore. Where's the line today?

Henrik de Gyor: It's not public domain just because it's on the Internet, let's be honest. You still own the rights if you're the creator of that content, unless you gave them away in some type of contract. It's not a matter of rights per se, but it is a matter of consent. You can do this to anyone technically, however, if you have their consent, they're less likely to pursue you legally - and you can do this from any country in the world, which doesn't mean the law will extend to that country.

Now if they're a politician and you're trying to do satire or parody, that's different - in certain countries that may be protected, in others you might be prosecuted. And this goes back to provenance - how it was created, who's in it, how authentic it is.

The nefarious content for the most part is non-consensual intimate imagery. It's very prevalent. Face swapping celebrities onto other bodies doing activities we don't need to go into. Beyond that, the most popular nefarious use is artists creating videos of celebrities for either political reasons or humor. I interviewed the person who creates the deep Tom Cruise account on TikTok - those are well-known videos. People who know Tom Cruise mistake them for him. What they did was scrape his film footage and images of him at every angle, with his mouth open, closed, doing whatever - so they can make his face look like he's saying whatever they want him to say.

Seth Earley: Where are the commercial applications for Fortune 1000 companies? What should they be thinking about today?

Henrik de Gyor: Right now, literally today, you can use it for e-learning, education, training people within or outside the organization, corporate communications, product demos - whether it's a product or a service. And all those can be done synthetically even before the product exists in physical form.

Realistically speaking, most of the catalog images we see - whether in physical catalogs or virtual catalogs on the web - about 70 percent of them are already virtual. They're not photo shoots anymore. It's easier, faster, and the end results are cleaner and cheaper, honestly, than a photo shoot or video shoot, because you can synthesize anything, whether it exists or not, and assemble it. A room in an apartment, a piece of furniture, any product you want to sell on an e-commerce site - you can have every single angle you want in a 360-degree virtual view of the item, and see it in use.

You can synthesize your executives so they can give talks across the world. You can have them give speeches, with synthesized Q&A already done - not by the executive actually speaking, but by synthesizing their voice. In a couple of hours you can synthesize someone's voice, not by making them read a script, but by training a model on how they say things. Then with a little more effort you can synthesize their torso or entire body. If you're a model you can synthesize your body and have it on a virtual runway, wearing any clothing available in your size, completely virtually.

Seth Earley: Wouldn't Fortune 1000 companies be more concerned about the threats to their brand and security? It's a double-edged sword.

Henrik de Gyor: Yes, absolutely - and it goes back to authenticity, consent, and understanding the provenance of files. Once you understand that, which most people don't because they haven't gone down that rabbit hole, you'll start to understand both the opportunity and the threat. Start with a pilot project at the very minimum. Potentially hire contractors or employees to explore it. There are many tools out there, a lot of them from startups getting millions of dollars in backing because they make it really simple - pick an avatar in minutes, type in the script of what you want them to say, they say it perfectly, and it outputs a video. You can pick from stock avatars, and eventually you'll be licensing other personalities and likenesses.

These democratized tools also have content moderation built in. You're not allowed to use bullying language, explicit content, and so on. You have to be an adult to use them because of the potential graphic nature, the potential immersiveness of VR and AR that can be confusing or even traumatic to individuals who aren't ready for it. What doesn't exist yet is something like film and music maturity ratings for synthetic content - that's coming, but we're just getting into the provenance standards right now.

Seth Earley: We have a few minutes left - what about haptics and the future?

Henrik de Gyor: We're going to add more dimensions beyond sight and sound. Haptics is already happening - you can feel an object through haptic gloves when it passes you by in a VR environment. This is popular already in senior centers, where they bring VR headsets to residents so they can go visit Rome and pick the era they want to visit - Caesar's age - and walk around with Romans in togas and see the markets next to the Forum. Or they can go swim with whales, and as the whale passes by, the haptic gloves give you the sensation of it.

But beyond touch, I spoke to several experts who have been working on the other senses for the last 20 years and have actually made - both chemically and electronically - synthetic tastes and synthetic smells. Working examples. One expert was tapped by a Michelin chef who only had so many reservations per year and wanted people to experience his food. They did it with an attachment that plugs into the bottom of your phone, and with an app you could trigger it and have the taste sensation of a specific dish he made. For smell - there's a small probe you put near your olfactory glands that sends electronic signals and actually gives you sensations of citrus, sourness, sweetness, and so on. So you can imagine the mixing of taste and smell, and then granted, eating is both of those along with chewing and sensing texture. You can get at least two of those three in the coming future - it's still under development, but it has been commercialized. It's out there and it will be the future of more dimensions, more realities, that we can mix - even things that don't exist anymore or don't exist yet.

We're thinking of things we could do with the last version of Web 2.0 right now - we're just iterating that in potentially more dimensions. We haven't figured out how to do the Princess Leia projection without a stupid amount of equipment in our room. We haven't figured out how to do 3D anything beyond 2D projection. But that will evolve quite rapidly, and what's going to be more exciting is we're going to add not just AR, MR, and VR, but also touch, taste, and smell. There's a lot to digest and a lot ahead.

Seth Earley: It's amazing to think about where we'll be in five, ten, twenty years. Science fiction is a precursor to science fact.

Henrik de Gyor: Exactly. If it's not physically impossible, it's possible - and what we define as physically possible today may not be the same in the future.

Seth Earley: A lot ahead of us. Thank you so much, Henrik. Please do grab the book - it'll be released on February 2nd, available on Amazon, the link is in the show notes. Check out Henrik's podcast as well. Again, thank you to our sponsors - CMSWire and Simpler Media, the Marketing AI Institute, and Earley Information Science. And thank you, Sharon, for your production behind the scenes as always. This has been a pleasure.

Henrik de Gyor: Thank you for having me.

Chris Featherstone: Henrik, it's a pleasure, my friend. Really, really interesting topic. Thanks for being on.

Seth Earley: We'll definitely get you again in the future for an update on where things are going.

Meet the Author
Earley Information Science Team

We're passionate about managing data, content, and organizational knowledge. For 25 years, we've supported business outcomes by making information findable, usable, and valuable.