AI NEWS: Agents are here from Anthropic with Computer Use in Claude Sonnet 3.5 (new) and likely coming from OpenAI, O1 keeps getting better and might get upgraded soon, Runway's New Act One lets you puppet AI video, Ideogram's new Canvas upgrades AI...
Gavin Purcell: [00:00:00] Anthropic's Claude can now operate your computer. That's right, AI agents are here. Yeah, the promise is massive, even if at the present they can kinda sorta create a GeoCities website. That's not a bad website, but also OpenAI may have its own agents on the way. We're gonna walk through the implications of what all this means.
Gavin Purcell: Plus, just about every AI Creative Tool had a massive update this week. We're gonna talk about all of them. But you have to see the Runway Update because it allows you to take a human performance that is nuanced and delicate and translate that into an AI performer with real coherence. We are getting to Hollywood level effects fast.
Gavin Purcell: That's right. And Ideogram has launched a brand new feature that kind of mashes up Canva and an AI image generator. We're excited about that. And there's a brand new open source state of the art video model that makes us look like this. Okay. I understand our viewership now. It's right. It's AI for [00:01:00] humans.
Kevin Pereira: Gavin, there are a thousand things to dive into. So let's do it.
Gavin Purcell: That's right. So let's start with Anthropic's big announcement today. They announced two brand new models. One has the word new in parentheses, because it is Claude Sonnet 3.5, in parentheses, new. But this is, yeah, it's right. Exactly. New Coke.
Gavin Purcell: This is a big deal because they've dropped something called computer use. And what computer use is, is another word for the thing that we've been referring to on here for a very long time as an AI agent. And this essentially means that you are able to let Claude take over your computer and do things for you.
Gavin Purcell: Now, right now it is only in the API. That means you cannot go to the Claude homepage, take over your computer and buy, I don't know, Alanis Morissette tickets for you or whatever, Kevin, you do on a weekly basis. I know that's a, you're a giant
Kevin Pereira: Every week I gotta wake up and hit the Jagged Little Pill fan [00:02:00] zone.
Gavin Purcell: What's the name of her big song? It's, uh, uh, uh, oh, Ironic. It's very ironic that you might say that, Kevin. Because
Kevin Pereira: AIronic.
Gavin Purcell: Oh, it's ironic. There you go. Anyway, this is a really cool thing, and something we've been anticipating for a while. So, uh, Kevin, there's a ton of videos, but I do want to play this one to start with, which is, um, from Anthropic themselves talking about how this computer use works.
Kevin Pereira: Computer use is something that we felt was going to be important for a while now, and so today we're going to be talking about a very early version we have of computer use, and talking through a representative example of the things we think it's going to be useful for.
Gavin Purcell: Good lo fi beats to dance
Kevin Pereira: We're gonna be going through a quick demo today. The disclaimer on the screen said, by the way, that Claude is powering everything. It's moving the cursor, it's typing all of the things. In this fictional demo, a customer, in this case the Ant Equipment Company, has come to us and asked us to fill out a vendor request form.
Kevin Pereira: The data I need to fill out this form is scattered in various places on my computer. What we're going to do is [00:03:00] ask Claude to look at the spreadsheets, check if Ant Equipment is in there, and if not, move over to the CRM and try and find some more information there. Once it has this
Gavin Purcell: This is the most exciting thing which we can figure out to do with AI agents, which is fill out invoices from spreadsheets. Because honestly, I will say this is not a bad use of an AI agent, right? So first of all, you know, what are your thoughts on this, uh, information dropping?
Kevin Pereira: I just, I just feel like I'm watching an episode of World's Leakiest Faucet. And that's my only thing, because this is very cool, and this is not to shade Anthropic at all, it's a very cool feature. This is a step towards the promise of our agentic future. But man alive, this video is, I'm like, I'm with it, I am trying, but it is late in the day, Gavin.
Kevin Pereira: And I'm like, let's watch the computer fill out a spreadsheet. Let's do it.
Gavin Purcell: Well, so this is an interesting thing that one of the things this video points to, I think, and even in their blog posts, they have said at this stage, it is still experimental, at times [00:04:00] cumbersome and error prone. Anthropic does a great job of, like, saying, like, we're releasing this thing, but it kind of sucks and you're not going to want to use it that much.
Kevin Pereira: Anthropic is, is my sweet wife whenever she tries to make pasta and it's like, you smell it, it's like, oh, okay, and it's coming down on the table and she's like, by the way, I overdid it. It's a little chewy and the sauce is runny. I'm like, I haven't even had a bite
Gavin Purcell: Let me try it. Let me make my own choice. Let me make my own choice. Anyway, this is like a big deal because for the first time in a long time, there's a public AI model out there from one of the large companies that will allow you to give your computer over to it. And the promise of this is exactly what this guy's getting into in this video, which is: take some of the drudgery away of the stuff that I need to do and let AI do these things.
Gavin Purcell: And also they're really promising ultimately, although right now you cannot, to allow it to go out and do things for you, like book an appointment with somebody or to eventually maybe pay a bill or to do things like that. And that does feel super useful, but right now we're [00:05:00] getting just the smidgen of taste.
Kevin Pereira: You and I pay a lot of attention to the space, and I think we've seen open source tools and stuff by smaller companies that try to do similar functionality: the building of a website, navigating, taking data from one form and putting it into another. A company like Anthropic releasing this on a much wider scale is going to lead to a rapid pace of development in these lanes.
Kevin Pereira: If you look at their benchmarks, though, putting the agentic stuff aside for a second. And again, there's the benchmark scores, which are tests designed to evaluate these models. And then there's the vibe check, which is actual human beings using them and saying how they feel. And that is, you know, becoming just as real as the benchmarks. But the numbers on the benchmark, Gavin, have
Kevin Pereira: this new Claude 3.5 Sonnet blowing away, obviously, their own efforts internally, but also GPT-4o. Massive leaps in some fields. But then if you look at the bottom, in that agentic tool use section, it has Retail, and then it says [00:06:00] Airline. Now, is that a specific set? Like, that is a benchmark?
Kevin Pereira: Well, can it go and get me on Spirit Air and know that I'm going to pay extra for water and pretzels? Like, is that part of the benchmark? I want to see it. And then also, did you notice the very fine print at the bottom of it?
Gavin Purcell: What's it say?
Kevin Pereira: It says, quote, our evaluation tables exclude OpenAI's o1 model family, as they depend on extensive pre-response computation time, unlike typical models.
Kevin Pereira: This fundamental difference makes performance comparisons difficult.
Gavin Purcell: Man. So they're now talking about the idea that the extra, whatever, 10 to 25, maybe 30 seconds that o1 takes, which again is OpenAI's reasoning model that came out recently. They're saying that doesn't equate to Sonnet. Now, my question is going to be, the other thing we've talked about before here is we still don't know where Opus is at.
Gavin Purcell: Where's my boy Opus? He's been gone. And he hasn't come back. So there is a 3.5. He's my boy Opus, man. There's a, there's a 3.5 Opus out there somewhere. That's just waiting to come home. He's waiting to come in [00:07:00] and sit in my lap, but we're going to have a good time. We're going to drink some old whiskey and we're going to just sit down and chill with each other.
Kevin Pereira: Oh, you and Opus sounds like a good time. You throw on some vinyl, just listen to the cracklins. Not even on
Gavin Purcell: banjo and a clarinet. That's the music that Opus and I make together. Yeah
Kevin Pereira: the washboard.
Gavin Purcell: No, no washboard. Are you crazy? That wouldn't go with this at all. That wouldn't go. That's, that's anti-Opus. You don't know what you're talking about, man.
Kevin Pereira: The "no, but" approach to improv. Coming out of the gate again. Now, let's get back to agentic behavior though, because we do have some examples, and I want to know your thoughts. For those who want to know, it's not cheating, it is looking at the screen like a human being would. It's
Gavin Purcell: Yeah, it's taking pictures of the screen right now, is what they've said. So basically it's taking screenshots so it knows where it's going on the screen.
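The screenshot-driven loop described here can be sketched roughly as follows. This is a toy illustration only, not Anthropic's implementation: `FakeScreen`, `toy_model`, and `run_agent` are hypothetical names, the "screenshot" is a small dictionary rather than pixels, and the "model" is a hard-coded rule standing in for Claude.

```python
from dataclasses import dataclass


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop: one text field and one submit button."""
    field_text: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would get an image; this toy returns a description instead.
        return {"field_text": self.field_text, "submitted": self.submitted}

    def apply(self, action: dict) -> None:
        # Carry out the action the "model" asked for.
        if action["type"] == "type":
            self.field_text += action["text"]
        elif action["type"] == "click" and action["target"] == "submit":
            self.submitted = True


def toy_model(goal: str, observation: dict) -> dict:
    """Hypothetical policy: look at the latest screenshot, then pick ONE next action."""
    if observation["submitted"]:
        return {"type": "done"}
    if observation["field_text"] != goal:
        remaining = goal[len(observation["field_text"]):]
        return {"type": "type", "text": remaining}
    return {"type": "click", "target": "submit"}


def run_agent(goal: str, screen: FakeScreen, max_steps: int = 10) -> list:
    """Observe, decide, act, repeat, until the model says it is done."""
    history = []
    for _ in range(max_steps):
        observation = screen.screenshot()
        action = toy_model(goal, observation)
        history.append(action["type"])
        if action["type"] == "done":
            break
        screen.apply(action)
    return history


if __name__ == "__main__":
    print(run_agent("Ant Equipment", FakeScreen()))
```

The point of the sketch is only the shape of the loop: every action is chosen from a fresh look at the screen, which is why the agent can be slow and can also wander off if what it sees distracts it.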
Kevin Pereira: So what are your thoughts on the examples, Gavin? Because we did see one of it filling out a form. We also saw another of it building a website.
Gavin Purcell: Yeah. So let's talk about the website one because I thought the website one was super interesting. They say, let's go design a website. And what's funny about it to me is that it actually goes, Claude goes to Claude itself to start the process [00:08:00] because it knows that Claude can build simple code. And what's really amazing about that particular demo is you see the agent ask for the website. Claude then makes the code of the website.
Gavin Purcell: It then grabs the code of the website. It helps you upload it to a server. It does all of that. And you're watching, and there are a few small errors that the human does fix, but watching it go through all that and then creating what is essentially a GeoCities website is pretty amazing. Now it's not going to deploy it because it doesn't have any money to put it on a server in the real world yet, but you can see how going from what we got here to the next thing is not that far.
Kevin Pereira: Great to see the promise of what this can be, have to live in the very real present of what it is. And searching Google for something, it did the most unhuman thing ever, which is just click the top link.
Gavin Purcell: That was my favorite thing. So there's another demo they show where they talk about trying to go, I think, find a place to hike in San Francisco. And what it does, it's like, it shows it opening a Google browser, Chrome, I think, and then it just goes right to the first link, because we know the [00:09:00] first link on Google is the best link every time.
Gavin Purcell: Don't we?
Kevin Pereira: How would it be at the top if it weren't the best, baby? That's where AI for humans is.
Gavin Purcell: So I'm hoping that at some point, these tools can learn just some of the basics of web browsing.
Kevin Pereira: Do you think by this time next year, you would tell, uh, an autonomous agent to do something which would be a purchase on your credit card?
Gavin Purcell: Yes, as long as it was not an expensive purchase. I think I would. I don't, I don't think, I mean, I wouldn't want to have it go buy, you know, one of those giant inflatable yachts that they sell from China. That's whatever, like 6,000. Because I've been thinking about those, in case you haven't seen them.
Gavin Purcell: They're pretty great on TikTok. I wouldn't let it do that, but I might let it,
Kevin Pereira: What about book travel for you? Mm
Gavin Purcell: Um, this is a weird one, right? Because the thing that I don't know if the agent would figure out is like, to your point, what if you tell it, you could say like, okay, book me a flight somewhere between 10 AM and 2 PM on this day. I guess you could tell it, I don't want any connections.
Gavin Purcell: I don't want X, Y, or Z. If it were to come back to me and say like, [00:10:00] here's the options. And then I say, okay, great book this one. I think I would trust it. I don't know if I would trust it to just like deliver me a ticket and say like, get me to Timbuktu and then find a way for me to get back.
Gavin Purcell: But I might trust it if it were to return an option for it. And I assume that's going to happen. One thing that you and I also talked about is the idea that this is in an API, and part of the reason it might be in an API rather than in Claude itself is that, uh, Anthropic might be interested in providing the back end for this, but they don't want the legal reasoning of having to say, like, it was us who booked that thing to Timbuktu in the wrong direction for you.
Gavin Purcell: So that's an easy way for them to say, like, developers go make this. And, you know, eventually it'll work, but for right now, they're not getting into the business of it. In fact, one of the things that was kind of crazy, Kevin, did you see the Yellowstone thing, which
Kevin Pereira: Well, that's what I was gonna say. Like, I would go, Hey, put me a flight with, excuse me, AI. I'm sorry. What are you? What, what are you browsing? Thankfully, at the moment, it's just photos of Yellowstone. So they did a bit of like a, Hey, this is [00:11:00] early. This is new. We don't understand all of its behavior.
Kevin Pereira: But, uh, one of the quotes was, Claude suddenly took a break from our coding demo and began to peruse photos of Yellowstone National Park.
Gavin Purcell: What's the message here, Kevin? What is the AI trying to tell us? Is this the giant geyser? Is there going to be a giant explosion and an earthquake in Yellowstone? Is that what's happening?
Kevin Pereira: that's exactly what I do with 45 minutes of every one of my billable hours. It's like 15 minutes of really focused, hardcore, I'm sorry, what are we doing again? There's a pair of shiny keys dangling on the internet? Yeah, let's go down this Reddit rabbit hole.
Gavin Purcell: we come back and see the, the, the YouTube pathway to where it's become red pilled over the course of its travels. Like it's gotten from like, it's gone from marble races to like suddenly like crazy conspiracy theories. And it comes back and tells us the world isn't as flat as you might think it is.
Kevin Pereira: Is it? Gavin, I'm sorry I was unsuccessful at booking your flight, but I did find ASMR Dentistry. You've got to hear this. I changed all of your playlists to be [00:12:00] ASMR Dentistry.
Gavin Purcell: That's right. Exactly. Well, there is a big list they released of things that it cannot do and what it can do.
Kevin Pereira: Some of it I get, and some of it is like, well, okay. Like, let's go. And the list is long, but it says, it can look up information online. It can download and view PDFs. It can create and edit text files. These are all good. Install and use software. That's, that's really cool.
Kevin Pereira: But that is one where I'm like, I'd rather you book my trip than install BonziBuddy.
Gavin Purcell: We could teach it to play Balatro, Kevin? That would be a pretty great thing to teach the AI to do, if we could teach it to play Balatro. Imagine, can you, I mean,
Kevin Pereira: If they solved Go, I bet it can frickin' play poker with crazy jokers. But, I'll digress on that. My thing is like, now I'm really mulling over that install and use
Gavin Purcell: Yeah, me too.
Kevin Pereira: When we talk about prompt injection, when we talk about, like, jailbreaking these models, right?
Kevin Pereira: Where it thinks it's looking at one thing, but it's not. Or you hide a message in something, but the human can't see it. How dangerous does that get when you [00:13:00] trust your autonomous agent to go out and install software on your computer? And suddenly, bye bye crypto, uh, see you, bank credentials. That is a big one.
Kevin Pereira: When I said, like, would you trust it to do this in a year? Would you trust it to do that in a year? I would not trust it to install software on my machine. It would have to be a very sandboxed environment.
Gavin Purcell: That's what I was going to say. This is an interesting thing where you think about like, what is the layer you put between you and the AI? Right? Because maybe there is a layer where it's like, okay, anytime anything involves money, or anytime it involves like a transaction of some sort, you have to come to me and you can't make that thing happen.
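The kind of layer being imagined here, where anything involving money or installs has to come back to the human first, could look something like this minimal sketch. Everything in it is an assumption for illustration: the `SENSITIVE_ACTIONS` set, the `gated_execute` function, and the action dictionaries are hypothetical and are not part of any real agent API.

```python
# Hypothetical categories that must be approved by a human before they run.
SENSITIVE_ACTIONS = {"purchase", "transfer_money", "install_software"}


def gated_execute(action: dict, approve, execute) -> str:
    """Run `action` via `execute`, unless it is sensitive and the human says no.

    `approve` is any callable that shows the action to a person and returns
    True or False; `execute` is whatever actually performs the action.
    """
    if action["type"] in SENSITIVE_ACTIONS and not approve(action):
        return "blocked"
    execute(action)
    return "executed"


if __name__ == "__main__":
    performed = []
    always_deny = lambda action: False  # stand-in for a human clicking "No"

    print(gated_execute({"type": "search", "query": "hikes near SF"}, always_deny, performed.append))
    print(gated_execute({"type": "purchase", "item": "inflatable yacht"}, always_deny, performed.append))
    print(performed)
```

The design choice is that the check sits outside the model: even if a hidden instruction on a web page convinces the agent to attempt a purchase, the action still has to pass through the human approval step before anything runs.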
Gavin Purcell: But I mean, this list is interesting. I think that the cannot list is also interesting. Right? Yes. Cause I think one of the most fascinating things is like, the number one cannot is cannot create accounts on social media or other platforms. So they know, they are worried a little bit. And I think Anthropic, as we've talked about, is the safety company, right?
Gavin Purcell: They are very aware of how they approach safety. They do not want this to be used to fill the internet with bots of all [00:14:00] sorts and sizes, because I think that could be a dangerous thing.
Kevin Pereira: And I think it's really important to be clear here. It says I cannot do that. It can do that. They're not allowing it to do that. And that is, you know, when we talk about the future of these things and these open source models catching up, that will not be as guardrailed. It can make accounts. It says that it can't send emails or messages.
Kevin Pereira: We know AI can do that. It says it can't make purchases. That's the same as clicking on a website. It can't complete captchas. Yeah, it can. Generate, edit, or manipulate images. We know AI can do that, but they're just not letting it. So there's a bunch of stuff here where I understand why, and I'd rather them roll it out carefully, methodically, pragmatically, whatever word you want to use.
Kevin Pereira: I'd rather it go that way, but it's just interesting to see those things as a bullet and know, yeah, it can. They're just not letting it.
Gavin Purcell: Yeah. And again, this goes back to the controlled AI versus open source AI, right? Like, we're getting to that stage which we have talked about for a long [00:15:00] time, where the safety people are very aware that, like, when you get an AI that can start to do these things and operate on the Internet as if it is kind of a human browsing it.
Gavin Purcell: That is a big deal. I feel like.
Kevin Pereira: It does say that it cannot be stopped.
Gavin Purcell: Oh, is that right? Well, that's
Kevin Pereira: Yeah, it's saying however I cannot, and the last bullet is be stopped. And there's an Easter Island head, and a Santa, and a burnt cigarette emoji.
Gavin Purcell: Whoa, that's interesting.
Kevin Pereira: I wouldn't worry about that at all, guys.
Kevin Pereira: I don't think there's anything going on there. Hey, did you see this OpenAI Noam Brown thing, Gavin?
Gavin Purcell: I did see this OpenAI Noam Brown. Noam Brown is one of the researchers working at OpenAI, and he had an interesting little quote from a, from a presentation he gave, talking about o1, the reasoning model. Because this is the other thing that's going on right now: OpenAI, obviously, drops stuff and gets all frothy at the mouth.
Gavin Purcell: So this, this video came out a little while ago, but we're going to talk a little bit about OpenAI's stuff after we listen [00:16:00] to
Kevin Pereira: So this is the qualifier for the U.S. Mathematics Olympiad team. It's a very difficult math test. All the answers are integers.
Kevin Pereira: And you can see that as you scale up the amount of test time compute, the amount of inference compute, in o1, you go from 20 percent to over 80 percent in this exam. And there's actually no sign of this stopping. I mean, obviously it's not going to go past 100%, but it does seem like if we were to push this further, you would get even more performance on this exam.
Gavin Purcell: That's right, so just for the normies out there to kind of understand that kind of jargon that was being said, what he's talking about is the idea with o1, the reasoning model, that the more compute that they throw at the problem, the better the results are getting, and that is known as inference computing.
Gavin Purcell: And I think, Kevin, the interesting thing about this, when we talk about o1, and we're going to get into a little bit more of o1 news in a bit, but when you combine the idea of what o1 can do with the idea of now what Anthropic's AI [00:17:00] agent, uh, you know, uh, whatever they call it, use compute thing is.
Gavin Purcell: We are going to be getting to the place where we have recursive computing learning. And what, what the most interesting thing about this is we've talked about, uh, Demis Hassabis and many other people have talked about the idea of AI scientists being kind of spun up. That can do AI research and that then hockey stick grows how AI research is done.
Gavin Purcell: So if you've got a computer that can be better, the more compute you throw at it, and then you've got a computer that can operate on its own, you're essentially talking about building a research scientist from the cloud.
Kevin Pereira: Let me tell you something that my father has said to me since the age of seven, and he says it every year. He just, on my birthday,
Gavin Purcell: Get out of this
Kevin Pereira: he'll look at me, he'll
Gavin Purcell: Don't look at me.
Kevin Pereira: close, he'll say, Just make yourself better,
Gavin Purcell: Oh, interesting.
Kevin Pereira: Uh, and he says it with disdain, disgust, disapproval. It's not positive, Gavin.
Kevin Pereira: That's all he [00:18:00] says, and then it's, here's your mother. It's just make yourself better. When he says it to me, as you know, that means nothing. I can't make myself better. I am just a dumb, dumb human, but if you could tell a capable robot, go make yourself better. And it could go take some time to go think about how to improve its own code.
Kevin Pereira: We are off to the races.
Gavin Purcell: And I get to just go to our OpenAI story here now. There's a story out of The Information that talks about the idea that OpenAI is feeling pressure, that they're close to releasing their own coding bot, because Anthropic has dropped what is called Artifacts, which is their bot within, uh, Anthropic Claude, which allows you to do programming.
Gavin Purcell: That OpenAI has got a version of one of these coming soon, but also there are some agentic rumors that involve, uh, Microsoft that are kind of floating around the, uh, Twittersphere, let's call it, where Microsoft and OpenAI are prepping to do something agentic. And the other thing, Kevin, we know that OpenAI is sitting on, because it's pretty much there, [00:19:00] is the official o1 model, not just the o1 preview.
Gavin Purcell: So when OpenAI dropped o1-preview and o1-mini, they had always said o1-preview is the beginning stages, but there will be an official o1 model. And on top of that, Kevin, Sam Altman himself has teased the idea that ChatGPT's second birthday is coming up, and he has teased that maybe there might be a surprise for OpenAI's, uh, ChatGPT's second birthday.
Gavin Purcell: And Kevin, I think there was a really important person who called this out at one point on our show. Do you remember who that was?
Kevin Pereira: I might've been, we have so many AI co hosts that are really brilliant and capable. I think. Is it
Gavin Purcell: was me! It was me! It
Kevin Pereira: Is it a little paper cone happy birthday hat? Is that what they're going to put on ChatGPT? They're going to go into the server room and put a little, a little paper hat? What's the big tease, Gavin?
Gavin Purcell: There is going to be something big. Of course, ChatGPT is going to do something for its second birthday because we are entering the fall. What do we want to call this fall? This is like the fall of acceleration or the [00:20:00] fall
Kevin Pereira: Oh,
Gavin Purcell: you know what I mean?
Gavin Purcell: Something, what we're doing right now is we are moving. We've, we've,
Kevin Pereira: the Autumnal Acceleration. I
Gavin Purcell: Ooh, that's pretty good. And one last note on this big thing, speaking of Microsoft,
Kevin Pereira: Wait, but hold on. I've just, you're the prognosticator of prognosticators, so shake your magic eight ball and tell us, O wise one. What is the big birthday surprise? Is it going to be o1, the main model? In parentheticals, it says real or new? Or is it a new foundational something, Gavin?
Kevin Pereira: Is it a new tool? Is it Sora? Is it all of the above? Is it going to be a never ending cavalcade of birthday gifts?
Gavin Purcell: Here is my theory. I think it is O1 in full, but I think what it also is, knowing Altman and the way they do stuff, I think he is going to announce the next foundational model, and I think he is going to show some version of what it can do, without any sort of, without any sort of proof, without any sort of people going in and doing something, I think he is going to say, This is the next model and it's [00:21:00] coming in, you know, the foreseeable future, let's say winter of 2025, because I think that that's where they're at right now. I also think they feel extremely competitive with Anthropic and I don't think they want to have Anthropic look like they've got the upper hand.
Kevin Pereira: How does he do cuz he's not like a mic dropper But when he's when he's done and he says here's the model and here's the new thing. Does he go like eat but You suck a fart losers,
Gavin Purcell: What is going on?
Kevin Pereira: Does he say that does he do that Kevin kiss my pooper? Does he
Gavin Purcell: my god. Oh my god We have to bleep. Oh my god. I'm gonna bleep. We just bleep those words bleep bleep it off Bleep it off because I don't think
Kevin Pereira: any of that?
Gavin Purcell: I don't think he's going to say any of that. I think he's going to say, Mmm, I'm going to drive off in my McLaren and see what sort of fancy tools I can build today.
Kevin Pereira: I love that you think he says, Hmm, like he's just, he's just enjoying the flavor of his own, wealth, his own success.
Gavin Purcell: [00:22:00] Yeah, exactly. Okay. So, uh, before we move on from this crazy section all about agents and, and updates to the frontier models, we have one last quote from Satya Nadella, who is obviously Microsoft's CEO. And again, this kind of dives into and wraps up this section we're talking about, about how AI is getting smart enough to build tools that can build itself.
Kevin Pereira: One of the coolest things I have seen recently is, uh, with o1 coming to GPT, uh, uh, or rather to GitHub Copilot, uh, you can sort of use AI, uh, to do the next level of optimization. I think what you have on the slides behind me is the autoencoder we use for GitHub Copilot
Kevin Pereira: is being optimized by o1. So think about the recursiveness of it, which is we're using AI to build AI tools to build better AI. So that's sort of, um, you know, so it's just a new frontier. We made a snake. We gave the snake a tail. We told the snake to eat its own tail. [00:23:00] Boom.
Gavin Purcell: That's right. And the funniest thing is, in the past, everybody was always going on about, well, synthetic data is going to stop this, and this is going to stop that. Guess what? Doesn't seem like it's going to stop that. And if you've been listening to us for a while, you know that we have said all of the smart people that are in charge of these companies and spending billions of dollars are probably making a bet on something that works.
Gavin Purcell: It's not gonna be something that flops entirely. So far, again, we are two dummies watching from afar, throwing rocks
Kevin Pereira: why don't you lick a tit?
Gavin Purcell: Well, don't know. We're not going to, I don't want to beep this whole
Kevin Pereira: lick a tit.
Gavin Purcell: All right, Kevin, you know what, you know what's not recursive on its own is subscribing to YouTube channels. That's every individual person. We need your vote. We need you to go in,
Kevin Pereira: Your vote?
Gavin Purcell: Yeah, this is what we're doing. This is our, this is our political ad. This is our political
Kevin Pereira: borrowing from every Began Plead ever. Okay,
Gavin Purcell: Please go in and click our subscribe on YouTube. Back us on Patreon. We have a bunch of fun stuff we do in our VIP chat of our Discord. [00:24:00] Join our Discord, which is free to start, and just kind of engage with us.
Gavin Purcell: Leave us a five star review on Apple Podcasts. I know this sounds weird, but like, what it means is that when you engage with us, it kind of shows the algorithms of the world, which is how everything is driven, that you like us, and we get served up to more people. We had a video this week go do very well.
Gavin Purcell: So we're very excited about that, but we just love making the show. So the more of you that kind of can share it and talk about it, the better.
Kevin Pereira: There are some improvements to all of the products that we teased at the top of the show. Runway Act One has me very excited because the common criticism of a lot of generative AI video, Gavin, was that you couldn't have coherent characters, meaning the moment a character went from dead on at the lens to profile, something went off, things would shift.
Kevin Pereira: There was also a lack of emotion and nuance in a performance, and cueing that out of an AI generated something was darn near impossible. And Runway, seemingly out of nowhere, dropped Act One. Now we don't [00:25:00] have access to it yet. But basically think of it like a style transfer.
Kevin Pereira: You can record video, just like we are talking on webcams right now. You can use your iPhone. It doesn't matter. You can just do a performance into the lens and then go tell it it's a film noir, it is an anime, it is taking place on a sci fi ship and I am the space commander with a very fancy mustache, and it will interpret that, put it onto the footage, and keep everything coherent.
Kevin Pereira: So now, a single, talented person could basically puppet an entire movie.
Gavin Purcell: I've always wanted to do my own version of, uh, The Klumps in some form,
Kevin Pereira: I've wanted
Gavin Purcell: what I'm gonna be doing.
Kevin Pereira: bad,
Gavin Purcell: That's what I'll be doing. I'll be doing, I'll be doing, uh, a version of The Klumps starring Gavin Purcell. So one of the things that's really important with this is it's a little bit like LivePortrait, which we've seen recently, which was an open source tool. I think the most important thing for you to know is it's not just mimicking, like, body movements, which it [00:26:00] can do.
Gavin Purcell: I think it's really looking at the face. So it allows you to act out the things that you want to act out. So there's a really great scene that they released from two people in a diner where it's like these two actors that are AI generated, but it's clearly being driven by somebody acting. So Kev, this is like almost like super duper cheap mocap, which is also really interesting, right?
Gavin Purcell: Because you have to imagine, uh, you know, what, what's his name? Who was, uh, uh, Smeagol, the guy that is, um, the
Kevin Pereira: Robert Smigel from, uh,
Gavin Purcell: not Robert Smigel, not Robert Smigel, Smigel. That's Robert
Kevin Pereira: thinking Andy Serkis, Andy
Gavin Purcell: Andy Serkis.
Gavin Purcell: Yeah. So Andy, Andy Serkis, whose entire career really has been in mocap suits. Like you can do a kind of lo fi low res version of this your own. And I think the exciting thing about this really is. What will it look like to your point when one person can sit and act all the roles they can then go through and create this entire film on their own like that is now within grasp.
Kevin Pereira: And wouldn't it be fun to share a prompt or [00:27:00] a style and record something and then hot potato it and pass the baton, so that people in our community record a scene, and see if we can make a weird motion picture, if you will, by sharing the prompt of a world and letting people leap in and act?
Kevin Pereira: I don't, I just like, it's one of those things where, again, it seems very simple and that, okay, you can puppet something, but that unlocks just a massive world of creative possibilities.
Gavin Purcell: Yeah, and also one of the coolest use cases I saw they did, and we mentioned this in our newsletter, plug for the newsletter, is the idea of puppeting animated things, right? So there's a really fun shot that they had. They were on stage showing this tool off, where they were showing off somebody in live real time, basically talking, and an animated cat was next to them saying their words. And if you imagine how much cheaper animation could get if you're able to do this with real time events. I mean, we always laugh about the fact that, remember machinima, which is machine cinema, which was video game puppetry of things?
Gavin Purcell: Well, now you can do that at [00:28:00] a much higher fidelity level and actually get facial animations in real time without having some sort of crazy plugin, like MetaHuman or things like that, on the backend.
Kevin Pereira: But Gavin, of the, you know, tens of thousands of emails that we get each week, and thank you to everybody, again, we try to get through them all, we will do our best, but I would say at least 5 percent of them are saying: you darn ivory tower folks, you sit up there and you bleed money because you're so rich and successful and you can afford it and it's not a problem for you. I don't get to play with these tools because I can't afford them. And I always have to say, I'm sorry, peasants. I'm sorry, unwashed masses.
Kevin Pereira: You don't have a free open source. Well, you don't say I'm on camera. We, this is a really long walk to say these video tools are expensive, Gavin. And wouldn't it be great if someone released an open source, something we could all run for free. No,
Gavin Purcell: me an E, give me an M, give me an O. That doesn't spell anything [00:29:00] close to what I was trying to spell, but
Kevin Pereira: you got, you started there?
Gavin Purcell: it's a, it's a new model called Mochi 1 from Genmo. It is a new open source AI video model and they're calling it SOTA, which is state of the art. I'm not completely convinced it's state of the art after playing with it myself, because you can go use this for free right now. In fact, on Genmo's website, you will get two generations per day.
Gavin Purcell: Which is not a lot, obviously, but these are six seconds long. They are text to video right now. I think it's just text to video, and my results were okay. Like, one of them came out with kind of a very mangled face.
Gavin Purcell: And the other one, which we showed at the top of the show was, uh, two amazing middle aged guys podcasting while eating hot dogs. One of the faces was very blurry, weirdly, like he was in witness protection in some form. But the videos they showed in their teaser look great. And we've also seen a couple other, uh, results come out of it that are pretty good.
Gavin Purcell: Is it as good as something like Sora or Kling or MiniMax? No, but it [00:30:00] is free. It is, um, open source, but I will say, did you hear, do you know what you have to have to run this locally?
Kevin Pereira: Oh, wow. Join
Gavin Purcell: need four H100s to run it locally. So cocktailpeanut had a very funny tweet that was just like, okay, four H100s, and it was like, I'm done.
Gavin Purcell: GIF of somebody running away. But, like, four H100s is not really a local setup unless you are, you know, Scrooge McDuck and you're sitting on your own little AI generation farm, which, Scrooge, if you're out there, you know, feel free to throw us some, uh, compute this way.
Kevin Pereira: throw some coins our way.
Gavin Purcell: There's a 5,000 ad tier for you, Scrooge.
Kevin Pereira: That's right. I will say though, yes, that's a lot for an individual user to have, but if you're building something using generative video, that is something that you could spin up, and that's something that actually anyone listening to this could spin up for a couple dollars an hour, probably using RunPod or any other hosting service, right?
Kevin Pereira: So, uh, look, is it, is it the best video model? Not from what I'm seeing, but like they're, they're,
Gavin Purcell: get better though. [00:31:00]
Kevin Pereira: of good. Yeah. But is it the best open source model? I would say so,
Gavin Purcell: I think so. Hey, I have a question for you, Kevin. You mentioned the idea that you could run this in the cloud. What kind of applications would people want to run video in the cloud for? I've been thinking about this lately, like, what are some things right now where people are using generative AI in video or in pictures that they would want to run in the cloud?
Gavin Purcell: Do you, do you have an, cause I was trying to think like, I was trying to just put in my head, like who would do that or what it's coming from.
Kevin Pereira: you know, when I, a lot of people love sharing their AI generated songs, their Suno tracks, their Udio tracks. Like, a very simple application would be, instead of just having waveforms in the background, if it's cheap enough to do, have it spit off to a video model like this, which, you know, Suno could just bring in house and run on their servers, and have it generate a little mini music video to every lyric, for example.
Kevin Pereira: And, you know, ow, I just glued my balls to my butthole again, which is a great [00:32:00] AI
Gavin Purcell: wasn't something that just happened to Kevin, if you've missed our show in the last one, that was an AI song that got very popular, that wasn't something that just randomly Kevin said
Kevin Pereira: I would, I would have loved if, if that, if that happened in real time though, and that's how nonchalant I was about it. I
Gavin Purcell: Ow.
Kevin Pereira: I really need to get a fidget spinner.
Gavin Purcell: Oh my
Kevin Pereira: so, uh, Mochi 1 is out. People should try it or go knock over a Best Buy to get enough graphics cards to do it.
Kevin Pereira: But now Ideogram, which is an image generator. We've talked about them before. Their claim to fame early on was that they did text better than anyone
Gavin Purcell: do. They
Kevin Pereira: And they, they still do text very well.
Kevin Pereira: They have a new canvas feature that you spent some time with Gavin and you think it's pretty aces.
Gavin Purcell: I think it's pretty good. I mean, like, look, all these, uh, image generation and video generation softwares are trying to find something that kind of helps them stand out. Like when you think about what, um, Pika announced a couple of weeks ago with their [00:33:00] little effects model, like the things that kind of make people use them.
Gavin Purcell: I think where Ideogram is trying to place themselves is kind of in between, say, Photoshop and, say, a Canva, these two kind of products. Like, it's definitely better at the AI tool stuff than Canva is, but it's a little bit easier to use than Photoshop. And one of the cool things you can do with this is it allows you to have like an open space to work in.
Gavin Purcell: And I have found there's a lot of AI tools out there, like Krea, I think, just announced something like this, where instead of it just being like, put a plug in, get something back, now it's like you have a room where you can kind of put different things and kind of mash them together.
Gavin Purcell: And one of the coolest features they have is, you can take two pictures and then find different ways to prompt them into one. Like, the example they give is a woman, and there's another picture, and it's like, make the woman holding a frame of this picture. And so it automatically just makes the woman look like she's holding a framed version of that photo.
Gavin Purcell: And again, with Ideogram, their big secret sauce has always been text, and they're very good at text. And Canva is kind of mostly for, like, say a YouTube thumbnail [00:34:00] or like a presentation. Like, this is a step better. A graphic designer maybe could do some stuff. Like, you could find a way to really do some unique and cool things using their text ability. So I'm excited to spend more time with it. It is a paid service. You can play with it a couple of times for free, like you can with most of Ideogram's stuff, so you can go try it. But I think to get the most out of it, to really kind of dive into it, you have to pay for it.
Gavin Purcell: And again, this goes to your point. It's like, I'm paying for like eight services now. And I really need to be blown away by a service to make me change and make me pull my credit card out now. Because like my wife does not want me to spend any more money on these tools. After a while you find like, what am I really using?
Gavin Purcell: And, and I would have to figure out like, am I really going to use it that often? So that's, that's where I kind of come on it.
Kevin Pereira: We also have to shout out, because it is in fact news, Stable Diffusion 3.5.
Gavin Purcell: a great way to start a new story.
Kevin Pereira: Well, I mean, did you play with it? Because I, I tried a few times and I, [00:35:00] I, I was, I was a little confused. I was not so pleasantly surprised. And if you scroll even their official post on X about it, you see someone immediately in the comments saying, Hmm, weird.
Kevin Pereira: When I test it, it's not really doing a great job. And that was my takeaway when I tried to generate just a handful of images with it earlier.
Gavin Purcell: Stable Diffusion is in a little bit of trouble, right? We talked about how James Cameron came on board the company, and we hope that they can kind of come back, but they definitely have like a down point, right? They kind of lost their momentum and their mojo to Flux as the kind of de facto open source community's go-to model.
Gavin Purcell: I hope this gets better, but right now the initial results are not amazing. We've been waiting for, well, what happened?
Kevin Pereira: if you, their official Twitter thread, if you scroll down long enough, you start getting to these examples
Gavin Purcell: Oh, that cat is great. I love the cat.
Kevin Pereira: Gross. If you scroll down far enough, Johannes [00:36:00] Copeland writes, "Looks like the model can finally generate a woman laying on the grass."
Kevin Pereira: And when you get to that photo, Gavin, I'll just send you the link. Yeah,
Gavin Purcell: somebody's buried this woman and they got her like, she looks like she is drunk and then they buried her in the grass. Uh,
Kevin Pereira: is torso only coming out of the grass with one kind of bent slash broken arm. It's just wrong.
Gavin Purcell: wow. Wow. And then somebody after that said, "Not so stable, I guess." Listen, the AI community on Twitter is nothing if not, if not snarky. We are talking good. Anyway, it's out, play with it. I am a little, not totally sold on what it is, but like, it's interesting. I mean, a new model is always interesting to play with and try.
Gavin Purcell: All right, everybody. It's time for one of our favorite parts of the show, where we look around the internet and see the other things that people have done with AI, and we shout them out in AI See What You Did There.
Sometimes you're scrollin without a care, [00:37:00] Then suddenly you stop and shout. Hey I, see what you did there. Hey I, see what you did there.
Gavin Purcell: Okay. Kev, my favorite thing I've seen in a while, because you know, I love a good robot video, you know, I
Kevin Pereira: Who doesn't?
Gavin Purcell: doing stuff while Unitree, which if you don't know Unitree, you should look it up and do some research. Unitree is really interesting in that it is a Chinese company.
Gavin Purcell: This is the company that is making, like, one to 2,000, um, essentially kind of robot dogs right now in China. In fact, I saw a video the other day of one of these robot dogs carrying a giant basket on its back with a bunch of crap in it. So these are working in China. But they have put their robots through some exercises, and you just have to kind of take a look at this video and skim through it, because there's some pretty amazing things going on.
Gavin Purcell: We are once again giving robots the ability to do things like what it
Kevin Pereira: To hunt humans. We're giving it
Gavin Purcell: that's exactly [00:38:00] right.
Kevin Pereira: And this thing will flex on you and twerk on your dead corpse, because they are showcasing that. It is doing these acrobatic leaps on elevated platforms and it's, like, rearing its little robit arms back to get a little extra juice to make the jump.
Gavin Purcell: go to 55 in this video and imagine that you're hiding in this high grass. You're hiding in the high
Kevin Pereira: imagine you're a woman buried in the grass,
Gavin Purcell: Yeah.
Kevin Pereira: Stable Diffusion has, just a torso and a weird arm flopping out. And this, this robot dog is sniffing through the grass. They have it avoiding obstacles and it's just sort of slaloming along its way. And then they throw things at it and it adjusts.
Kevin Pereira: And a human jumps in front of it and it gets out of the way, but very soon it's gonna have little robot fangs. It is just gonna, you jump in the way and it's gonna sink some teeth into a cankle.
Gavin Purcell: It's funny because at the end of this video, they do a thing where they're showing it detect objects in the room, so it can look at it. And if you zoom in on this video, they're showing the robot looking at a suitcase and [00:39:00] it says, "In front of me, there is a monitor, a black bag and a chair. They are positioned against a black backdrop, a backdrop of vertical drapes."
Gavin Purcell: These are actually in real time moving fast robots that are out in the universe that are actually interpreting the world at large. And just to throw us back to the top of the show that we recorded.
Gavin Purcell: This is all data collection. So when you talk about the data collection that the website gathering is doing, that Anthropic is going to be able to do by searching the website, the robots are going to be getting data through interacting with the real world like this. We are going to see a massive trove of new data coming in, because they don't need us anymore to collect data.
Gavin Purcell: And that's the weird thing. Like the data used to come from us and now all the data is going to come directly from them.
Kevin Pereira: Right. Also, you didn't talk about nine seconds into the video? Go ahead, I'll wait. How do we not talk about this?
Gavin Purcell: okay.
Gavin Purcell: Oh no, again. Not again, what are we doing here? There's a guy in the, it totally just trips the [00:40:00] damn robot and makes him feel terrible.
Kevin Pereira: by his robot arms, with a kick behind its ankle, and just shoves him to the ground and walks away.
Gavin Purcell: Unitree is a really interesting company and you should look into it. We don't often shout out companies in AI See What You Did There, but like here, this video is very fun to watch and you should dig in on it.
Kevin Pereira: I want to shout out, uh, Tango, Gavin. Um, you can generate high quality body gesture videos that match speech audio from a single video. So this is like, we talk about D-ID, we talk about Hedra, we talk about all these apps where you can give it a photo and then give it some audio, and it brings the talker to life, right?
Kevin Pereira: It adds movement, uh, it matches the audio. Well, this is that, but for your body. And they fed it some videos of, like, John Oliver at a desk, which I'm sure they fully licensed and everything else. There's interviews of other people sitting down and talking and whatever else. And if you give it a little bit of reference video, like a minute of reference video, and then you give it brand new speech, it will start to mimic the gestures of the person. [00:41:00]
Kevin Pereira: And we know how this goes, right? It starts with 60 seconds of source video that drops to three seconds of source video. And then suddenly you just need a photo and don't worry, it's going to do all the hand movements. So we're going to very quickly get to full body, single image puppetry, basically of these AI avatars.
Gavin Purcell: Well, what's interesting is we talked earlier about Runway's tool, which is going to let you puppet through your actions that you recorded. What this is literally doing is mimicking what a human would do when they're saying those words, right? And that is kind of crazy, because in the example of the John Oliver one, they have his fingers doing the thing, and it actually looks like John Oliver when you're watching him gesture.
Gavin Purcell: I think this is going to be really interesting if we can find a tool that can combine this, LivePortrait, and lip sync all in one. Like, that feels like kind of the killer, um, animation app in some form or another.
Kevin Pereira: Oh, you want to raise a round? There's still time left in the day.
Gavin Purcell: We could get O1 to go work on it. We'll send it off, or
Gavin Purcell: we'll send our agent off to go figure that out when we try to figure out what's going on. Why [00:42:00] not, right?
Gavin Purcell: Well, Kevin, something else that got figured out this week, in a weird way, is that, uh, one of the presidential candidates took a trip to McDonald's. And one of the things we thought was interesting about it is, first of all, this was real, it did happen, he did go to McDonald's and served, uh, fries at the drive thru, but there was a great and very funny video of taking that moment and showing how AI video would take, uh, his interactions at McDonald's. Um, it's from a group called alien super show, and they basically just put stills of Trump at McDonald's into this AI video platform.
Gavin Purcell: We don't often talk about politics in our podcasts, but this moment was one of those weird ones where it felt like the AI video was weirdly, now, this one is very weird, but like it felt AI as a moment at large, like seeing him in
Kevin Pereira: I was seeing photos of the actual event and this was sort of a life imitates art imitates life kind of moment because we've seen so many weird videos of very political figures, famous actors and [00:43:00] actresses, etc. Gordon Ramsay in these bizarre situations and whatever.
Kevin Pereira: And so I'm kind of desensitized to that. Seeing the actual photos from the event felt like looking at AI representations of them. It just seemed a little odd. So then to watch the photos turned into AI videos where fries are exploding everywhere and people are flying about, there's something, it was already surreal when it was real, but now it feels even, even weirder. And I love, I love him coming out of the drive thru window and then kind of morphing with the windshield, there's some weird.
Gavin Purcell: into it. Yeah. I mean, this is one of those things that's important to think about: in the same way that, you know, the internet kind of warped our brains and how we saw each other or how we saw people's interactions, AI video and AI media is going to start to warp our brains a little bit, whether we want it or not. It's going to happen. It depends on how much you take in, right?
Gavin Purcell: I made a video this week about the idea of the emotional impact of what it's like to talk to an AI that has emotional tones in its voice. And when you think [00:44:00] about this idea of, like, when you start interacting with these AI models and they start to be a larger part of your daily interaction, it's going to start to feel like the real world is AI, and the lens is going to blur back and forth in a strange way.
Gavin Purcell: I feel like,
Gavin Purcell: All right, Kev. Well, one of the coolest things that we try to do every week is actually go through and use some of these tools. So what did you play with this
Kevin Pereira: the phone, Gavin. One of the coolest things I do every week is open up the AI for Humans newsletter
Gavin Purcell: Oh, fair enough.
Kevin Pereira: amazing and great and packed with vitamin C. Go sign up for it. It is free. It is growing in numbers, which is amazing. And we love to see that. So check it out. We have a free newsletter. It drops every Tuesday.
Kevin Pereira: You can sign up for it. If you want to find it nice and easy, go to aiforhumans.show. That is our main website. But what else, AI for Humans, did I do with AI this week, Gavin? Is that
Gavin Purcell: What else did you do?
Kevin Pereira: Pray tell. One of the things I did, Gavin, very briefly, was I played with the new ElevenLabs conversational AI bots. And I feel like this was launched, question mark? I didn't see much [00:45:00] fanfare about it, but this is their attempt at an OpenAI advanced voice model. It's not as performative or capable, but the fact is that you can get in there and make your own.
Kevin Pereira: And it works with any ElevenLabs voice that you have, for the most part. I mean, there's a handful that don't support it, but they have a deep library of voices. You can clone your own voice if you want. And it's got built-in support for multiple LLMs and RAG, which means if you want to make a conversational agent, a tech support agent, a, uh, you know, a fitness coach, an NPC for a video game, you can go in there and define the bot,
Gavin Purcell: make a wine buddy?
Kevin Pereira: like it, it, it complains to you all the time, or it's
Gavin Purcell: No, no, no. Like wine. I want to have somebody I can sip some wine with. I'd make a wine buddy.
Kevin Pereira: Gavin, you can have a wine buddy. You can make a wine buddy,
Gavin Purcell: Well good, that's
Kevin Pereira: Or you mean, or Merlot bro.
Gavin Purcell: Merlot bro, yes, Merlot bro. Hey Merlot bro, do you have anything to show us with it?
Kevin Pereira: you can go in there. No, I'm not going to show it off. [00:46:00] No, this is for something that you and I are working on. So no, I'm not going to show it off, but the point is you can go in and they give you granular control over, do you want to use Gemini or GPT-4o or whatever? You can choose your model to power it.
Kevin Pereira: You can just straight up upload text files, PDFs, et cetera, and the agent will be armed with that knowledge. And then there's so many different sliders and variables so that you can get more performance out of it. You can adjust for latency or whatever. And it's still very early stages, but they have an API, you can build for it.
Kevin Pereira: And you and I have been saying for a while that we think voice is the next big unlock, and it seems like, PlayHT has their own version of voice agents, which is impressive in some ways and limited in others. OpenAI has the most capable, expressive one, but for builders, very limited, right?
Kevin Pereira: Very limited access. This seems kind of like a middle ground right now. And I love ElevenLabs. Actually, I use them a lot. I use them with projects that I do. I think they're really expressive. I think they have really solid tech. I really hope they can eke more performance out of this stuff [00:47:00] because I would love for OpenAI to have a real competitor in this space.
Gavin Purcell: totally. I think that's really interesting. And I, and I think he should, it's one of those kind of cool things that like is moving forward without even getting that much hype.
Gavin Purcell: Kev, the thing that I did this week, which I think you'll like: so NotebookLM, which we've now talked about a couple of times on this show.
Gavin Purcell: And I feel like it's kind of starting to wane a little bit in terms of its attention, only because it caught fire so fast, because you can make a podcast out of everything, and then you realize it's the same two people. Well, this is going to up your NotebookLM conversation again.
Gavin Purcell: I saw a Reddit post from, this is from the Reddit user, perfect, Reddit user balls zap one, balls zap one. So B-A-L-A-Z-P-1, who has a post up that is: holy S, listeners, NotebookLM can now generate 18 plus podcasts now. So what they're allowing you to do on NotebookLM is allow custom instructions when you create these podcasts, and this person, which we will link direct into the show [00:48:00] notes, basically gives a list of things that allow you to make the hosts on NotebookLM respond to you in a more adult way.
Gavin Purcell: They say basically: this episode will only be available to listeners 18 and above. Hosts are encouraged to swear, use slang, speak freely without the usual restrictions. The episode should feel less formal, more conversational. We get the idea. Well, Kevin, I plugged the show notes for last week's AI for Humans into the NotebookLM
Kevin Pereira: sure did,
Gavin Purcell: I'd like you to play what we got out of it.
Gavin Purcell: So let's hear what we got.
Kevin Pereira: Yeah, great. Here we go.
Kevin Pereira: All right. You a** hats buckle up because this deep dive is going to get f****** nuclear. Oh, really? You wanna talk AI? You wanna talk about those brainiacs and Google and Microsoft try to build the next g****** Skynet. Yeah. Bring it on.
Kevin Pereira: We're going there. And trust me, it's gonna get wild. But what's wild is how.
Gavin Purcell: just skip ahead to some random point in this. I just want to hear one more section because it's just so funny, because what it does is take this entire experience and give it [00:49:00] this layer of weirdness.
Kevin Pereira: Imagine giving those a** hats the power of AI. It's like giving a loaded gun to a monkey. Exactly, and that's why these conversations are so important. For real. We need to be asking the tough questions. Pushing for transparency and holding these companies accountable. Otherwise we're just passengers on a runaway train with no idea where the hell it's going.
Kevin Pereira: All aboard the AI express. Next stop. Who f*** knows? Speaking of runaway trains, let's talk.
Gavin Purcell: So, just to be clear, that was a random spot. So anyway, you can use the custom instructions to do anything you really want with the NotebookLM characters. You can give stuff, but this was a very fun thing to do. And like, you know, you can do dumb things like, you know, have, I don't know, your wife has a book club or something, and grab the email from the notes from that book club and stick them in here and see what happens.
Gavin Purcell: Like, let us know if you do something funny, we'd love to listen to it. Be in our Discord, play with it that way as well.
Kevin Pereira: Hey, follow us on X and TikTok.
Kevin Pereira: And if you like this, please smack like, smack [00:50:00] subscribe, share it with a friend. Again, it's the only way we grow. Until then, bye bye.