You can also listen and subscribe to Curiosity on YouTube, Spotify, and Apple Podcasts.
Transcription of our conversation with Cristóbal Valenzuela, CEO and Co-Founder at Runway.
Rajat Suri (01:06):
Today we're talking to Cris Valenzuela, who is the co-founder and CEO of Runway ML. This company is doing some really cool things on the frontiers of AI, translating ideas and text into moving pictures and videos. They're really at the forefront; they're the leader in this space. They're already being used to make movies today, and this was a really fascinating conversation with Cris. I think he is such a visionary and really seems tailor-made for this problem. What did you think about this conversation?
Immad Akhund (01:46):
You kind of touched on this, but Runway has raised more than 150 million, and Cris really is looking at this from a life-mission perspective. He wants to free creatives to express themselves using AI, and it's such an interesting problem. Most other AI is a lot more practical, like answering questions or doing search, whereas Runway truly is exposing AI to creativity, and that meld is super interesting. If anyone has a chance, they should check out some of the videos people are creating with Runway. It's really amazing what you can do with AI and the generation of images and video.
Rajat Suri (02:27):
Yeah, it feels like the application should be possible with the AI technology, but it's unclear when it's going to happen, so I'm excited to have this conversation, because it's really going to be interesting to learn the details from Cris. So with that, let's welcome Cris.
Immad Akhund (02:43):
Hi Cris, welcome to the Curiosity podcast. Extremely excited to chat with you. I wanted to jump straight in. So Runway is super interesting: it's combining cutting-edge AI with the creative industry of film. How did you, I guess, get experienced enough and understand both of those things well enough to found this company?
Cristóbal Valenzuela (03:05):
First of all, thank you for inviting me. Happy to be here chatting. I mean, Runway is now turning five years old, and I think I've been working on the idea of Runway for almost a decade now, coming from both a research and engineering but also an arts background. It's been a passion of mine, and the thing that really drives me and my co-founders and the company is to find the spaces in between science and art. That's what we do at Runway: we merge fundamental research and foundational models with great craft and great products that can be used by artists and filmmakers and creatives of all kinds to make great stuff. And so it's a combination of an interest, I would say, that the team and the founders have in both art and science and how to merge those.
Immad Akhund (03:49):
So when you started 10 years ago, for the first five years it wasn't a company. What was that path, and did you think this was always going to be a company, or were you just exploring?
Cristóbal Valenzuela (03:58):
Yeah, just to be clear, the company's five years old, but I'd been thinking about the idea and researching it for some time, starting eight years ago. I come from an interesting, weird background. I was doing econ and programming and freelancing on the side, but I was also doing art back in Chile, where I'm from. I actually exhibited at a few museums in Chile. One of the exhibitions I did was a typewriter connected to software I wrote that would allow you to create and generate videos while you type. So imagine a stand where an old Olivetti machine was sitting, and you could physically type words, and videos would be projected on the screen. I was trying to experiment, to ideate, to come up with ways of transferring language and text into video. And this is before any of the big models that we've seen so far, even before, of course, Gen-1 and Gen-2. This is eight, nine years ago.
And so I think I went into a rabbit hole of what happens when you start building research tools or systems that can translate between the ways we think about the world. How do filmmakers think about it? Perhaps via scripts, or via their own thoughts. What happens when you have technology that can transfer thoughts into moving images? How do we get there? That drove me to move to New York to study at NYU, where I met my co-founders, and we just went deeper into exactly that rabbit hole: tools for augmenting human imagination by building these very complex models, but most importantly focusing on how people are going to use the outputs. I would say a lot of experimentation, research going into unknown places, but a lot of fun. This is what I care a lot about.
Immad Akhund (05:44):
What gave you confidence five years ago that this was ready to be a company versus a research project?
Cristóbal Valenzuela (05:51):
I never thought of starting a company, to be honest. I think the company founded us; this is what we always wanted to do. We'd been obsessed with creative tools and with building these generative models for video, audio, image, and text, and we realized that the best way to really devote ourselves to accomplishing this mission was to build a company around it. I don't think I've ever considered myself a founder or thought of myself as an entrepreneur. I never thought of raising money; I was just trying to do exactly what I care about and love. And then you quickly realize that if you want to do that at scale and really have an impact, the best path forward is starting a company. I come from a different perspective of approaching and building a company, rather than thinking I'd do it from day one.
Rajat Suri (06:36):
I love this description of translating thoughts into moving images, which is kind of what I think you said. It was really interesting. What I think about when you say that is: moving images have so much more data in them. They're so much more data-rich than our thoughts, in some ways. When we do think of things, we sometimes think in images, but we also translate those images into words, and then you're trying to translate those words back into moving pictures. How do you deal with that problem of converting essentially words into moving pictures? Or do you think of it more as translating images and words and other content into these moving pictures?
Cristóbal Valenzuela (07:15):
That's an interesting question, because it goes back to the root of why you want to do it in the first place. Why is it worth building these systems, these multimodal systems, that can transfer or translate words into images or moving images and videos? I think the goal for us is rooted in the idea that it should be an expressive tool, and an expressive tool can actually take different forms of exploration. If you speak with a filmmaker or an artist or a musician or any creative of any kind, inspiration and ideas come from anywhere, from anything. And so being able to translate not just your thoughts, but sometimes a beat, a musical rhythm, or an image into another form of art is at the root of any creative process. Inspiration sometimes comes from looking at, I dunno, a color or an art piece, or sometimes the street or people.
And so if you think about it in that regard, more than just text to video or words to video, it should be a process where you can go from anything to anything. So audio to video, or video to images, and images to video. And that's what really happens in the mind of any filmmaker, artist, and creative: you're thinking in all sorts of different senses. Even when you watch a film, when you go into the theater, it's a multi-sensorial experience. There's audio, there's video, there's text; it's an experience. And so I think creative tools, and tools in general, should work in the same way that we think and that we work. It wasn't possible before, because the tools were very much unidirectional. They were reacting to us; we were doing something and they were just following those instructions. But the new level of models that we're heading into, and the new level of capabilities of these AI systems, will allow us to have much more of a feedback loop, where you create something in some way, you can just say an idea or something, and then the system might react to you the way a human might.
And so for us it's more about that: how do we go from anything to everything, rather than just text to video? And then of course there are challenges and problems and things that need to be built to make that happen, but I always come back to this: it's not just about text to video, it's about what people want to do and how you build really meaningful products based on that.
Rajat Suri (09:36):
Yeah, that's super cool. That's a really exciting vision. Anything to anything. How do you deal with the degrees-of-freedom issue when you're converting anything to anything? You basically have almost infinite possibilities, right? Do you think it's actually meant to inform a creative person's idea or inspire them? What if the creative person wants something different from what is presented and wants a different output? How do you close that loop? For example, I want an image of a horse galloping in a meadow or something. There are a million different ways to present that. So do you think it's more about inspiring the creative, here are some possibilities of a horse galloping in a meadow, or is it trying to actually ask the creative: is it summer, is it winter? Is it a gray horse? Is it a white horse? Trying to actually get the image out of the creative's brain into that thing. How do you think about those two different paradigms?
Cristóbal Valenzuela (10:36):
I think I've always thought about the creative process really as an exploration process. And so it's really hard to optimize for, because sometimes, most of the time, you don't know exactly what you're looking for or what you're trying to do. You just start doing something, and then you find an avenue, and you find a stepping stone that leads you to something else. And I think the core approach and philosophy, the way I would like to think about it, is that the best products or the best experiences that help you explore are the ones that don't have one single, predetermined direction you can take. If there's only one single option, that comes a lot, I would say, from research these days, where a lot of the research in image and video and models in general comes from wanting to find a benchmark or a number to improve within a specific field or metric, right?
And so you're thinking about it in a very linear way: how do I solve for creating an image? How do I solve for creating a video? But creativity can't be solved. It's not a function where you do X and you get Y; creativity is a process. And so these tools shouldn't just allow you to create the perfect image. They should allow you to have control over how you create multiple images, how you create thousands of videos, like the best filmmakers, not the ones that just grab a camera and shoot one video in the street. They're making videos every single day for 10, 20, 30 years. And that carries over to any creative discipline. The best writers are not the ones that type one sentence and get it right. They're the ones grinding every single day at their computer, typing a lot, to find the right story, iterate on that story, and then find something interesting to tell. I like to think about it from that perspective: it's hard to think about it as a problem to be solved, and better to see it as a continuum that always needs to be explored.
Rajat Suri (12:25):
So it's like letting a thousand flowers bloom: helping them with their exploration and getting them closer to the emotional truth they're looking to find.
Immad Akhund (12:35):
We've talked about multimodal stuff like audio, image, video. I mentally put Runway in the box of AI-augmented video slash filmmaking. Is that an incorrect mental box? Is Runway for all these kinds of different models, or is it mostly about video?
Cristóbal Valenzuela (12:55):
It's mostly about imagination and multimodal systems. I think, again, we're building tools for human creativity, and human creativity is a state of mind that can take different shapes and forms. We've always had this idea, even now, when we're talking a lot about AI art, where people think about art as pixels, as beautiful images, as things you can generate. I think art can be anything. Art can be a performance, can be a poem, can be an image, can be a video, can be a sound, can be an experience, can be a dance. It's anything. And the tools that we're really building at Runway are tools to help drive that creative process forward. An analogy or comparison I always come back to, when thinking about where things are heading in the AI landscape and about our mission at Runway, is to think of what we're building as a new kind of camera.
The camera was perhaps one of the most important technological breakthroughs in art in the last 150 years. Before the invention of the camera, which led to the invention of moving pictures and film and cinema, which is now considered the seventh art, the only way to visually tell something was painting. The camera represented a breakthrough not only for artists but for society at large. Entire cities and industries and countries were built on the idea of capturing light, of having a device that, via photosensitive materials, was able to paint a picture not by hand but by technology, a device smart enough to do it. And that in itself enabled a bunch of new creative expressions that weren't even imaginable before the invention of the camera. Think about visual effects, people who spend their time doing CGI, or editors, for example. Those are professions and creative domains that just didn't exist 150 years ago. And so at Runway, sometimes we think about it very similarly. This is a moment in time when we're seeing one of the largest transformations in the arts, which can perhaps only be compared to the one we saw 150 years ago with the invention of the camera. The jobs, the professions, the types of artistic explorations that will emerge from it are still to be seen, because we're still in the very early stages of this new camera.
Immad Akhund (15:11):
Practically speaking, someone who's editing photographs normally uses Adobe Photoshop or something, and someone that's editing audio might use GarageBand or something like that. So I guess in terms of the software that you actually make, do you deal with all of these different things, like you can drop in a picture and edit it, or a video, and everything is natively handled? Or do you just focus on one of these mediums?
Cristóbal Valenzuela (15:33):
I think it depends on the scope of focus you want to have from a product side of things, and whether you're focusing on short-term or long-term visions. Short-term, I think there are a lot of AI-enabled or AI-powered features that will make the lives of editors, for image, video, audio, whatever mode of expression you're using, faster and quicker. But ultimately, if you think about this as an entirely different form of expression, then you won't have Photoshops, you won't have video editing the way we know it. Again, if you go back to the camera: the tools people had at the time were brushes and paint tubes and oils and canvases, and those were the things people were using to tell stories with images when the camera first came around. What people tended to do was use the camera as a paintbrush, to do portraits, to do the same things they were doing before, but with this new medium. When things really started to change was the moment people realized that the camera was a completely different medium. It enabled a completely different set of grammars and a completely different set of creative possibilities that weren't necessarily related to what painting used to be.
It was just different. It was completely different. And so for Runway, and I think for us, we always thought about it from that perspective. There's a chance and an opportunity to build a better version of the creative tools that we have today, but the much more interesting long-term version, for us, is that it won't look like any of those things. We're using paintbrushes today, we're painting with oil, and this thing that we're about to experience won't look like anything we're used to from the last 10, 20 years. And that's really exciting, because the form and the shape of that creative medium are still to be defined, still to be determined, and it's a matter of getting more people to experiment with it and play with it and figure out together what that looks like.
Immad Akhund (17:29):
So when you were launching Gen-2 at Runway, I guess you figured out some of that, right? Maybe you can just describe it. Is it a plugin for existing software? Is it completely standalone software? How does it work, and how do people interact with it?
Cristóbal Valenzuela (17:44):
It's software of its own. If you think about digital video, the pre- and post-production processes are based on assumptions that were built 30, 40 years ago. You have content that you need to edit and compose in a non-linear form, and you put down tracks and objects, and you have brushes and scissors that are actually metaphors borrowed from the analog world. You render video: you click file, export, and you render a video. And so the story is baked in in a very specific way. Now, the moment we start switching from rendering pixels to generating pixels, you might never have to export anything again, because you're just generating them, hopefully at some point in real time. And so the narrative possibilities radically change. The idea of creating content linearly and having it distributed in the same way to everyone might change.
You might watch a video or a film or a short film from one point of view, and I might watch it from a different point of view. Or you might be a character in a movie and I'll be the same character in the same movie, because those videos and those stories are being generated as you're watching them, which again changes the nature of how you create them in the first place. I think language will have a huge impact. A lot of the content will be created, or at least started, with just natural language. The way we're doing it right now, you can just prompt and get videos out, but you'll need a bit more control over how you manipulate the content that you're creating. A great mental model that we've built at Runway is to think about those multi-narrative angles instead of the very linear stories you can make.
Immad Akhund (19:21):
I have a rough idea of how large language models work, and I have a rough idea of how diffusion models work for image generation. Are you putting those kinds of technologies together to edit a video, or is it a fundamentally different technology that's enabling Runway?
Cristóbal Valenzuela (19:41):
From language models?
Immad Akhund (19:42):
I mean, if someone types in, show me a horse galloping, are you taking those language models, applying diffusion, and generating images that you stick together, or is it a fundamentally different approach?
Cristóbal Valenzuela (19:54):
It's a combination of that. You have a language model that's able to semantically translate your intentions into a picture or a video. And the thing that we're working on, and I think other folks are also working on, is improving the accuracy of the descriptions you write and how those are represented in the image that you get. Because if I write "a red car driving down the street," we'll all have different mental models and ideas of what that red car driving down the street looks like. And so it's not a solved problem, because it depends on taste and aesthetics and ideas and how you want to see that car driving down the street. So it's about coming up with a model that can understand your direction, and maybe at some point can also be trained on your taste and your intuition as well, so it can work almost as an extension of yourself.
And that's where I land on creative augmentation: augmenting your creative capacity by having some sort of system that perhaps thinks in a very similar way to you and then allows you to create stuff like this. But at a technical level, yeah, it's a mixture of models that work in different domains, including language models using transformers; most of these, these days, are just built on top of that. And then for video and image models, there's a lot of exploration as well in moving beyond diffusion to transformer-based architectures, which just provide much more control and quality overall.
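For readers who want a concrete picture of the two-stage pipeline Cris is describing, here is a minimal toy sketch: a text encoder maps the prompt into an embedding, and a diffusion-style loop iteratively refines random noise into frames conditioned on that embedding. Every component below is a stand-in (hand-rolled encoder, toy "denoiser"), not Runway's actual models or architecture; only the control flow is the point.

```python
# Toy text-to-video pipeline: encode the prompt once, then run many
# conditioned refinement steps over noise. Stand-ins only, not real models.
import numpy as np

rng = np.random.default_rng(0)

def encode_text(prompt: str, dim: int = 64) -> np.ndarray:
    """Stand-in text encoder: hash characters into a fixed-size embedding."""
    vec = np.zeros(dim)
    for i, ch in enumerate(prompt.encode()):
        vec[(i * 31 + ch) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def denoise_step(frames: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Stand-in denoiser: nudge all frames toward a text-conditioned target."""
    target = np.outer(text_emb, text_emb)[: frames.shape[1], : frames.shape[2]]
    return frames + 0.1 * (target - frames)  # one small refinement step

def generate_video(prompt: str, num_frames: int = 16, size: int = 8, steps: int = 50):
    emb = encode_text(prompt)
    frames = rng.normal(size=(num_frames, size, size))  # start from pure noise
    for _ in range(steps):                              # iterative refinement
        frames = denoise_step(frames, emb)
    return frames

clip = generate_video("a red car driving down the street")
print(clip.shape)  # (16, 8, 8): 16 tiny "frames"
```

In a real system the denoiser would be a learned neural network and the encoder a pretrained language model; the sketch only illustrates the shape of the computation: encode once, refine many times under that condition.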
Rajat Suri (21:14):
I think you spoke really powerfully about the impact the camera had and how it created all sorts of new opportunities for artists and creative people, and of course that's probably happened with the video camera as well, and other forms of technology over the years. The iPhone has probably unlocked a whole bunch of creativity too. What do you think it looks like 10 years from now, with the power of Runway, assuming Runway and this type of technology is working the way you hope it will, working smoothly? What does society look like? What are the fundamental changes? Maybe let's dive a little deeper into that: the artist lifestyle, the creative lifestyle. Can every person be a director or an editor? Are we going to be making movies for fun in our spare time? I'm curious what you think of that future.
Cristóbal Valenzuela (22:02):
We have a saying at Runway that I think perfectly encapsulates this idea: the best movies are yet to be made, the best stories are yet to be told. We think we've seen the golden era of cinema, but I think we're not even close to the golden era that we'll start to see in the next couple of years. Most of the stories we've been seeing out there come from an extremely small set of people. And so there are people around the world who are using Runway to tell their stories in ways they couldn't before. Almost every day I get emails from people thanking us for creating a tool that helped them tell a story they wanted to tell but never had the means to. And that's actually the most interesting and impactful way that disruptive technology affects people: you make things more convenient.
Filmmaking is an extremely hard and expensive thing to do, even for production houses and great creative teams. I speak a lot with producers at major production companies and streaming services, and with artists who have won Academy Awards, and sometimes just visualizing their ideas takes them years. So the moment you can take the cost of creating those ideas down to zero, the quantity of ideas will go exponentially through the roof, which is great. We're going to start seeing things we've never even thought of before, because new people are going to be able to express them. So it's a very exciting time, in the sense that there's going to be a creative explosion that we've perhaps never experienced before, and we're going to start hearing from people and ideas and stories that we've never even heard of before, which for me is a great thing to think about.
Immad Akhund (23:48):
Yeah, super exciting. In that world, do you imagine there won't be actors and actresses? Right now, if I think about making a film, it's quite complicated, right? I have to think of a script, I have to find the people, I have to find a location, potentially. You still have to think of the script, though maybe that could be augmented as well, but you're cutting out a bunch of pieces in between, right? Do you imagine that we just won't need those pieces in between, or is there another world where we still have actors and actresses and locations?
Cristóbal Valenzuela (24:18):
I'll go back to the camera analogy, because at the time the camera was developed, the same types of questions were being asked. What happens to the models who pose for paintings? Are they going to disappear? Are they going to be out of a job? And the truth is that the very idea of being photogenic, for example, was only born because of the camera, and that created an entire industry, an assumption about a type of person and a job that just didn't exist before.
Immad Akhund (24:45):
That industry was a lot bigger.
Cristóbal Valenzuela (24:47):
Of course. Now we take it as a norm, like a law, that's how things work. But it's not; it was completely invented, it was completely made up, and it worked really well. Those professions and those professionals of the future might not even be called actors. They might be something else; it might be a variation, an evolution of what we think of as an actor these days. It's hard to predict and describe entirely. We'll still have some sort of acting and performance, but perhaps it'll be very different from how we understand it today.
Immad Akhund (25:19):
I guess I have one more potential downside; I'm sure people ask you about it. Presumably deepfakes will become a lot easier. I don't think we're there yet, but there'll be some point where you can make a video of me saying whatever you want me to say, and it'll look just as convincing as this video. I guess two questions: A, is that avoidable? And B, how far away do you think it is?
Cristóbal Valenzuela (25:41):
I mean, we're not far off; that's happening right now. You can create avatars and deepfakes of the likeness of anyone using models. The consistency of those, and the angles and camera shapes and forms you can take, are still limited. But for me, that represents only a small percentage of what you can do with this, right? There are a thousand things that video and generative models will allow you to do, and are allowing you to do. One of them is this idea of creating deepfakes, which has great potential but can also be misused in very harmful ways. But that's where I go back. Again, some of these same questions were asked many years ago when cameras became ubiquitous, when everyone had a camera, and people were saying, we should stop cameras because privacy will be completely taken away from people.
You're going to be able to record anything, and maybe you don't want to be recorded. And that still is the case, but norms and rules developed, and now it's illegal to record someone without asking them, for example. So you come up with social norms about how to make sense of that technology and prevent misuse. And this is, I would say, very similar: you can do a thousand things with these models, and maybe one of them will be really bad. We should make sure that we build norms and systems and regulations so those safeguards are in place. So if people want to do bad stuff, we can prevent them. But the overall output of everything else is just net positive.
Rajat Suri (27:06):
Do you think they will be detectable, or do you think deepfakes will be too convincing?
Cristóbal Valenzuela (27:11):
There are mechanisms: there are visible watermarks and encrypted watermarks within Runway right now. We have a visible watermark that you can see, and we also have cryptographic watermarking that you can use to detect whether a video is generated or not. So there's definitely that. But I would say, most importantly, the watermark that we need to build as a society is the acknowledgement that this is possible. The moment you start becoming more exposed, and able to see and recognize that content can be generated, you start assimilating it into your normal life. If you watch a movie, if you go and watch, I dunno, Jurassic Park or the Avengers, that's not a real movie; that's not actually how the world works. People are not flying, and dinosaurs are not walking through the streets. Now, if you had shown that movie to people a hundred years ago, they would have freaked out. They'd be like, whoa, there are dinosaurs in the streets, because there wasn't enough acknowledgement of and exposure to the technology to say, okay, I get it: some stuff can be fictional, and I like it. I like that it's fictional.
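As an aside on how cryptographic tagging of generated media can work in principle, here is a minimal sketch: sign the generated bytes with a secret key, so the generator can later verify whether a file came from its system. This is illustrative only; Runway's actual watermarking scheme is not public and likely embeds marks inside the pixels rather than in a detached tag like this, and the key name below is a hypothetical stand-in.

```python
# Minimal detached-tag scheme using an HMAC: the party holding the key can
# later check whether a clip was produced (and unmodified) by its system.
import hmac
import hashlib

SECRET_KEY = b"hypothetical-signing-key"  # assumption: held only by the generator

def tag_generated_video(video_bytes: bytes) -> str:
    """Produce an authenticity tag for a generated clip."""
    return hmac.new(SECRET_KEY, video_bytes, hashlib.sha256).hexdigest()

def was_generated_here(video_bytes: bytes, tag: str) -> bool:
    """Check a clip against its tag; constant-time compare resists forgery."""
    expected = tag_generated_video(video_bytes)
    return hmac.compare_digest(expected, tag)

clip = b"...raw video bytes..."
tag = tag_generated_video(clip)
print(was_generated_here(clip, tag))         # True
print(was_generated_here(clip + b"x", tag))  # False: content was altered
```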
Rajat Suri (28:21):
Even today you can pretty much generate a video of anything; the cost is just pretty high.
Cristóbal Valenzuela (28:26):
Yeah, that's exactly right. I mean, what I find interesting is that people think about generative content as deceptive, or as very fictional imaginary worlds. The movie The Social Network by David Fincher has, I think, more CGI than science fiction movies these days. You just can't tell; it's so good, so well thought out, so well created. If you have enough skill, you can do it, and it's happening, and sometimes you don't even realize.
Rajat Suri (28:51):
Cris, what is the current state of the technology? What is it really being used for today, on a day-to-day basis? What's the number one use case? I've been to the website, I've seen it; you can generate these small clips, I believe, from prompts. Is it being used mostly in Hollywood? You mentioned there are people around the world using it for things. We'd love to dive into the technology as it is today, and then ask some follow-ups on the challenges to overcome to realize this beautiful vision of yours.
Cristóbal Valenzuela (29:17):
On the first question, we're at early, very early stages still. Gen-2 was the first commercial model, and perhaps the first model ever, that was and still is able to generate long sequences of video, up to 18 seconds. And it's already been used by millions of people who are creating everything from feature films, there's someone who made an hour-and-a-half movie with it, to short films and everything in between. So it's used a lot for final video outputs or video content, and also for pre-production: for visualizing, for storyboarding, for everything that happens before you actually make the final output. Even though it's very early, you start seeing adoption across both ends of the spectrum. In the analogy of the camera, we're still in the black-and-white phase, no sound, kind of early. It doesn't look fully there yet, but it's advancing really, really rapidly.
I mean, if I showed you our research on video generation from five years ago and where we are right now, it's just orders of magnitude better, and it's only going to get better from here. And so the challenge right now, I guess on the second question around cost and what's required: it's a big investment. This is a really hard problem to solve. You're solving video and content generation, which are very hard things to solve. Compute, on the one end, to train large models becomes incredibly important. But most importantly, I would say the biggest challenge is having the right knowledge and vision to do this for the next 10, 20, 30 years. It's not about one model or a singular dataset or a single architecture; it's about figuring out how to do that every week, every month, every year, relentlessly, just inventing and innovating and keeping that pace of innovation at the core of what we do. I think that's more interesting and challenging than compute and research, which for sure are hard, but which you can solve if you have the right people.
Rajat Suri (31:19):
Have you thought about using this incredible technology you're developing to generate your own content as well, a little bit like Pixar did back when it developed a new generation of computer-generated animation? It seems to me that having a Runway Studios may make a lot of sense: you'd be turning out a lot of content that shows the power of what you're doing, and you could one day license it to others.
Cristóbal Valenzuela (31:43):
Yeah, exactly. And that's what we have, actually. Runway Studios is a division of Runway that creates, produces, and partners with other creatives to make content. We've been heavily inspired by the Pixar story and what they were able to do. Pixar for us is a great example of merging both, as I was saying at the beginning: art and science. You have exceptional artists working at the frontiers of technology, coming up with tools that no one thought were even imaginable. Sure, animation now seems an obvious thing, but if you watch the first Pixar demos from the eighties and nineties, most people thought it was a toy, just a quick interesting thing, and no one took it seriously until it became perhaps one of the most relevant forms of storytelling today. For us it's very similar, and that comes from knowing how to, again, build a team, build the right culture and the right philosophy of product that merges both storytelling and science. But yeah, Pixar for us has been a great example of where we see the company going in the next couple of years as well.
Immad Akhund (32:49):
What got me really excited about having you on this podcast was when Amjad said that Everything Everywhere All at Once, which I think is an amazing movie, used your tech. So I was wondering how you go from eight-second videos to, I guess, a full movie; you said it was used to make a one-and-a-half-hour movie. And then, related to that, how did Everything Everywhere All at Once use your tech?
Cristóbal Valenzuela (33:14):
Yeah, the story about Everything Everywhere All at Once is fascinating. I dunno if you guys have watched the movie or not; I would say it's one of my favorite movies, it's beautiful. I watched it when it first came out, when it was very much under the radar, in small cinemas. I watched it in a cinema in Bushwick, in New York, before it became a world sensation. And when I watched it, I just became obsessed with it and started to go deeper into who made it and the visual effects behind it. I came across a post from one of the directors, the Daniels, saying that the editing team was comprised of just seven people. When I heard that, I thought, that can't be true. Seven people doesn't make any sense. I was expecting hundreds of people, maybe, and no, it was just seven people. I think seven and a half.
There was a freelancer, so perhaps eight at some point, but it was seven people who were doing both the editing and the visual effects. And that just blew my mind. These guys are, first of all, incredibly talented, incredibly great artists, but also insane. How do you make a movie that complex with seven people? I searched for who the people behind it were, and then I went into Runway and searched for them in our user registry, and most of them had accounts with Runway, which was unexpected. So I reached out to them and spoke with them, and it turned out they had actually used some of our tools for a few scenes in the movie, because they were trying to find ways of automating and simplifying the editing process. And so that, for me, represents a great example and a great taste of what's to come in the filmmaking process, where very small teams of incredibly creative, incredibly talented people, who don't have hundreds of millions of dollars in resources or a lot of industry experience and connections, can just execute an idea with the right tools, to the point where you can win seven Academy Awards, which is what the movie did.
And so for me, more than a beautiful movie and an incredible team of creatives, that movie also represents a signal of what's to come: small teams with great tools making content that will win awards like Academy Awards. I've been looking out for the next set of greats that will do something like that.
Immad Akhund (35:30):
So let's say you have a scene. Well, there's supposed to be, and I don't think I'm giving away too much of the movie, a floating bagel thing. Normally you'd do CGI to add that floating bagel, and that scene is, whatever, two minutes long. How does the creator go from this kind of empty scene, no bagel, to bagel, using Runway? Do they do it eight seconds at a time? Is that kind of the idea?
Cristóbal Valenzuela (35:55):
It depends. Creating a film, or creating anything, is a process. Funnily enough, we actually released a set of what we call footage packs, which are AI-generated videos showing how you can generate just about anything, and there's one pack of content called "floating bags." It's kind of a joke, but it's actually true: it's just plastic bags floating in the air, camera angles of floating plastic bags. And the way you create those is you can do one of two things. You can just type in, I want to see a floating plastic bag in the air, in a specific style or from a specific angle or direction, and then the model will render and create it for you. The other way is you might actually take a picture, for example, of a bag, or of a bagel, of anything you want, and you upload that picture into Runway and then tell the model: animate this, make it move to the left.
Move it to the right, to the top; I want to zoom out, or I want to zoom in, or I want to move the camera to the left. You can do those kinds of things. A couple of weeks ago we released a new feature in Runway called Director Mode, where you can basically take a shot and move the camera in any way you want. This is all generated; you're just generating every view as you go. And that's where I land: depending on the creative idea that you have as a filmmaker, in this case the bagel, you can start from either text or an image or a combination of both. It's very free and very empowering. You can use a combination of different things like that.
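To make the shape of this workflow concrete, here is a sketch of what an image-plus-camera-direction request could look like as code. The type names, function parameters, and move names below are hypothetical stand-ins, not Runway's real API; the point is only the structure Cris describes: a seed image, optional text direction, and explicit camera moves.

```python
# Hypothetical request shape for image-to-video with camera direction.
# Not Runway's API: every name here is an illustrative stand-in.
from dataclasses import dataclass, field

@dataclass
class CameraMove:
    kind: str      # e.g. "pan_left", "zoom_in", "orbit" (assumed move names)
    amount: float  # how strongly to apply the move

@dataclass
class VideoRequest:
    prompt: str | None = None      # optional text direction
    seed_image: str | None = None  # path to an uploaded still (the "bagel")
    duration_s: float = 4.0
    camera: list[CameraMove] = field(default_factory=list)

# "Animate this picture: slow zoom in, then pan left."
request = VideoRequest(
    prompt="a bagel floating in an empty room, cinematic lighting",
    seed_image="bagel.png",
    camera=[CameraMove("zoom_in", 0.3), CameraMove("pan_left", 0.5)],
)
print(request)
```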
Immad Akhund (37:27):
Oh, that's cool. What is the limiting factor here? Is it GPU power? Do we need an order of magnitude more GPUs, or is it algorithms, or a bit of both, to go from eight seconds to 80 seconds, let's say?
Cristóbal Valenzuela (37:41):
A bit of everything, although I would say that generating a movie is not about generating one continuous story or short film; it's not about generating 80 seconds of continuous video. If you look at the history of filmmaking, average scene durations have gone from minutes down to, I think, four and a half seconds these days, which means that when you're watching a film, you're just watching scenes stitched together that are four and a half seconds long on average. Scenes used to be way longer because of a technical constraint: the rolls of actual physical film had a time limit, an amount of frames you could shoot. So directors said, okay, we have one shot, we need to use this footage really well, and we're going to shoot one continuous take. Now you don't care, because it's all digital, so you can shoot whatever you want, which changes the nature of how you stitch shots together.
And so for us, it's less about how we make sure we can generate 80 seconds of one single shot, and you can do 20 seconds now, and more about how you keep the objects and the characters consistent as you change the scene. If I'm recording you, for example, walking in the street, I might want a shot of your face, and maybe in the next four seconds I want a shot from, I dunno, the top, and then I want to see your shoes, and I combine those together to tell a much more compelling story. The challenge is consistency across those scenes every time you change the camera.
Immad Akhund (39:09):
And is that an algorithmic challenge, making sure the algorithms can understand the context between the scenes?
Cristóbal Valenzuela (39:16):
That's exactly right; context is the right word. Language models have context window problems: how much can you remember of what I've said before? You can think about this very similarly: how do you make sure that the style stays consistent as you change the camera and the scene?
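A toy version of the consistency idea Cris compares to a context window: when generating shot N, condition on a running summary of the shots that came before, so characters and style persist across cuts. The "summary" below is just an exponential moving average of stand-in shot embeddings; real systems would use learned representations, so treat every number and name here as an assumption for illustration.

```python
# Toy cross-shot consistency: each new shot is pulled toward a rolling
# "memory" of prior shots, analogous to an LLM attending to its context.
import numpy as np

rng = np.random.default_rng(1)

def generate_shot(prompt_emb: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Stand-in generator: new shot embedding blends prompt, memory, and noise."""
    fresh = rng.normal(size=prompt_emb.shape)
    return 0.5 * prompt_emb + 0.3 * context + 0.2 * fresh

prompt = rng.normal(size=32)   # embedding of the scene description
context = np.zeros(32)         # empty "memory" before the first shot
shots = []
for _ in range(5):             # five cuts of the same scene
    shot = generate_shot(prompt, context)
    shots.append(shot)
    context = 0.8 * context + 0.2 * shot  # roll the new shot into the memory
print(len(shots), shots[0].shape)
```

The design point is the loop: without the `context` term, each cut would drift independently; feeding prior shots back in is what keeps the character the same character after the camera changes.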
Rajat Suri (39:31):
What's the business plan, to some degree? How do you monetize? Eventually you're going to have to be profitable, right? We talked about Pixar earlier; Pixar obviously had a challenging first decade. They almost went bankrupt, and that's why they needed the 5 million or whatever Steve Jobs put in, which made him a majority owner. They almost went bankrupt before they launched Toy Story. So how do you make sure that you can fund this multi-decade project and remain profitable?
Cristóbal Valenzuela (40:01):
The one thing to always keep in mind, to make sure you're making things people actually want, is that you need to work really closely with your users. For us, that's been the philosophy of Runway from the very beginning: if we're building tools for humans, for human creativity, then let's work as closely as we can with those people, with the filmmakers of the world, with the artists of the world. The obsession with models and technology that we've seen in this most recent AI boom has shown that sometimes we obsess too much over performance and models, but not enough over how people are going to use those models. And so for us, a constant reminder is to think about use cases. And that goes back to business models. We have millions of users, ranging from award-winning filmmakers to TikTok creators. Just search TikTok for Runway or Gen-2 and you'll see a bunch of stuff being created.
It's so liberating and so great to be able to create content in this form that people are now paying for it. And it's a subscription system, a very simple SaaS: you get unlimited credits or limited generations. And for companies, companies care less about AI. AI is, I think, a great story, but inside a company what you care about is driving costs down. If you're a production company or a studio, you want to make sure you're executing your ideas as effectively and efficiently as possible. For them, it's about making iteration and creation happen faster than ever before, and we have plans and offerings that tackle exactly that and help them do it. So our goal is always to come back to that. We're building some of the most impactful and fundamental research in the space; we've invented a whole category that never existed before. But that's not enough. It's only enough the moment you start really obsessing over how people are going to use it, and that's what we do every day.
Immad Akhund (41:48):
You've raised, I think, more than 150 million. Do you feel pressure from VCs? Obviously you have a very creative vision, and I think they are thoughtful, long-term-thinking VCs, but they obviously want returns at some point. Do you feel any pressure like that, or have you found it easy to focus on the vision?
Cristóbal Valenzuela (42:08):
Yeah, I mean, I think picking the right investor really matters. Out there, there are really bad investors and there are really great investors, and picking who you want to work with is very important, because if you're not aligned on the vision, it doesn't work. Again, if I showed you a video of what we had five years ago, a VC might have told me, great, do something else, and I'd be like, no, we're doing this, and we're not stopping until we get it to work. Now you see the results of Gen-2 and you think, oh, of course that makes sense. But someone could have told us five years ago, don't, it's not going to work. And so we've chosen, and we've been fortunate enough, to work with the people we really want to work with, who believe in the mission long-term. We're building a generational company; we're inventing a category. We want to make sure we continue on that path by making big, bold, long-term bets rather than short-term, local-maximum bets, and that means picking your investors really wisely, which I think we have.
Immad Akhund (43:06):
I guess one last question from me; this is not Runway-related. When you look at AI and creativity in general, what do you think is underrated? What are people not looking at that will impress them or blow their minds? Or do you think it's all a little overhyped?
Cristóbal Valenzuela (43:23):
There's definitely a lot of hype; that's kind of obvious these days. I think people are obsessing over models and research for the sake of research, and I think that's a huge mistake. That's the moment you miss the point of why it's going to be useful. So the thing that's overlooked is the products and the people using these models. I'm not a huge fan of those rankings and benchmarks where people find arbitrary measurements of models just to rank them first or second or third. That's great for engineering purposes and for research scientists, but I think we're transitioning from research to product and impact, and people need to focus way more on use cases rather than every fun, shiny thing that comes around. It can be eye-catching, and that's great, but long-term, I think, always focus on value.
Immad Akhund (44:14):
Cris, this was awesome. Really appreciate you spending the time. Going through the vision and what you're building is just super impressive, so excited to see the future unfold.
Cristóbal Valenzuela (44:25):
Yeah, of course. Thank you for having me here. It was a great chat.
Rajat Suri (44:28):
Very inspirational. We're cheering for you to win because we all could use these tools. Thank you.