Generative AI like ChatGPT has revolutionized human-machine interaction – producing text, images, and video with remarkable fluency. But what would a similar breakthrough look like for embodied systems like robots? Is it simply giving robots a ChatGPT-style interface, or could there be a deeper transformation in the offing?

This talk explores the surprising structure hidden in the latent spaces that some of these models learn, and how optimising within them can lead to creative and useful behaviour – even in the physical world. From controlling real robots to modelling the environments they move through, we will see how generative models can be used to simulate and plan in imagination. And as these world models become more structured, they hint at a broader possibility: AI systems that allow us to learn from observation, that help us to explore and understand the world – and not just automate it.

Join Professor Ingmar Posner, lead of the Applied Artificial Intelligence Lab at Oxford University and a Founding Director of the Oxford Robotics Institute. His research aims to enable machines to robustly act and interact in the real world – for, with, and alongside humans.

The Royal Society is a Fellowship of many of the world’s most eminent scientists and is the oldest scientific academy in continuous existence.

▶ https://royalsociety.org/

🔔Subscribe to our channel for exciting science videos and live events, many hosted by Brian Cox, our Professor for Public Engagement: https://bit.ly/3fQIFXB

We’re also on Twitter ▶ https://twitter.com/royalsociety
Facebook ▶ https://www.facebook.com/theroyalsociety/
Instagram ▶ https://www.instagram.com/theroyalsociety/
And LinkedIn ▶ https://www.linkedin.com/company/the-royal-society

Hello everyone. On behalf of the Royal Society, it’s my pleasure to welcome you to this event as part of the Summer Science Exhibition 2025. Just a couple of notes before starting, some housekeeping information. There is no fire alarm planned today, so in case of an emergency the exit is through the door you came through. If you would like to tweet about this event, or the exhibition in general, please use #SummerScience; we are @royalsociety. I would also like to remind you that this event is being livestreamed on YouTube and will be available after the event, so please don’t record this talk, and please keep your phones on silent. At the end of the talk there will be a session for Q&A, so we’ll be going around with microphones; if you have questions throughout the event, raise your hand and we will join you with a microphone. Now it’s finally time for the talk. I’m delighted to introduce Professor Ingmar Posner, lead of the Applied Artificial Intelligence Lab at the University of Oxford and a Founding Director of the Oxford Robotics Institute, who joins us to explore the application of artificial intelligence in embodied systems like robots. Please give Professor Posner a round of applause.

Hi everyone. It’s a real pleasure to be here. Thank you very much for having me. For the next twenty minutes or so, I would love to have a conversation with you about generative AI and how it helps with planning and acting in robotics. I tend to like conversations rather than me just talking at you, so we can have questions at the end, but also feel free to just ask things as we go; I’m told that somebody’s going to come sprinting with a microphone, which is great. Good. Okay. So I would really like to talk about a bunch of things: planning, action, robotics, machine learning. I had a very quick conversation earlier about reinforcement learning, learning by trial and error.
We’re going to talk less about that, because what I really would like to get to is to peel away the skin of a very basic generative model, look under the hood, and ideally set you on fire, in a good way, about the structure that these systems learn and how that might help real-world systems to act, in the future and already now. Okay. So before we get to that, let me ask this: who here uses tools like ChatGPT? Excellent. Good. Who here uses tools like Gemini? Yeah, still good, people. Excellent. I like asking about the comparison. Anybody here work with robots? Okay. Does anybody here like playing games? Excellent. Good. All right, so let’s start with a game. Who here knows tic-tac-toe, or noughts and crosses? Good. All right, so let’s talk about why planning might be a difficult sort of thing. We kind of all know how tic-tac-toe works, right? It’s a fairly simple game. It has nine cells, we place a nought or a cross in them, and whoever gets three in a row wins. Let’s think about how we might plan a good strategy for this. In the beginning the field is empty. Somebody makes a move, so I can pick basically one of nine positions. Then, once I’ve picked my one, there are basically eight left that I can pick from. Interestingly, though, there are sort of nine branches in that second row, because there are nine possibilities that I have to cater for. After that I can basically choose from eight, and after that from seven, and so on, all the way down to the k-th turn, which would typically be nine if we play all the way to the end. It turns out that that tree structure you see there is called a game tree, and if I trace a path through that particular tree, that’s one entire game that we’ve played.
And if I wanted to think about how many games there are to play in tic-tac-toe, there are about 250,000 legal ways to play this game. On average, there are about four legal moves that I can make at each turn. And if I look at the leaf nodes, the bottom nodes on that tree, how many might I have to look at in order to figure out whether I stand a good chance of winning, and what a good strategy might be? Well, roughly the average branching factor raised to the power k, where k is the depth of the tree. That’s how we get to roughly the 250,000. And if you ask what a good AI strategy for this is, it’s actually called a lookup table, because we can compute the entire thing, stick it into a computer’s memory, and whatever the game state is right now, we can just look up what to do next. The interesting things about this are, first, that there are rules, and only a set number of things we can do. Second, the game state is exactly known: the game state is where somebody’s put noughts and where somebody’s put crosses. Well, if you can just put this stuff into memory, why did we have all this excitement about chess and Go and so on? It turns out that if we open up the same conversation about chess, we end up with many, many more things to look at, of the order of 10 to the 124. For reference, there are about 10 to the 80 atoms estimated to be in the visible universe. If each one of those atoms was a really powerful computer, like a 4 GHz Pentium-whatever CPU, it would still take multiple universe lifetimes to compute all of those things. This is why playing chess, for a machine, used to be really hard. It’s less hard now.
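As an aside, not from the talk itself: the game-tree arithmetic above can be checked with a short brute-force script. Walking the tic-tac-toe game tree and counting every complete legal game (a game ends when someone gets three in a row or the board fills up) gives 255,168 games, which is the "about 250,000" figure quoted.

```python
# Brute-force enumeration of all complete, legal tic-tac-toe games.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def count_games(board=None, player='X'):
    """Count every distinct complete game reachable from this position."""
    if board is None:
        board = [None] * 9
    if winner(board) or all(cell is not None for cell in board):
        return 1  # leaf of the game tree: one finished game
    total = 0
    for i in range(9):
        if board[i] is None:
            board[i] = player
            total += count_games(board, 'O' if player == 'X' else 'X')
            board[i] = None  # undo the move and try the next branch
    return total

print(count_games())  # 255168
```

Note that the lookup-table strategy the speaker mentions is feasible precisely because this number is tiny by computing standards; the same enumeration for chess or Go is what blows up to 10^124 and beyond.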
And of course, if we then open it up to Go, which for a long time after chess was the big piece to beat, those numbers get even more crazy, like 10 to the 360. Those numbers are absolutely staggering. So we need a better way of doing this, and of course, with AlphaGo and the Alpha series of models, DeepMind has done a fantastic job in addressing a lot of this, actually based on looking forward and modelling what the game looks like, because we know what the states are, we know what the rules are, and we know what kind of things we can do. But that’s a different story. We want to talk about robots, right? We would like to talk about robots in the real world. So to get there, let’s now imagine a different game. Let’s imagine a game that has no rules, and a game that has infinitely many states. There are infinitely many things that you can do. And that game is called making a cup of tea. Because ultimately, if you think about how you make a cup of tea, you somehow have a goal in mind. You want your cup of tea, and somehow you happen to know that it requires heating water, that it requires a tea bag or loose tea, and that it requires milk maybe, stirring, and so on. But actually the problem is more fundamental than that, because all your brain has got to go with initially, at the lowest level, is: I can sort of move my limbs somehow. And somehow you have to get from the signal of moving your limbs to, hey, here’s a cup of tea. Very often there are infinitely many things to do, but the signal of what is useful is very, very sparse. It’s very, very rare. Yet we manage to do quite a lot of cool stuff. And so there are many ways of doing this. Many standard roboticists and control engineers would look at what I’m about to tell you and go, hmm, because control engineering helps a lot.
Often this is about building models of systems, and we will also talk about building, or learning, models of systems that we can then exploit, because we can effectively forward-simulate what should happen and then plan actions that get us closer to a goal, and so on. Another way of doing this is to say: well, we don’t use any models, and we learn this entire thing by trial and error. In the community we call this reinforcement learning. The problem with that is that if you do it in the real world, it gets tricky. Who here knows how to ride a bike? Yeah. Okay. Who here, when they learned how to ride a bike, came up with a physical model of the bike, figured out how they had to use their legs and how to balance, implemented that, and, oh my god, it worked and everyone was happy? Okay, one person. Very good. Instead, who here got on the bike, fell off multiple times, and eventually just got it, hopefully? Yeah. All right. That is basically the equivalent of learning by trial and error: reinforcement learning. The problem is, things in the real world go wrong. And also, getting on your bike lots of times takes time. And so there’s a different school of thought that says: reinforcement learning is great, but we would like it in, for example, a simulator, in simulations, where when things go wrong we don’t really care, and we can run at many times real time. That, as it turns out, is also a version of a model. The problem is, some things are really difficult to model, like contact-rich interactions: when I pick up something that is squishy, writing down a mathematical model for that is really quite hard. And it turns out maybe we would like to learn that instead. We’ll talk about that in a little while.
But the upshot from this really is that learning is incredibly useful here, particularly because models are incredibly useful for planning and action. So this is meant to be an uplifting story about generative AI, so I thought I’d ask my favourite friend what that should look like. Many stories can be told about this, in both flavours. Without a doubt, we need to be careful about how we use this technology; this is really about the uplifting piece, about how it enables machines and agents to operate in the real world. And often in the community we talk about how we’ve had this ChatGPT moment where really interesting stuff happened. Where is that ChatGPT moment for robotics? That is what many people and many companies are chasing right now. The interesting thing is that ChatGPT, and any of the GPT models, were basically trained on internet-scale data. Where does the data come from for robotics? We don’t really have that amount of data for robotics, even though some people are also working on that, which is great, because we need all strands. So what do people do? Well, the obvious choice would be: hey, let’s just take a GPT model, or a GPT-style model, and strap it to a robot. That sounds like an eminently good idea, and it kind of is, because there are some really interesting things that can happen. One thing it enables is natural-language interaction. We can use it to map language commands to individual skills. So imagine you’ve trained a robot how to pick something up and put something down. We can then say: okay, well, if I say ‘pick something up’, it suddenly does that. It’s a bit like a skill that you develop for your home assistant, like an Alexa skill, right?
Yeah, we can do that, and that’s interesting because it enables more natural interactions, and that can be useful, for example, in contexts like social care, which is nice. The other thing this sort of technology provides is what we call contextual AI, or contextual information. If I say ‘pick up this green thing’, then suddenly ‘this green thing’ has a meaning. If I say ‘put them close together’, I don’t have to write a rule about what ‘close together’ means; the model just goes: oh yeah, from the data I’ve distilled, spatially, what that should mean. And that is super useful. Finally, the other thing you can do, and you should try this, is ask a GPT model to plan something for you: your favourite holiday, how to get from A to B, how to build something. A lot of the time it’ll give you a breakdown of the steps to take. And a lot of the time this will be tantalizingly close to something where you go: this is really brilliant, I’ve just learned something, this is great. But every now and then, it’s going to go wrong. And the problem is, we don’t know, and certainly a robot doesn’t know, when it’s gone wrong in that sense. So it can just take the description of how to do some stuff, implement it, smash the cup of tea against the wall, and it will never know that maybe that part it shouldn’t have done. But we’re working on it, so bear with us. So there are a couple of different ways of doing this.
We can either map individual language commands to particular skill sets that the robot has learned, or we can just treat actions like language, which brings us back to: okay, well, if I want to solve this particular problem, do this. The sequence of actions is sort of a language, right? And so we predict it just like we would predict language. There’s a lot of work going on in that space, but fundamentally the problem remains that we don’t have internet-scale data for robotics. Luckily, there are some other things we can do. So let’s talk about a very, very basic generative model. Whenever you see a picture like this, and I apologise for the contrast, we can just about see it, think of it as a learned model, a neural-network-type model. Often in the community, when we start thinking about something or when we explain stuff, we work with cats. So let’s follow that and work with cats. Imagine that you’ve got a neural network model, and you basically give it the task of reconstructing its input. So on this side over here you’ve got some image input coming in, an image of a cat, and over here the task is: please recreate that entire picture of a cat. And now what we’re going to do is split this one model into two models, with a funny thing in between. Those two models work independently from one another, even though they’re trained together. The first one basically says: take that picture of a cat and compress it into a bunch of numbers, a comparatively small set of numbers. You can think of that as a summary of the cat image that keeps hold of all the important information, so that we can take the second network, shove that set of numbers back in, and reconstruct the cat image.
So we’re just going to make that as a design choice. And this bit in the middle, we have a name for it: we call it the latent embedding. Latent because we don’t really know what it is; we haven’t observed it. Embedding because it sits in some embedding space; it spans a coordinate frame with many, many numbers. And now I can put this on steroids, right? I can take all the pictures of cats, or a really big number of cat pictures, shove them through that system, and instead of one set of numbers I get a set of numbers for every picture. I can then basically draw a distribution of that, or draw a histogram of that, which gives me a distribution, this guy down here, which basically says: this is the distribution of all of the cat pictures and what the numbers should be. And that is useful, because I can take that, throw the dice, call it sampling from that distribution, which gives me some random set of numbers, then throw away the first part of the network and only use the second part: shove the numbers through it, and boom, out pops a picture of a cat. Fundamentally, that is how it works. It’s a very powerful thing, even though this was done ten or twelve years ago, and it’s based on theory that’s even older than that. It was developed by a bunch of people at around the same time, but one of the people who did this ran a particular experiment. They said: instead of cats, we’re going to work with another famous dataset in machine learning, the MNIST dataset. It’s literally a dataset of individual pictures of handwritten digits, and it was originally designed for things like postcode recognition.
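The compress-then-reconstruct idea can be sketched in a few lines with a linear stand-in for the two networks. This is purely illustrative and not from the talk: the real encoder and decoder are nonlinear neural networks trained jointly, while here PCA (via an SVD) plays both roles, and the "cat images" are synthetic vectors that secretly live near a 2-D subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake "images": 200 samples of 64 numbers each, lying near a 2-D subspace,
# standing in for pictures of cats that share hidden structure.
basis = rng.normal(size=(2, 64))
X = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 64))

# Linear encoder/decoder via SVD: the top-k right singular vectors
# play the role of the learned compression.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

def encode(x, k=2):
    """Image(s) -> k latent numbers: the 'summary' of each image."""
    return (x - mean) @ Vt[:k].T

def decode(z, k=2):
    """k latent numbers -> reconstructed image(s)."""
    return z @ Vt[:k] + mean

z = encode(X)            # the "latent embedding" of every sample
X_hat = decode(z)        # reconstruction from just 2 numbers per image
err = np.mean((X - X_hat) ** 2)
```

Because the data really does carry only two directions of variation, two latent numbers per "image" suffice for a near-perfect reconstruction (`err` comes out tiny), which is the same intuition the talk builds for cat pictures.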
So they shoved that through the system, and they said: my little compression piece in the middle, I’m not going to make it fifteen numbers, I’m just going to make it two. And I’m going to make it two numbers because, if I only make it two, I can basically think of it as a grid, right? I can display it in 2D, and I can then say: okay, at every point in this grid I can just decode into another picture of a digit, and we can see what happens, what kind of structure this thing has learned. And what happens is this, and there are a bunch of interesting things to note about it. The first is that this is not random. There’s clearly some structure here, structure that basically says we can morph from a six at the top left down towards a seven, in a very smooth and continuous way, and then it goes from leaning italic on one side to leaning italic on the other side, and so on. So there are definitely smooth transitions, and the key point really is that there is structure. The other interesting thing is that every point in this grid I can decode into another digit. So I can literally take a walk in this latent embedding from the bottom left to the bottom right, and at every point I get another digit out; I get the sort of digits you see on the right-hand side here. You can think of that literally as taking a walk through the latent embedding. But the question is: where should we go? It’s a big space; we can walk around anywhere. So there’s one other piece, and now we’re going to get back to the robotics part. There’s one other piece of architecture that we need, this piece at the bottom here, which is literally another neural network. We call it a classifier, and it basically asks a question, for example: did it work, or did we bump into stuff?
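The "grid walk" visualisation can be mimicked with a toy decoder. To be clear about assumptions: the decoder below is an arbitrary smooth map, not a trained network, and serves only to show the mechanics of sweeping a regular 2-D grid of latent points, decoding every one, and checking that neighbouring points decode to nearby outputs, which is the smoothness the MNIST picture makes visible.

```python
import numpy as np

rng = np.random.default_rng(1)
# A toy fixed "decoder": 2 latent numbers -> a 16-dimensional "image".
W = rng.normal(size=(2, 16))

def decode(z):
    # Any smooth map of the latent will do for this illustration;
    # a trained decoder network would sit here in practice.
    return np.tanh(z @ W)

# Walk the latent plane on a regular 5x5 grid, decoding every point,
# exactly as the MNIST latent-grid visualisation does.
grid = [decode(np.array([x, y]))
        for x in np.linspace(-2.0, 2.0, 5)
        for y in np.linspace(-2.0, 2.0, 5)]

# Smoothness: a small step in latent space gives a small change in output.
step = np.linalg.norm(decode(np.array([0.1, 0.0])) - decode(np.array([0.0, 0.0])))
```

With a real trained decoder, the same loop over grid points is what produces those pages of morphing digits: each latent coordinate pair becomes one rendered image.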
And we can run this forward and say: if we run it on these vectors in the middle, did this work? The output might be no. But we can also run it backward. We can say: fine, where we are right now it might not have worked, but I would like to walk backwards, please. I would like to walk around this latent space in such a way that the answer becomes yes. And that gives us a direction as to where we should go in that space. So imagine that you have one of these reaching tasks that you get in the cognitive sciences, where they run experiments in which you try to reach a place. You’re not allowed to reach into this red arena, you have a basic tool, you want to reach the green goal, and you’re not allowed to bump into anything that is blue, any of the obstacles. What we can do is encode images of the arena, and images of the tool silhouettes, and images of the tools themselves, exactly the same way we encoded the cats, and then we have a yes-or-no signal that says, for every tool that I show you: did this work for this particular random setup? And by doing so, we basically come up with a system where, if we go back from this answer into the latent embedding, walk around in it, and decode everything along the way, that gives me a changing tool, just like the digits changed as I took a walk through latent space. And when we run this forward, some interesting stuff happens. This is based on exactly the same idea as the digits, with exactly the same structures. We see a system that seemingly deliberately starts to modify properties of the tool in ways it hasn’t really seen them modified before, even though it might have seen the final tool shapes in the training data.
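The "run the classifier backwards" step is, in essence, gradient ascent in latent space on the classifier's probability of success. Here is a minimal sketch under stated assumptions: the success classifier is a made-up 2-D logistic model (the real one is a trained network over a much larger latent), and the weights are invented purely so the walk has somewhere to go.

```python
import numpy as np

# A toy "did it work?" classifier over a 2-D latent:
# p(success | z) = sigmoid(w . z + b). Weights are made up for illustration.
w = np.array([1.5, -0.8])
b = -1.0

def p_success(z):
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

# Start from a latent point the classifier labels "no", then walk uphill
# on p(success): the gradient ascent that "walks backwards" through the
# classifier into the latent embedding.
z = np.array([0.0, 0.0])
for _ in range(200):
    p = p_success(z)
    grad = p * (1.0 - p) * w   # d p / d z for the logistic model
    z = z + 0.5 * grad         # one ascent step in latent space
```

Decoding the intermediate `z` values along this walk is what yields the sequence of gradually "improving" tools (or robot poses); the classifier supplies the direction, the decoder renders each waypoint.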
And this is basically structure that you get, or structure that you’re exploiting, from the training process, very much as before. And if you squint really hard at this problem, you think: in a way, that’s kind of a path-planning problem. You’re planning a path from the robot to the goal, right? And that might make you think: well, why not just use it for path planning? We can do the same thing. We basically start with a robot, where we have the robot pose, the joint encodings, and the end-effector pose, and we do exactly the same thing. It’s just that we have two of these classifiers now. One says: did we collide with anything? And the other says: are we there yet? Because you want to reach somewhere, right? And again, we train the system in exactly the same way, we run it forward, and it turns out it works, which is pretty mind-blowing given the simplicity of it all, and given that typically you would spend a lot more time engineering this in a more rigorous sort of way. So this is purely driven by the structure that we find underneath. Then we might ask: okay, what happens if we try this on a more complex machine, I wonder? This is what my students call a crazy idea. It turns out it did work. Here the idea is: what if we take this quadruped robot and run exactly the same thing, but now we don’t just take individual poses of the robot, we take little trajectories of the robot? So this is time: this is now, this is the past, and this bit here is the future. So we predict into the future, but other than that we do very similar stuff. The embedding here might be pretty large, but let’s just take a slice through the entire thing, a 2D slice, just like we did with the digits, and see what kind of structure we find there. And just like with the 2D digits, we can plot it like this.
And we see some dots, some colour-coded dots, and we can ask ourselves what these colour codings mean. Remember, each and every one of these dots I can decode into quadruped robot poses. So each one of those we can decode back into quadruped poses. And if we look at what these poses do, we actually find that, if we follow this around, they encode a gait cycle, right? In one dimension there’s an increasing swing length of the foot; in the other direction there might be an increased swing height. And all of a sudden we have this really complex system, with lots of degrees of freedom, that we can effectively control with two signals, just because of the structure that the system has learned. And if we take a walk in this latent embedding, some really interesting stuff happens, because typically, when people get quadruped robots like this to walk, they use hierarchical controllers, and they might have done a lot of maths and all that sort of stuff, which, to be fair, is the right thing to do, because it’s really robust, and they’ve done some great work. But it turns out that with a very simple system like this, just based on the structure of the problem that has been uncovered by this generative model, we’re actually able to get a robot to walk relatively robustly, which is pretty mind-blowing. And so people often ask: well, is this how humans do it? And the answer is: I really don’t know. I don’t think anybody really does. But the question is: is there any tantalizing evidence that we can point to where we go: well, this is kind of starting to look similar, right?
So, if we take exactly the system that I’ve just shown you, we can repeat the same thing with arms and reaching tasks, where these are the robots, these are the latent embeddings, and this is what we call a limit cycle: the circles, the tracks that we trace to get these robots to do stuff. Then it turns out that, about fifteen years ago, some neuroscientists looked at what this looks like in monkeys, in primates. They looked at the motor cortex, at all the neuron firings they could identify, projected them down into a particular two-dimensional space, and found very similar sorts of patterns emerge. Where does that leave us? I’m actually not sure; I don’t know. But it’s interesting to see these patterns emerge in a very similar sort of way as they seem to emerge in, certainly, some primate brains. Okay, so all of these are models of robots, which is nice. But what about the rest of the world? We already talked about cycling, right? Who here has cycled in continental Europe? Excellent. Good. Could all of you close your eyes and imagine yourself cycling in continental Europe? So you come up to a roundabout, and suddenly everything’s the other way around. You’re on the other side of the road; cars drive around the roundabout the other way, and so on. Yeah, that works, right? You can do that. Okay, good. So what you have there is a mental model of the world. You have a sense of how the world works. You can imagine it.
You can imagine yourself acting in that space, and that is actually a very useful context, because, just as we’ve modelled the robot, we can model the world in a very similar sort of way. The question is: how do we actually get to that place? Here we suddenly re-embrace everything that we now believe is actually a bit problematic about generative models, particularly when we think about deepfakes and such. Yes, absolutely problematic, but this notion of data synthesis is really quite powerful. We’ve all seen results like this, where none of these faces or people actually exist, which is true. But particularly in robotics, some data is really hard to get. For example, tactile data, when you have tactile interactions; or radar data, which is very difficult to model; or lots and lots of diverse data of diverse environments, like here, where the only thing that stays constant... oh, this is not playing... okay, the only thing that stays constant here is really the can. This is from some people on the ROSIE project at Google; it’s amazing. Everything else around the can is synthesised, just to create diversity in the dataset. So generative models, in that sense, are really useful. But we can do more with that. We can actually action-condition these models in such a way that we get what we call world models, and the community is very excited about this. It’s a bit like you closing your eyes and going: I kind of know how the world works; I can imagine what happens when I act in a particular way, but I learned that directly from data. And what we see here is the GAIA-1 model from Wayve, and the Genie 2 model that came out late last year from DeepMind, which are basically doing exactly that.
So all of these worlds are imagined, all of these scenes are imagined, but they are learned in such a way that, when you apply particular actions, the world reacts in a particular way. These notions of world models are really quite powerful, because they act as a knowledge store for the agent, a bit like how you can imagine what it’s like to cycle on the other side of the road because you’ve been imagining the world for a very long time, in lots of different contexts. They’re very useful for planning. And of course they can be used instead of the environment to try stuff much faster, much safer, and very cheaply, for data augmentation. And we can use them for what-if questions, because I might be able to imagine what happens if I step out in front of a bus, but I don’t really want to try it. Okay, so that’s kind of cool. So there’s a whole vista of things that is now open to us, including one other thing that has really happened in the past decade or so. There’s been quite a bit of evidence in the neurosciences that what we thought about the brain isn’t necessarily true. We used to think that the brain is a feed-forward entity: information comes in, it goes through the entire brain stack, and somewhere at the top, bing, there’s a bit of understanding, and we go, “Yay, this is great.” Actually, it turns out that the evidence seems to suggest that our brain constantly simulates forward what should be happening, and the reason you can interact successfully with the world is that your attention is drawn to the bits that disagree with your prediction. Typical case: imagine you’re driving along on autopilot, like we do, and then a cyclist swerves into the road, and you go, whoa, and you suddenly become active, right? Because it just doesn’t fit with your mental model.
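Stripped to its core, a world model in this sense is an action-conditioned transition function in latent space, z_{t+1} = f(z_t, a_t), that can be rolled forward "in imagination" without touching the real environment. The sketch below uses a made-up linear transition in place of the large learned networks behind systems like GAIA-1 or Genie 2; only the rollout mechanics are the point.

```python
import numpy as np

# A toy action-conditioned latent dynamics model ("world model"):
# z_{t+1} = A z_t + B a_t. A and B are invented here; in practice
# this transition is a learned neural network.
A = np.array([[0.9, 0.0],
              [0.0, 0.9]])      # how the latent state evolves on its own
B = np.array([0.10, 0.05])      # how a scalar action pushes the latent

def step(z, a):
    return A @ z + B * a

def rollout(z0, actions):
    """Imagine forward: simulate a candidate action plan entirely in latent
    space, returning the predicted latent trajectory."""
    z, traj = z0, [z0]
    for a in actions:
        z = step(z, a)
        traj.append(z)
    return traj

# "What happens if I keep pushing forward for 10 steps?"
traj = rollout(np.zeros(2), [1.0] * 10)
```

A planner can score many such imagined trajectories against a goal and execute only the best plan in the real world, which is exactly why these models are cheap, safe stand-ins for the environment.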
So there's some really, really exciting work being done that sort of brings these worlds together. Okay, so, plenty of stuff to talk about. Models are super useful when it comes to planning, and agents learning these models from data by themselves are becoming increasingly important when it comes to planning and acting. Generative AI is playing a key role in this, and, you know, maybe more so than people realize. There's a really surprising set of side effects in terms of the structure that these models uncover in the data underlying them, and what we can do with it. But, you know, as ever, there's a lot of work still to be done. We need to think about data efficiency, reusability, interpretability, which is super important. And one thing that my lab is super excited about is this notion of, you know, predicting the world is great, but it doesn't lead to insights, right? So how can we go from predicting the motions of the planets to, you know, the laws of physics, right? Completely open, and very ambitious, I hasten to add. If you like that, this is actually all part of a bigger grant sponsored by the UK government, which is really all about this notion of embodied intelligence. Lots of thanks to my group and collaborators and sponsors and so on. And I suspect we don't have that much time for questions, but if there are any, happy to have a chat. Thank you so much.

Thank you, Ingmar. We have time for some questions. So, please raise your hand.

Hi, thank you very much for the very interesting talk. I'm a surgeon and I do use a robot in surgery, and you mentioned that one of the inputs that is given is: am I hitting something, and am I reaching my target. Now, the problem we find is that it doesn't quite recognize when it's hitting an organ, and it might create important bleeds.
Is there any way it can be made more sensitive? It will surely recognize if it's hitting a bone, but it will not recognize a vessel.

Mhm. So fundamentally that's a hardware problem. If you can't sense it, you can't react to it.
And I don't know, is it the da Vinci robot that you're using? Yeah. So I don't know what the actual specs are for that, but there will be a limit as to how torque-sensitive these joints are going to be, and if you can overcome that, either with better joints or by putting something at the end effector that is more tactile-sensitive, then that might be a way of doing it.
So is it the researchers who will get to that, or do you expect users to...
So, on the tactile point, this is going to be a company thing. Whoever sold you that robot will have to deal with that. But they will reach for some research output, particularly on tactile sensing, which is a super active field, or they will have to upgrade their motors, I suspect.
Thank you.
Right. Hello.
Hi.
How far along the line do you think autonomous vehicles will get? Will we come to accept them completely? You mean autonomous cars?
Cars.
Uh, I honestly... so, okay, disclaimer: I'm a co-founder of a company that does this sort of stuff. But irrespective of that, yes, I do, I think that will happen, because the technology has come along quite a way. I think, you know, with this sort of tech there's lots of money involved, and with that comes a massive hype cycle, and it's in the interest of people to sort of hype that cycle. So if we believe that at some point somebody's going to flick a switch and everybody is autonomous and this is great: not so much, right? There's a different way in which we will get used to that technology, and it may not even be on the roads. It may actually be in different contexts. So it may be on sort of private campuses with private vehicles, or on golf courses, or in mines and so on. And then it will filter out from there. And of course, you know, if you go to the States, autonomous taxis are already almost everywhere, right? And they're becoming very accepted.

Professor, good afternoon. Thank you for your talk.
Recently I was in China doing some touring of robotic factories, and I was just very curious about how much connection and how much exchange of technology and information we have as a country with China?
Okay, that's a big question. So, I'm not an expert in that by any stretch of the imagination. What I would say, from the impression that I have, is: a lot less than it used to be. Simply because various governments around the world, including the UK government, have reviewed what kinds of things get exported where and what sort of form that takes. This is known as export control, and it includes a whole bunch of sensitive areas like robotics, AI and so on. It's not all of robotics and AI, but it's increasingly tightly regulated as to how we can do this. So there is a lot of attention on what kind of information gets exchanged and how that is done.

Hello.
Hey,
Thank you very much, Professor Ingmar, for the...

Just call me Ingmar. Ingmar's fine.

My question might be a naive question, but why is teaching robots how to move and handle things important? Like, what's the point?
Okay, good, good question. What's the point? There's a bunch of points. So, the first one is productivity in general, right? Often we talk about what kinds of jobs disappear because of AI and so on. It turns out that in many industries we don't actually have the workforce that we would need to do all the things that need to get done. Typical examples are things like agriculture, things like small-batch manufacturing; in fact, autonomous driving also fits into this. You know, remember when we ran out of lorry drivers? It's this sort of thing. So there is actually a need to be able to do that. The other thing is growth, which is another obvious one. We might be doing X amount now; we might want to do many more X amounts, right? Then we need a bigger workforce. That's another thing. And then there are also other societal challenges, like for example social care. A bunch of things come together there. The first one is that there's not enough of a workforce to provide the really super-high quality of care that we all would want, and would also want for our relatives. And that also ties in with the fact that not everybody who would need help is actually in a social care context, right? So it may well be that your grandma lives alone at home, and she doesn't really care, because she's, you know, super great with it and so on, but she could just do with a helping hand every now and then: getting some stuff off the shelf, or, you know, making sure that the cooker is switched off, that sort of stuff, right? So there's a whole breadth of applications, which means that we do have a need for this sort of technology.

Thank you so much for a very insightful talk again. I have a very basic question.
How often are these models biased towards a positive outcome, the models that you generate? Especially when you mention that there's a classifier, between the two phases, that would try to predict that whatever you've come up with is true. So how often does that bias exist within a model, and how do you counteract it?
Okay. Right. Okay. So I just want to rephrase it to make sure that I get your question. You're asking about bias in machine learning, sort of thing, right? Okay, good. Because in these experiments, of course, the nature of the system is that the classifier has to predict, yes, it's done it right. Fine. So, the bias thing is a really important one. I have a bit of an issue with it, which is this. There's a whole field of people saying our models learn on data that is biased, and therefore bad decisions get made, and this is true, without a doubt. But of course the decisions are made by people, often based on the data that feeds the models, which is the first part of the challenge. The other thing, and there are people that disagree with me on this, okay, the other thing is: it seems odd to me to expect a machine learning model that I train on a particular data set to suddenly go, wait, there's a whole part of the world that is not represented properly, because the model doesn't really know anything about the world, right? So fundamentally I see that as a societal problem. And it's interesting, right, because many people ask me about the role of government when it comes to, you know, machine learning, and do we need another 50 trillion for compute and all that sort of stuff, and the answer is, yes please. But then the answer is actually this: as a government, and this is not just our government, any government, I think a key role is to be forward-looking as to what data sets you will need five years from now to solve real-world problems that are challenging to all of us, and to make sure that we come up with a data-gathering regime and start that now, such that five years from now, when we go, hey, we have technology that can do this, they go, excellent, insert data here, please. I think that is a really critical role for government.

Oh, absolutely. Like... yeah.
So, so, yes, I mean that...

Often that exists...

Often that doesn't exist in science.
Yeah. So, there's a big conversation to be had also about, you know, publishing on negative results and so on, all of that stuff. Yes, it is true, but machines learn from negative results, much the same as with us, right? If we're always good at everything, we don't learn anything. You need to fail in order to make progress, right?

So, what about synthetic...
Synthetic? Okay...

In a lot of these examples, I thought the Scale AI part was why it was so successful...

Scale AI?

Scale, yeah.

Because it generates... yeah, right.

Yeah, okay, synthetic data that...
Yeah, we run out of... okay, so the question is, and I'm going to paraphrase now, the question is: what is the role of synthetic data in all of this?
The role of synthetic data is huge, absolutely huge, because it allows us to generate masses of data that would take forever and cost a lot to generate in the real world. That is irrespective of whether Scale AI is the, you know...
Yeah, but synthetic data is unbelievably important. Interestingly, depending on who you talk to, there is a problem with synthetic data, which is that the synthetic environment, let's call it a simulator, is not the real world. Which is undoubtedly true. And so whatever goes into the simulator is not quite going to be representative of what the real world does. This is known as the sim-to-real gap. So if you only train stuff in a simulator, out of the box it's not going to work in the real world. And then we have ways of dealing with that. There's a process called domain randomization, where we say, well, what if all of these things are sort of different in the simulator? The problem with that is it's unbelievably inefficient and it takes a lot longer, so it's really quite expensive to do. But it does allow us to do a lot of stuff. A bunch of the robots that you saw in the videos were actually purely trained in simulation, right? So, yeah, it's super important, but there's a lot of stuff that we don't quite understand. And there's the inverse problem, we can talk about it afterwards: the inverse problem of saying, if I actually see the scene now, how do I get that into my mental model? That's sort of the inverse thing, of how to go from reality to simulation, real to sim.

Unfortunately, we don't have any more time for further questions. If you are still curious about Ingmar's work and have further questions about the topic, I'm sure he is happy to answer them, but that will have to happen outside this room. So thank you so much, Ingmar.
Pleasure. Thanks. [Applause]
