Murray Shanahan, professor of cognitive robotics at Imperial College London and a senior research scientist at DeepMind, joins Azeem Azhar to discuss AI: where developments have exceeded expectations, where they have fallen short, and what the next steps are towards an artificial general intelligence (AGI).
They explore the hurdles that stand in the way of truly intelligent computer programs: why an AGI might need to exist in a body that can sense the world around it to reach its full potential, whether we can teach computers common sense, and what it means for a self-driving car to “think.”
They also discuss:
- What the future of AI may have in common with the aeronautical ingenuity of the Wright brothers.
- The pros and cons of the various approaches to AI from symbolic models to convolutional neural networks, transformers, and generative models.
- Why huge salaries for commercial deep learning engineers might actually hinder research.
@mpshanahan
@exponentialview
@Azeem
Further resources:
AI’s Competitive Advantage (Exponential View podcast, 2021)
How To Practice Responsible AI (Exponential View podcast, 2021)
‘Conscious exotica’ (Murray Shanahan, Aeon, 2016)
State of AI Report 2021 (Nathan Benaich and Ian Hogarth)
AZEEM AZHAR: Welcome to The Exponential View podcast where multidisciplinary conversations about the near future happen every week. Now, as an entrepreneur, investor, and analyst I’ve been inside the technology industry for over 20 years. During that time, I’ve observed that exponentially developing technologies are changing the face of our economies, business models, and culture in unexpected ways. Now, I return to this question every week in my newsletter Exponential View, in this podcast, as well as in my recent book The Exponential Age. So, in today’s edition I wanted to look back and forward on one of the key technologies of the exponential age, artificial intelligence. We’re about a decade into the current industrial boom in AI and I thought it was time to take out a scorecard, look at what we’ve achieved and what we haven’t, and ask which milestones have surprised us. To help me I called on a great expert, Murray Shanahan, a senior research scientist at London’s DeepMind, as well as a professor of cognitive robotics at Imperial College London. Murray works on machine learning, consciousness, and the impacts of artificial intelligence. He and I have known each other for a few years and have indeed done a podcast together previously. We appeared as guests on a show hosted by a technology investor. So, my challenge to Murray today was not simply to assess the last 10 years of development, but to look forward to the next 10. It’s a bold challenge and we did our best to look forward as well as back. Murray Shanahan, welcome to Exponential View.
MURRAY SHANAHAN: It’s very nice to be here. Thank you and thank you for the very flattering words.
AZEEM AZHAR: It’s 2021 when we’re recording this and it’s roughly a decade since there were some interesting breakthroughs in what was probably called computer science back then, when a certain new style of neural network was strung together with some newish chips and we had some breakthroughs back in 2010, 2011, 2012. And so, we’re about 10 years into this wave of artificial intelligence. Roughly, how’s it been for you?
MURRAY SHANAHAN: It’s been an astonishing ride for me, because when those big breakthroughs started to happen, such as the tremendous success in the ImageNet competition in 2012, I started paying attention to the whole neural networks paradigm in a way that I hadn’t really done before. My background was very much in symbolic AI, and although I had really abandoned AI altogether for a bit, suddenly there were these great results coming out of the neural network field, and I started to think actually maybe it was a good time to get back into mainstream AI again with this new paradigm. And of course, that was just the start.
AZEEM AZHAR: What I hoped to do in this conversation is for us to take stock of what we have achieved and what we haven’t, and where there have been unexpected wins, and where there have been perhaps surprisingly difficult horizons that we haven’t yet reached. I think it’s quite interesting for us maybe to start with some terminology. You said earlier on that neural networks hadn’t been a focus of your work, but symbolic systems had been. What do you mean by that and what’s the difference between them?
MURRAY SHANAHAN: The so-called classical AI paradigm, sometimes called good old-fashioned AI, or symbolic AI, was really dominant in the 1980s, for example when I was doing my PhD, and it was really all about taking or using representations that are quite language-like in their structure and applying reasoning processes to them. So, for example, your classical expert system might represent a whole load of knowledge about a disease, say, as a collection of rules, language-like rules, and then draw certain conclusions, such as a likely diagnosis, by carrying out logic-like inferences on those rules. And one of the features of that whole approach to AI was that the rules usually had to be hand coded, so some human had to design the rules and write them by hand and then the machine would process them.
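For readers who want to see what that looks like, here is a minimal sketch of a hand-coded rule base with forward chaining. The rules and symptoms are invented purely for illustration; a real 1980s expert system would have been far richer.

```python
# A minimal sketch of a hand-coded, symbolic expert system.
# The rules and symptoms are invented for illustration only.

RULES = [
    # (conditions that must all hold, conclusion to add)
    ({"fever", "cough"}, "possible_flu"),
    ({"possible_flu", "muscle_aches"}, "likely_flu"),
    ({"fever", "stiff_neck"}, "urgent_referral"),
]

def forward_chain(facts: set) -> set:
    """Repeatedly apply rules until no new conclusions emerge."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain({"fever", "cough", "muscle_aches"}))
# Derives 'possible_flu' and then 'likely_flu' from the hand-written rules.
```

Every one of those rules had to be designed and typed in by a human, which is exactly the bottleneck discussed next.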
AZEEM AZHAR: And that’s pretty hard to do because machines are pretty dumb, and if you even try to write the explicit rules for making a cup of coffee there are so many ifs and buts and whatevers, right? The rules differ if you’re lying in bed than if you are standing by the coffee machine. So many other things need to happen, and what if the coffee machine doesn’t have water in it? Then there’s another set of rules that need to be built. So, it’s a very laborious, complex, and probably both fragile and error-prone approach.
MURRAY SHANAHAN: Absolutely, and something you were touching on with your example there, which is a fundamental challenge for artificial intelligence, is the problem of common sense. So, in that example of making a cup of coffee the way that we humans do in an ordinary kitchen, we rely on our common sense understanding of the ordinary everyday world. We all know what objects are for a start, and we know what containers are, we know what liquids are, we know what gravity is, let alone knowing what coffee is, and milk is, and so on. Now, there were certain attempts from this symbolic AI paradigm to try and encode common sense in logic, in lots of logical sentences. I’m reluctant to dismiss anything as hopeless, but that whole approach of trying to codify common sense I really don’t think is ultimately going to work very well. But the thing is that common sense still remains, I think, one of the biggest challenges to artificial intelligence today, so even with this new paradigm of deep learning and neural networks it still remains one of the really big challenges.
AZEEM AZHAR: And the distinction then with neural networks and deep learning is that rather than codifying every single rule for every possible contingency, we know that the world works in a particular way, so we gather evidence and we present that evidence to some kind of algorithm that can learn from that evidence to give us what we want.
MURRAY SHANAHAN: Yes, the emphasis is very, very much on learning things from scratch these days rather than having humans trying to encode things in rules, and to minimize the extent to which humans have to impose their own thinking on the problem. Now, of course there are different domains of application for these ideas. I think it’s really important to separate out commercial applications from the grand vision of artificial intelligence that the founders of the field like John McCarthy and Marvin Minsky had in mind, which is sometimes these days called AGI, artificial general intelligence, which is to make artificial intelligence that is in some sense as smart as humans and capable of the same level of generalization as human beings are capable of and possesses the common sense that humans have. So, that’s one grand vision, but the real successes have been on a much more commercial level. Problems like, for example, face recognition or classifying objects in images: these are all things which you could do in very modest and not very useful ways in the lab, very, very slowly, back in the ’80s and ’90s, but then thanks to GPUs and thanks to having large amounts of data there was a sudden leap in performance and in what you could do on that kind of application from the early 2010s onwards, and that’s what’s really fueled the tremendous progress and all the investment, because it allows all sorts of applications to be developed that weren’t really practical before.
AZEEM AZHAR: And it is pretty remarkable. So, since that period of 2010, tens of billions of dollars of venture capital has gone into AI-based startups. Corporate investment in these applications, of not just neural networks but also much simpler machine learning techniques, has reached the tens of billions of dollars globally as well, and there’s a sense that we’re really only in the foothills of this as an industrial technology. The thing that I found curious when I’ve been looking at this, of course, is that the underlying theory is reasonably old, so what does a man in the street like me learn from that process, and from this idea that there were breakthroughs?
MURRAY SHANAHAN: One of the big takeaway messages is encapsulated in this little phrase, the unreasonable effectiveness of data which I think was coined by some Google engineers quite a long time ago.
AZEEM AZHAR: Peter Norvig I think was one of them.
MURRAY SHANAHAN: Yes, you’re right, Peter Norvig was indeed one of them, and it captures the idea that you might be a researcher stuck in the lab and you’re trying to get your little image recognition system to work on, say, handwritten digits, digits nought to nine, like the well-known MNIST dataset for example. And so, you might get that working reasonably well, but it’s grindingly slow, though eventually it works quite well. And then, you try and scale up a little bit to something like ImageNet or something similar, so these are much larger, high-resolution color images of a wide variety of everyday scenes. And it’s disastrous. It really doesn’t work very well. So, you think, okay, let’s scale up, so you double the amount of data you’ve got and you double the amount of compute. It’s still crap. Then time goes by and even more data becomes available, even more compute becomes available, and let’s say you go to 100 times what you were working with 10 years ago. And suddenly, everything seems to work. So you might have concluded beforehand that you were already throwing a lot of compute and a lot of data at the problem, but it just turned out you weren’t throwing enough at it, and you get to a certain point where suddenly things take off and work amazingly well. And that’s sort of what happened really, I think, in 2012: we reached a tipping point where enough data and enough compute were available that these techniques, which hadn’t seemed to scale all that well, suddenly scaled amazingly well. So, this unreasonable effectiveness of data is one of the big lessons of it.
AZEEM AZHAR: You used the word suddenly a few times there, and it does really feel sudden in the context of the rapidity of the switch, from 2008, 2009, when people started to notice this might be possible because there was more data on the internet, to 2012, when it had become the absolutely dominant paradigm, at least within the big internet companies. So there’s theory that is well understood. There was the arrival of the data from the internet. The thing that also was important is the compute, and in 2007 NVIDIA, which back then was just a company making graphics processing units for video gamers, not the trillion-dollar behemoth that it is today, released something called CUDA, which was a way of allowing developers to write to these graphics processing units, these chips. Within a few months of that coming out, the first machine learning researchers were starting to think, can we move our operations from traditional processors to GPUs? CUDA arrives in 2007, we start to see a year or two later the first neural networks on GPUs, and then within a couple of years they have become the benchmark standard. I’m trying to think about how we should look at that from the perspective of breakthroughs. Is that a scientific breakthrough? Is that an industrial breakthrough? Is it a paradigm shift, or is it just something else that deserves a different moniker?
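For a sense of what that repurposing looks like from a developer’s seat today, here is a minimal PyTorch sketch; PyTorch dispatches to CUDA kernels under the hood, so the same matrix multiply at the heart of a neural network layer moves from CPU to GPU just by moving the tensors.

```python
import torch

# The same dense matrix multiply, on CPU and then on GPU. The
# numerical work is identical; only the hardware executing it changes.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

c_cpu = a @ b  # runs on the CPU

if torch.cuda.is_available():
    a_gpu, b_gpu = a.to("cuda"), b.to("cuda")
    c_gpu = a_gpu @ b_gpu  # same operation, executed by the GPU via CUDA
```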
MURRAY SHANAHAN: I think it’s almost something else. It came as a great surprise to me. I do remember actually sitting in Imperial and listening to somebody’s undergraduate project presentation, which must’ve been some supervisor’s brilliant idea, where they hacked a GPU to do all these numerical computations very, very rapidly that were nothing to do with graphics, and I thought, what the hell is this all about, this is really weird. And then I thought, hang on, wait a minute, this is not really weird at all. This is brilliant. Of course, these graphics cards are doing something that’s potentially far more general than just processing graphics. The really interesting thing, and why it possibly deserves some kind of slightly different moniker, is that it’s a hack. It’s a sort of repurposing of a technology that was meant for something completely different. And then of course NVIDIA cottoned on to the fact that there was all this commercial potential here, especially as the years went by and deep learning became more important, and they recognized that they could start to take this seriously and actually tailor their boards to do these things. So I think you’ve got two things there. You’ve got the unreasonable effectiveness of data that we mentioned earlier on, and the repurposing of the tech, which are two interesting things that I would not have seen coming.
AZEEM AZHAR: One of the things that struck me, though, is that once we went into the industrialization phase it catalyzed the whole ecosystem. There is a broad propagation that occurs across industry, because we now have lots of families of layered neural networks. We have convolutional neural networks, which were very good for images and sounds; then you had recurrent neural networks, which had started to emerge in the late ’90s; you have these new transformer models, which are very good with text; and you have generative models in deep learning. But they’re all quite different, and so is there a mechanism by which the industrial success starts to push back into the research frontier and allows research to go in those directions, or were those directions already being explored, and are now brought to light because we are interested in this field again?
MURRAY SHANAHAN: Among the examples that you listed there, I think they have different characteristics. If we take convolutional neural networks to begin with, CNNs, they were absolutely around in the late ’80s and early ’90s. They predated all of this success, and indeed they were pivotal to the success that happened in, say, the ImageNet competition. So, they’ve been around for a while. But if we contrast that with transformers, transformers really only came around in 2017, and I think they are an example of what happens thanks to that explosion of activity: industrial interest, huge amounts of investment, and computer science departments suddenly becoming interested in these sorts of topics, whereas for years the AI people were obscure people stuck in a corner trying to argue that they shouldn’t be fired. Suddenly lots and lots of people are working on this stuff. So, I think it’s inevitable that under those circumstances you’re going to get really interesting innovations, and transformers are a great example of that. Now, transformers are really weird. When I first tried to get my head around the transformers paper when it came out, I just could not understand what they were doing and why they worked or what was going on. I thought, what is going on here? But I’ll try and say something about the essence of transformers. One way of looking at them is this. If we think about a layer of a neural network and the individual neurons in that layer, then in a convolutional neural network it’s quite a nice idea if, say, neuron 10 is only looking at a patch, a five by five patch, from the layer below. That’s a really useful idea, but if you think about it, it’s a restrictive idea, because maybe you’d really like that neuron to get the context of the whole image, and then it would be even better at whatever it’s trying to learn to do. So, you are sacrificing something by cutting down its inputs to just that five by five patch. At one extreme you’ve got your neurons taking these five by five patches. At the other extreme, the neuron is looking at the entire image, and then you’ve got zillions of parameters, but you’ve got all the context, the whole image, as opposed to just a little patch. Now, the amazing thing about transformers is that they have all the advantages of the fewer parameters that you get with the little patches, without the disadvantage of being restricted to just a patch of the image. In effect, it’s a way of learning, so a neuron can learn which part of its input to attend to, which part to look at, which parts are important for whatever it’s trying to learn to do.
AZEEM AZHAR: Yeah.
MURRAY SHANAHAN: So, instead of being restricted to just this patch of what’s in the layer below me, I can learn which parts to attend to. It’s often relevant in text. So, if we have a piece of text like “The cat sat on the mat,” or something, and maybe you’re trying to predict the next word, and suppose you’ve got a sentence like “The cat came into the room, looked around, admired the fireplace, purred happily, and then sat on the …” Right? When you’re trying to predict the next word there, which parts of that sentence do you really want to attend to? You want to look all the way back to cat at the beginning of the sentence, because you know that cats like to sit on mats. That statistically occurs a lot. And the thing is, it can learn which parts of the sentence to attend to in order to do the thing that it wants to do, and so that’s the essence of it. It’s really hard to explain verbally.
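Since the idea is hard to convey verbally, here is a minimal numpy sketch of the scaled dot-product attention at the core of a transformer. The tiny dimensions are arbitrary, chosen only for illustration; in a real transformer the queries, keys, and values come from learned projections.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position computes weights over every other position
    (a softmax over query-key similarity), then takes a weighted sum
    of the values. This is how 'sat on the ...' can learn to attend
    back to 'cat' anywhere in the sentence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V               # context-weighted mix of the values

# Six token positions ("The cat ... sat on the"), 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
# Here we use X directly as Q, K, and V just to show the mechanics.
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (6, 8): every position now carries whole-sequence context
```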
AZEEM AZHAR: You did a fantastic job. I think the thing that’s interesting is that transformers appear to work really well wherever we see sequences of data, and although we started using them on text we found them to be quite general, and they seem to work on images, and videos, and sequences in proteins, and lots of other places.
MURRAY SHANAHAN: Yeah. That is really remarkable. I think it’s a very general technique for basically learning what bits of information are important, whatever the structure of that information that you’re looking at.
AZEEM AZHAR: And one of the things that our brains do is figure out which of those inputs are important and which to pay attention to, so we’re not overwhelmed by stuff coming in through our eyes or ears. Is that just an analogy, or are we learning something?
MURRAY SHANAHAN: It’s much more than an analogy, I think, because if you look at the details of how these things are implemented, the central operation that you’ve got there is attention. So you have, if you like, a little sub-network that is learning what to attend to; that’s really the way they work. You have lots and lots of these little tiny sub-networks that are learning what to attend to.
AZEEM AZHAR: It seems like there has been some interesting research, rather than industrialization, progress in this area of the transformer that was perhaps not predictable if you had gone back to 2008 looking for a model that might work, right? Deeper and deeper convolutional neural nets one could assume, because we’d already seen small ones. But there’s been some really interesting research that’s perhaps been a bit surprising given where we might’ve been 10 years ago.
MURRAY SHANAHAN: And of course, the specific examples where we see this most vividly are these so-called large language models that really have got a lot of attention.
AZEEM AZHAR: Listeners will remember I spoke to Sam Altman, who was behind the OpenAI group that came out with GPT-3, where the T in GPT is transformer, and it was one of the earlier large language models. What I think is fascinating is that transformers first came out in 2017, which in research terms is not far off from the first recurrent neural networks, which were only 20 years earlier, but in industrialization terms they have set the world alight.
MURRAY SHANAHAN: Yes. They certainly have, yeah. GPT-3, it’s funny you said one of the earlier ones, but actually there was a GPT-2 before that, which of course got lots of attention in its own right, and I think GPT-3 was 2019, even 2020. By the point GPT-3 came out, we’d already seen some pretty successful large language models, and they’re really pretty good at things like machine translation, powering Google Translate for example. But GPT-3, it’s all about more data, and more parameters, and more compute, so it’s all scaling up. So, you scale up the number of parameters from GPT-2 to GPT-3. And when they released it there were all sorts of very impressive results: it could generate news stories that were much more convincing than before, and poetry, and jokes, and all kinds of interesting things. So, all of that was very impressive, but it didn’t actually surprise me. The thing that did surprise me, though, was that GPT-3 was able to perform what somebody might think of as out-of-distribution generalization. You could actually give GPT-3 an input prompt which described, via a number of examples, a completely new type of little puzzle to solve, such as completing a little word puzzle for example, and miraculously it often did quite well at those things. Now, that was an enormous surprise to me, because apparently it was able to solve things that were not in the training data at all. And that was really interesting.
AZEEM AZHAR: That’s right.
MURRAY SHANAHAN: And that was the one that made me really sit back, and I thought, yes, okay, this is really interesting. What it suggests is that somehow these large language models like GPT-3 and Google’s BERT have managed to find, within the parameter space that they’ve got, a very powerful generic mechanism, because that’s the best way to solve this problem of next-word prediction: to have a very powerful general-purpose mechanism. And they find it, and then it can be repurposed for a completely different thing. So now, there’s a whole new art form, which is prompt engineering. You can try and work out how to engineer-
AZEEM AZHAR: To try to get something useful ideally out of the prompt.
MURRAY SHANAHAN: Yeah. So, how to coax these large language models into doing things that they weren’t trained to do at all.
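A hypothetical example of what that coaxing looks like in practice: a few worked examples are placed in the prompt, and the model is left to continue the pattern. The word puzzle and the model call here are invented for illustration, not a real API.

```python
# A few-shot prompt in the GPT-3 style: the task (reversing a word) is
# described only through examples, never through explicit instructions
# seen at training time. The puzzle is invented for illustration.
prompt = """Reverse each word.

word: stone  -> enots
word: cloud  -> duolc
word: river  -> """

# completion = some_language_model.complete(prompt)  # hypothetical call
# A capable model is likely to continue the pattern with "revir".
```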
AZEEM AZHAR: Okay. So, transformers are one thing that perhaps surprised us. I would love to think about some other things we would’ve been surprised at. I think that when you moved from neuroscience over into doing more AI research some of your earliest papers were in the field of reinforcement learning after that 2010 period. How should we evaluate what we’ve been able to achieve with reinforcement learning?
MURRAY SHANAHAN: So, reinforcement learning is really trial and error learning, so we can think of it by analogy with how animals learn, or how we might learn to play a game like tennis or something, so basically we try out various actions, and some of them are successful in the sense that they give us some kind of reward.
AZEEM AZHAR: But reinforcement learning is now a thing. We watched DeepMind’s AlphaGo and AlphaGo Zero demonstrate incredible game-playing ability in games like Go and chess, and reinforcement learning is appearing in all sorts of industrial applications, so do you think we’ve been surprised by the progress in reinforcement learning?
MURRAY SHANAHAN: Yes. I think DeepMind is very much responsible for this, and it is a bit of a renaissance if you like, because the basic ideas of reinforcement learning have been around for quite a while, but the secret sauce was to add deep learning. So, we’ve already been talking a lot about deep learning and neural networks, and adding a bit of deep learning and neural networks to the basic reinforcement learning idea is what enabled reinforcement learning to suddenly do things that we really weren’t able to do before. A lot of this has been shown with games. The first really big success that DeepMind had was with this system called DQN, which was able to learn to play these Atari games like Breakout, and Space Invaders, and so on, and get really, really good at them. And the key thing was that it did it from raw pixels, and it just has the raw action space of the game’s controller as the output. It knows nothing about Breakout, say, where you’ve got a ball that you’re trying to hit with a paddle to knock bricks out of a wall. It knows nothing about bricks, and paddles, and balls, or anything like that. All it sees is the pixels, and it has to learn from scratch how to get good at getting reward, which in this case is the points in the game.
AZEEM AZHAR: Yeah. But did it know about points, or did it just look at the pixels of the score?
MURRAY SHANAHAN: It did know about points, yes. So, that is the reward signal. That’s the one thing. You have to know what reward is, what you’re aiming for, really, in reinforcement learning. But raw pixels were what it would learn from, and this was tremendously … For me, that was the thing that really changed my research direction. So, having spent about 10 years or more in neuroscience, having abandoned classical AI really because I didn’t think it was going anywhere, the thing that brought me back into artificial intelligence research again was DQN, because I thought, wow, this is something that’s really getting a bit close to the sort of general intelligence that humans have, learning this thing from scratch. And so, I was really impressed with DQN. It really spurred my research on.
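A minimal PyTorch sketch of the shape of the thing being described: a network that maps raw screen pixels straight to a value for each controller action. This is only the function approximator, with illustrative layer sizes rather than the published architecture; the full DQN training loop (replay buffer, target network) is omitted.

```python
import torch
import torch.nn as nn

class PixelsToActionValues(nn.Module):
    """Raw pixels in, one estimated value (Q-value) per controller
    action out, in the spirit of DQN. Layer sizes are illustrative."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),  # 4 stacked grey frames
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),  # one value per joystick action
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)

q = PixelsToActionValues(n_actions=6)
screen = torch.rand(1, 4, 84, 84)  # a batch of one 84x84 frame stack
print(q(screen).shape)             # torch.Size([1, 6])
```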
AZEEM AZHAR: When you look at the progress in reinforcement learning in the last 10 years is it more like CNNs, we understood the theory, we just didn’t have the engines to put it into practice, or is it more like something where these new engines have allowed us to explore this fertile terrain and come up with the theories?
MURRAY SHANAHAN: I think it might be a little bit more like the CNNs. Now, of course they had to come up with lots of brilliant innovations in order to get AlphaGo to work, but in a sense the basic ideas were around beforehand. The clever combination took place with DQN, putting deep networks together with reinforcement learning. That was a big breakthrough, I think. Then, and this is a very controversial thing, again I want to be careful what I say, but I think in some ways AlphaGo followed from that: the idea of joining deep learning and reinforcement learning with Monte Carlo tree search. That required a lot of engineering, lots of cleverness, and indeed lots of compute, because it did a great deal of self-play in order to get to the standard that it got to.
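For a flavour of the Monte Carlo tree search ingredient, here is a sketch of the UCB-style selection rule at its core: balance the value a move has shown so far against how rarely it has been tried. This is the generic UCB1 formula, not AlphaGo’s exact variant; the constant and the statistics are illustrative.

```python
import math

def ucb_score(total_value, visits, parent_visits, c=1.4):
    """Upper-confidence score for one move in the search tree:
    the move's average value so far, plus an exploration bonus
    that shrinks as the move is visited more often."""
    if visits == 0:
        return float("inf")  # always try an untried move first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Choosing among three candidate moves at a node visited 50 times:
stats = {"a": (6.0, 10), "b": (3.0, 4), "c": (0.0, 0)}  # move -> (value, visits)
best = max(stats, key=lambda m: ucb_score(*stats[m], parent_visits=50))
print(best)  # "c": untried moves get explored before exploitation narrows in
```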
AZEEM AZHAR: We’ve got a couple of examples where there have been perhaps surprising directions that we’ve ended up going in. Were there areas where we didn’t make as much progress as we might’ve thought we would, going back 10 years?
MURRAY SHANAHAN: Absolutely. I think our progress towards human-level artificial intelligence has actually been quite slow in a way. It’s not obvious to me that there aren’t still some missing breakthroughs, and I do think that, for example, common sense is something that we really haven’t cracked yet, and what I sometimes call foundational common sense: just being able to understand the everyday world of 3D objects, and space, and gravity, and surfaces, and paths, and portals, and containers. This is a fundamental list of very basic physical things that organize our world, things that we have an intuitive understanding of. And this is not really present in any of our systems today, and I think it’s a big obstacle to progress.
AZEEM AZHAR: I think we should sketch out the terrain of the things that are still needed perhaps outside of the narrow ideas of industrial AI but to this mission of artificial general intelligence. You talked about common sense as being something that was missing. I often have a question around the importance of embodiment as driving both the learning and the representation of intelligence in the real physical world. Virtually everything that we know of that lives on any kind of axis of intelligence is embodied whether it’s an octopus or a fox, or indeed a human, and so I wonder about the importance of embodiment in this field of learning.
MURRAY SHANAHAN: So, I do think that embodiment is essential for breaking through the barriers that still face us in getting AI that’s closer to human level and closer to human like capabilities, because I think it’s only through embodied interaction with the physical world that we can learn that basic layer of common sense. When it comes to language, so all of these language models are of course completely disembodied. I don’t believe that that’s enough for any AI system to actually acquire this base level of common sense that we really would need to move beyond the limitations of today’s AI. So, I think embodiment is critical. However, I do think virtual embodiment is one way of achieving this.
AZEEM AZHAR: Virtual embodiment would be to say it’s really expensive to teach a delivery robot how to interact on a road outside a kindergarten in the physical world, so we’ll create a virtual simulation and allow that virtual robot to navigate that space, and if it knocks over a toddler it’s only a virtual toddler, who can get rebooted for the next simulation. So we think of virtual embodiment as nearly as good as real embodiment.
MURRAY SHANAHAN: Yes, absolutely. The virtual systems that I like to play with involve little embodied agents in environments facing challenges similar to the ones that animal cognition researchers confront animals with when they want to measure their cognitive ability: finding a reward that is inside a box or behind a wall, or something where you’d have to get a tool to open the box, that kind of thing. So, those are the sorts of settings in which I think it might be possible for a virtually embodied agent, an agent in a simulated 3D game-style environment, to learn this foundational layer of common sense. One thing that we’ve neglected to talk about in this context so far is other agents, and this is really essential. Language doesn’t really make any sense unless you have other language users. It’s an embodied and social phenomenon, so we acquire language by interacting with other language users who inhabit the same shared world that we do, so we have things that we can talk about that are out there in the world, and we share that world, and that’s really essential, I think.
AZEEM AZHAR: So, is common sense really about how the objective gets framed which is when I asked you to get me a cup of coffee I didn’t mean steal someone else’s cup of coffee, I meant all these other norms, or is common sense something that overarches the strategic behavior of the agent in its environment?
MURRAY SHANAHAN: We use the phrase common sense in everyday language to talk about a number of different things. So, we would certainly say that it’s common sense, if I ask you to go and get me a coffee, that you don’t steal it off somebody else and knock them over. But that’s not really what I mean by the term. I mean this sort of understanding of the ordinary everyday world of physical objects, which as I see it is a foundation for everything else. We just cannot get off the ground with understanding social situations or language until we have agents that understand what objects are, that understand that objects are things that occupy a space and are still there when you go away from them, that have backs and undersides, that you can pick up and put down, and insides that you can reach by breaking them open. There are so many things like that that are fundamental to our understanding of anything.
AZEEM AZHAR: So, one of the questions, when we think about where progress emerges, is the extent to which we need working theories about these things, against which we then engineer the thing that demonstrates the theory, or whether we can engineer our way through to it. And I think there are a couple of examples, contemporaneous ones actually. One is heavier-than-air powered flight with the Wright brothers, where we didn’t necessarily have a theory about how that would work. They built something that did it. And at about the same sort of time Einstein came out with his theory of relativity, which was a great theory, but parts of which we didn’t prove empirically until LIGO, I guess. So, when we look at this sort of unanswered promise of AI, is it going to be more like the theory of relativity, or is it going to be more like Orville and Wilbur on the fields at Kitty Hawk?
MURRAY SHANAHAN: I think it’s going to be more like Orville and Wilbur. It’s an engineering challenge, in a way, to overcome these problems. It may be engineering breakthroughs that we’re looking for. But on the subject of confirming theories, how do we know that we’ve got a good theory of common sense, or that we’ve got a good implementation of common sense? I’m not very convinced about these language models, and I’m not very convinced about the way that we test language models for common sense. So, I think that’s the wrong starting point. I think we need to start off with something embodied, and we need to test whether it understands the everyday world, really basic things like objects, and then we can build language on top of that. I think we can lift many ideas from the field of animal cognition, or developmental psychology. People who work in developmental psychology and animal cognition have, over many, many decades, devised ways of testing whether a subject possesses certain cognitive abilities. And very often the way that you do it is with a transfer test. You present them with a setup which is a bit different from anything they’ve ever seen, and if they’ve really understood things on a deep enough level then they will apply their deep understanding of, say, connections, or shapes, or obstacles, or gravity and weight, and be able to overcome even a novel situation. So, there’s a foundational layer of understanding of the everyday world that they can apply to a novel situation, and that’s how you test for the presence of that understanding. And so, I think these paradigms are really important, and I’m very much in favor of importing them into AI.
AZEEM AZHAR: I would say that in the 10 years since we’ve seen this breakthrough, subject matter expertise has not been that important to the researchers making the breakthroughs and industrializing these techniques. It is all about whether you can figure out how to get the data in and make the model complex enough, with enough widgets and dials to turn, the parameters that you tune it with. There hasn’t been much emphasis on asking what somebody who understands how an Alsatian puppy learns would say is important about the pattern of that learning. And so, that feels like it’s not where the research has been for much of the past decade. There may be people doing it, but it doesn’t seem to be common in the deep learning field more broadly.
MURRAY SHANAHAN: I think you’re mostly right, and it’s very interesting, because it’s true that you can do absolutely extraordinary things. It’s a whole new paradigm, in a way, of doing all kinds of things: you just throw machine learning at a problem and amazing things can happen. AlphaFold is an example of that. AlphaFold is DeepMind’s system that has solved, in a certain sense, the protein folding problem, so it can predict the three-dimensional shape of proteins once they’re folded.
AZEEM AZHAR: And the importance of that problem, I guess, is that life is proteins and the interactions between proteins, and there are many, many millions upon millions of proteins of arbitrary length, built from 20 amino acids, and how a protein works and expresses itself in its local environment often depends on the way in which it folds. Even a short protein might have many, many more combinations of folding than there are atoms in the universe: something like 10 to the 88 atoms in the universe, and 10 to the 300 possible ways you can fold a given protein, so it’s a really hard problem to predict. It’s not even astronomical. It’s much worse than that.
MURRAY SHANAHAN: It’s super-astronomical, super-duper astronomical. But the interesting thing is that the AlphaFold team, and of course they had to bring in all kinds of people who are experts on proteins and protein folding, but it really is the brute-force engineering, and compute, and data again that has solved the problem, and similarly with machine translation. It turned out that you don’t need … You probably do need a certain amount of expertise in linguistics, but that’s not what really cracked the problem. What really cracked the problem was the engineering, the data, the compute, and these techniques. But if we think about the whole business of general intelligence and the challenge of achieving the grand vision of AI, I think the tide is turning a little bit on that, because many people recognize that deep learning is hitting a number of really fundamental problems that we don’t quite know how to solve. And maybe we do need to revisit cognitive science and neuroscience as a source of ideas. I think everybody recognizes that to go that bit further, towards a more powerful AI that doesn’t have the shortcomings of today’s AI, is going to involve revisiting certain ideas to do with reasoning, symbol-like representations, thinking hard about generalization, so there are a number of topics.
AZEEM AZHAR: Look, it’s 2021, and I’m sorry to ask you this question, but I’m going to. Let’s go forward to 2031. In order to surprise ourselves then, are there areas of research on which we need to put much, much more attention and focus?
MURRAY SHANAHAN: I think that is an impossibly difficult question. All you can do is nurture the conditions for surprises to arise. If there are a couple of conceptual breakthroughs that are really going to surprise us over the next years, as I’m sure there will be several, what we really need to do is allow researchers the freedom to try out different things, both in academia and in industry. In research labs like DeepMind there is quite a lot of freedom to pursue the ideas you want to pursue. But there is a danger there, I feel, because deep learning has been so successful that it’s very tempting for young researchers to just plow that very successful furrow. They can be pretty confident that they’re going to get a good job, but going out and exploring something very different is very risky.
AZEEM AZHAR: This seems to me to be a meta-problem of artificial intelligence, because there is a framing of intelligent agents in your field which is the balance between exploitation and exploration. When you arrive at the land of milk and honey you should stop exploring. You just exploit it, because it has definite rewards, and there’s uncertainty and risk and dragons if you explore. And one of the things that I would observe is that deep learning has been so successful, and it is a path to seven-figure salaries and impact because it goes on every smartphone on the planet, that a field which theoretically has been about how agents balance between exploitation and exploration has found itself stuck exploiting.
MURRAY SHANAHAN: It’s not completely stuck, but there is pressure towards a kind of conservatism there, I think. But the way you get out of that is by ensuring that your epsilon parameter isn’t stuck at the extreme and you are always doing a bit of exploration, and maybe that’s what we need: to make sure that we’re still doing plenty of exploration within the field. I think that’s really important.
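That epsilon parameter is exactly the knob in the classic epsilon-greedy rule from reinforcement learning; a minimal sketch:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon, explore (pick a random action);
    otherwise exploit (pick the action with the highest estimated
    value). Keeping epsilon above zero is the formal version of
    'always keep doing a bit of exploration'."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                 # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

q = [0.2, 0.9, 0.5]
action = epsilon_greedy(q, epsilon=0.1)  # usually action 1, occasionally random
```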
AZEEM AZHAR: So, we want to see a bit more exploration, research going in other directions that might lean on human cognition, or studies in animal learning, or thinking about embodiment. Maybe there are other types of agent-based approaches that could be interesting. If we start to make progress in the next 10 years, would we feel that we’re making progress towards this idea of artificial general intelligence, or do you think we would have started to reframe the question and the yardsticks that we use?
MURRAY SHANAHAN: This concept of artificial general intelligence, I’ve noticed that it gets interpreted in different ways by different people, and I think misinterpreted in some ways. To my mind, the challenge of artificial general intelligence is to make AI that is less brittle and less prone to stupid mistakes than today’s artificial intelligence, and it is very much to do with common sense. We can see all kinds of examples with self-driving cars. Every so often somebody will post on the internet some completely weird situation which totally throws the self-driving car, where any human immediately sees what the problem is. The great one is the self-driving car following behind a truck carrying traffic lights, and of course the car thinks there are millions of traffic lights flying past and doing weird things, so it’s completely thrown. But of course, the human understands: they’re traffic lights, but they’re on the back of a truck in front of me. A related one is a car sitting at a junction. A big lorry crosses the path in front of the car, and the green light that is allowing the cross traffic to move is suddenly reflected, because it’s wet and it’s been raining and this lorry has a very shiny side, and this green light is reflected in the side of the lorry as it comes past. So the self-driving car thinks the light’s turned green, I’ve got a green traffic light in front of me, so it starts to move forward and the driver has to quickly put the brakes on. Again, the human immediately understands that it’s not a traffic light. It’s a lorry with a reflection on it.
AZEEM AZHAR: You used a word in there. You said the self-driving car thinks.
MURRAY SHANAHAN: We have to anthropomorphize in order to explain … It’s what Dan Dennett calls the intentional stance. We can’t help but adopt the intentional stance, but it’s a communicative shorthand, and of course those things have to come in scare quotes; the word thinks, of course, is in scare quotes there.
AZEEM AZHAR: One of the pieces of your work that I first came across was your writing around the space of possible minds, the idea that there’s a difference between the capacity for conscious action and behavior, or consciousness, and how similar these things are to humans. Sometimes in these debates the question about artificially intelligent agents suggests that we’re on a number line, with the ant far off to the left, and the chimpanzee somewhere in the middle, and the human a bit further out to the right, and this idea that machines will progress on that line beyond us. And of course, I think we’ve started to understand that there are many different types of capacities that are not necessarily on the same dimension, that there are many dimensions to them; how do you compare octopus intelligence to human intelligence, as an example? So, when we think about that now, how do you think about the human-likeness of artificial intelligence? Does this idea of the space of possible minds, that a mind might be quite unlike ours yet still powerful, still hold?
MURRAY SHANAHAN: I’m fascinated by the topic of consciousness and have written a lot about it and thought a lot about it, but when it comes to talking about consciousness in the context of AI, it’s a very fraught idea. I don’t think that anything we’ve built so far in AI, or that’s really on the horizon, deserves to be described in terms of consciousness. With that said, the term consciousness itself is actually a multifaceted term. Let’s go back to the self-driving car with the things in scare quotes. There is a sense in which the self-driving car has a kind of awareness of its surroundings, and I think we can start to remove the scare quotes from the word awareness as long as we use it appropriately in that kind of context. There is a sort of awareness, and awareness is a fundamental aspect of consciousness. But we need to be really careful, because when we think about humans and other animals, consciousness comes as a bundle of things that include in particular the capacity for suffering, and that’s not remotely relevant for any of the things that we can build today. I think it’s really important to make that point.
AZEEM AZHAR: Have we made more progress or less progress in this field over the last 10 years than you might have expected?
MURRAY SHANAHAN: In the last 10 years more progress, absolutely more progress. We’ve still got a long way to go before we achieve the vision of the founders of the field, but we have had a lot of very exciting progress.
AZEEM AZHAR: So, would you say the next 10 years looking out from 2021 will progress be faster than the last 10, or will it be slower?
MURRAY SHANAHAN: I just don’t know. I often say to achieve the vision of AGI we need an unknown number of conceptual breakthroughs, or maybe the number is zero, maybe just more data and more compute will get us there, and some people certainly think that. Maybe it’s a simple breakthrough that will happen, or maybe it’s really hard, so I just don’t know. Sorry.
AZEEM AZHAR: Let’s check in, Murray, in 2031 and we’ll see how you did.
MURRAY SHANAHAN: Wonderful. Thank you very much.
AZEEM AZHAR: It’s my pleasure. Thanks for coming on. If you enjoyed my conversation with Murray I’ve got good news for you: our archives are absolutely packed with conversations on artificial intelligence. I’m not even sure where to begin, to be honest. We’ve got Danny Lange, who heads AI at Unity, Andrew Ng of Landing AI, Fei-Fei Li of Stanford, Stuart Russell, Demis Hassabis, Kate Crawford, Gary Marcus, Jürgen Schmidhuber, and so many more names. I have been privileged to speak to so many of the who’s who of the current AI wave, so please head to those archives and fill your boots. To become a premium subscriber of my newsletter go to www.exponentialview.co/listener where you’ll get a 20 percent discount. To stay in touch you can follow me on Twitter. In the US, I’m @Azeem, A-Z-E-E-M, and elsewhere I’m @Azeem, A-Z-E-E-M. This podcast was produced by Mischa Frankl-Duval, Fred Casella, and Marija Gavrilov. Bojan Sabioncello is our sound editor.