272 | Leslie Valiant on Learning and Educability in Computers and People

Science is enabled by the fact that the natural world exhibits predictability and regularity, at least to some extent. Scientists collect data about what happens in the world, then try to suggest "laws" that capture many phenomena in simple rules. A small irony is that, while we are looking for nice compact rules, there aren't really nice compact rules about how to go about doing that. Today's guest, Leslie Valiant, has been a pioneer in understanding how computers can and do learn things about the world. And in his new book, The Importance of Being Educable, he pinpoints this ability to learn new things as the crucial feature that distinguishes us as human beings. We talk about where that capability came from and what its role is as artificial intelligence becomes ever more prevalent.


Support Mindscape on Patreon.

Leslie Valiant received his Ph.D. in computer science from Warwick University. He is currently the T. Jefferson Coolidge Professor of Computer Science and Applied Mathematics at Harvard University. He has been awarded a Guggenheim Fellowship, the Knuth Prize, and the Turing Award, and he is a member of the National Academy of Sciences as well as a Fellow of the Royal Society and the American Association for the Advancement of Science. He is the pioneer of "Probably Approximately Correct" learning, which he wrote about in a book of the same name.

0:00:00.3 Sean Carroll: Hello everyone. Welcome to the Mindscape Podcast. I'm your host, Sean Carroll. One of the problems with reality, as I see it, is that there's a bunch of puzzles, questions, problems, if you like, that are hard to solve. And I'm not even thinking about moral or political or social problems. I mean just mathematical problems, or at least problems that you can state rigorously and quantitatively. There are problems where, in principle, you can find an algorithm for providing a solution, but that algorithm is so inefficient it would take forever, or a very long time, to actually do it. This is, of course, a whole area of knowledge, right? Within theoretical computer science, we have computational complexity theory asking the question: if you have some kind of question, like what is the shortest distance that a traveling salesman would have to go through given these stops that they have to make along the route, how many steps would your computer program need in order to solve it?

0:01:01.5 SC: And one of the nice things about those problems is that even though it might take a lot of steps to solve it, at least there's a solution, right? At least there is one right answer. The difficulties become enormously larger when you imagine being a scientist, let's say being a theorist within science, right? Where you have some data collected by your experimentalist friends. And your job as a theorist is to come up with a best-fit model to the data, a theory that best fits the data in the sense that you can extrapolate it beyond the data you have and have a good chance of continuing to fit it. Well, that's going to be a difficult problem to even conceptualize, because maybe you don't have enough data to uniquely fix the thing. Maybe you have lots of data, but you don't have any good ideas about how to fit it.

0:01:52.6 SC: That's why we theoretical physicists make the big bucks, right? Anyway, today's guest, Leslie Valiant, has done incredibly important work within theoretical computer science on exactly understanding this kind of puzzle. The idea of getting better at fitting some data in a way that can be extrapolated: the best learning that an automated system can do. I have to read you a little excerpt, in case you don't know who Leslie Valiant is, from his Wikipedia page. Valiant was awarded the Turing Award in 2010, having been described by the Association for Computing Machinery as a heroic figure in theoretical computer science and a role model for his courage and creativity in addressing some of the deepest unsolved problems in science, in particular for his striking combination of depth and breadth. So he's written a book that is named after an idea he had. He wrote this a while ago.

0:02:52.7 SC: He wrote this book called Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World. And as I say in the podcast, I just love this phrase, probably approximately correct. The idea being that you have some guess as to how to extrapolate from your experience to what's going to happen next. There is literally no way that you can guarantee ahead of time that you're going to be right in that extrapolation. You can't even guarantee that you will be approximately right. You're going for that, you're trying to be approximately correct, but all you have is a chance of being approximately correct. And what you can show, in certain very rigorous circumstances, is that you will probably be approximately correct, that you have a good chance of doing that. And that's what all of us scientists actually aim for. So one of the great things about Leslie Valiant is that he thinks about these very deep ideas in rigorous, quantitative, theoretical computer science, and then does try to apply them more broadly.

0:03:56.7 SC: Thus the subtitle, Nature's Algorithms for Learning and Prospering in a Complex World. So he has a new book that is just about to come out called The Importance of Being Educable: A New Theory of Human Uniqueness. What he wants to do is encourage everyone to shift their perspective from thinking about human beings on a scale of intelligence, how smart you are, how many things you know, to a scale of educability: how good are you at learning new things? He tries to make the argument that knowing things, being good at solving puzzles or whatever it is, is much less important than being able to learn how to solve puzzles, how to understand things. And that's also something that you can quantify, and you can actually think about it for machines as well as for human beings. So he wants to argue that what really makes us different from other animals is that we can be educated, which is a little bit different than learning, right?

0:04:53.5 SC: A cat or a dog, well, maybe not cats, but dogs certainly can learn to do things, but they generally learn to do things by being shown what is wanted of them, and then trial and error. You can't read a book if you're a dog and learn to do things that way. Whereas human beings can transfer new information, can generalize it, can string it together in ways that, according to Valiant, are not there elsewhere in the animal kingdom. So that may or may not be right. I think that's a good empirical question. He is very honest about when he puts forward a conjecture that hasn't been tested yet, but it's a nice way of thinking slightly differently about our capacities and capabilities. So with that, let's go. Leslie Valiant, welcome to the Mindscape Podcast.

0:05:58.3 Leslie Valiant: Thank you for having me.

0:06:00.8 SC: So if I look over your works, your research over the years and writings, the word learning is the one that appears over and over again. So you're a computer scientist. Why is it that the word learning became so important to how you think about things?

0:06:19.2 LV: Yes. So I started in computer science looking at computational complexity, which is about the inherent difficulty of doing computation. And I did various things in that field for about 10 years. But I regarded it as a very fundamental field of computer science, maybe the most fundamental. And the most basic idea which occurred to me is that if it's no good for solving the basic problems of the mind and AI, then it's no good. So it was a real challenge for the field. And so then I looked at AI, and I saw that there were various topics discussed in AI conferences, and I quickly tried to figure out, well, which one is the most fundamental? And asked that way, it seemed to me that it must be learning. So I zeroed in very fast on learning and tried to understand it from the point of view of it being a computational process, which wasn't so easy. There were limitations, and some of the limitations were statistical, but I thought the main ones were computational. It just needs a lot of computation to do it well.

0:07:27.7 SC: And has that, to skip ahead a little bit, were you right?

0:07:34.7 LV: Well, I think I was. These big LLMs do spend millions of dollars in being trained. So that's exactly what the model was: you have to have a model of learning which somehow promised you that the amount of computation needed may be large, but it was still polynomial time. It was still doable.

0:07:58.4 SC: And there's a...

0:08:00.4 LV: So I think it was right.

0:08:01.0 SC: This is an honest question. I don't know the answer. There's machine learning, which is a specific approach, a specific set of algorithms, but then there's the more broad category of machines learning. Are those two separate things? Is machine learning a general term or a more specific one?

0:08:19.9 LV: Well, these things are used in many senses. For me, they're the same. I think machine learning, it's the name of an academic field, which includes various things, but clearly it tries to capture what happens when machines learn. It should be the same.

0:08:37.4 SC: So it goes beyond neural networks.

0:08:40.8 LV: Sure, so neural networks are just one of many algorithms for doing learning. And from the '80s onwards, and maybe even before, people were exploring a whole array of different algorithms. And different algorithms are used for different contexts. So in some contexts where you've got enormous data sets, neural networks have been shown to be very good sometimes. But still there are many other algorithms which are widely used for smaller data sets.

0:09:07.6 SC: Could you kind of put us in the mindset of how people were thinking in the 1980s? I have this vague feeling there was super excitement about AI in maybe the '60s, and then it cools off a bit in the '70s, and it begins to return again in the '80s.

0:09:24.5 LV: Yes. I can only speak for myself, of course. So I came into learning from a theoretical perspective. From what I was doing, this theoretical field called computational learning theory grew up, which focused very much on formulating learning from examples. That was maybe one of the big services done by that community. So I formulated this PAC model, the Probably Approximately Correct model of learning. And basically what it captures theoretically is how you generalize when you learn from examples: when do you declare yourself successful? So clearly you have to predict future examples accurately, as accurately as you can. But also you want to show that the effort made to be in the position to be able to predict future examples should be doable, moderate in terms of polynomial time and things like that.

0:10:31.0 LV: And so in particular, the more effort you make, the more accurate your prediction should be. But furthermore, the extra effort should be reasonably rewarded: the rewards for more effort should not go up infinitesimally slowly. And technically this boils down to an algebraic curve, a constant exponent, the error being a constant power of the effort you made.
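
For readers who want the textbook version of the guarantee Valiant is describing here, the standard PAC bound for a finite hypothesis class H is a well-known result (sketched here, not quoted from the conversation): a modest number of labeled examples suffices to be "probably approximately correct".

```latex
% With probability at least 1 - \delta, a hypothesis h consistent with the
% training examples has true error at most \epsilon, provided the number of
% examples m satisfies
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right).
% Read the other way: for a fixed confidence \delta, the achievable error
% shrinks roughly as \epsilon \approx (\ln|H| + \ln(1/\delta))/m,
% i.e. error falls as a power of the "effort" (the number of examples m).
```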

0:11:06.5 SC: I see.

0:11:06.9 LV: So anyway, okay. So anyway, this was a theoretical model. It was widely explored, and I think it focused attention on just this very specialized problem of learning from examples efficiently. And then soon afterwards, for reasons partly inspired by this, partly for other reasons, people started to have data sets which they shared, and they systematically compared the efficacy of different algorithms, how they did on different data sets. And this started from the mid-80s, and they grew this very large experimental machine learning community. So for a long time the neural nets weren't competitive, because they just performed badly on small datasets. But, as we know, when the datasets became bigger, they became more and more competitive. So, on the machine learning front, there was lots of excitement, because theoretically it kind of worked, and experimentally it worked on many data sets. And anyway, to me it was a plausible entry into what the mind did. It was something which was computationally feasible, because from a theoretical perspective, the previous things people tried to formalize, like logic and reasoning, everything turned out to be computationally intractable. So from a theoretical point of view, this was exciting for the community which did this. But they grew this large experimental community, from which the current successes all derive, I think.

0:12:43.5 SC: I do want to mention that the phrase probably approximately correct, or PAC learning as you're calling it, and there's also a book that you wrote with that title, that is one of the best titles I've ever seen for a book. And I presume that, is it allowed to take lessons that are implicit in that phrase and go beyond computers and specific learning algorithms to sort of a more general epistemological goal: we should all aim to probably be approximately correct?

0:13:21.3 LV: Sure, you're given license to do exactly that. But I think the phrase, although it's intended as a technical one, indeed should be more widely used. So in fact the sense in which I think it should be broadly understood is that when people try to understand AI now, where the successes of AI are basically just exactly this learning phenomenon, I think the important point is that there's some promise that, averaged over many cases, your predictions will be good, but over any single case the promise is very weak. So certainly just knowing this phrase, you should immediately realize that when you try to use AI for some safety-critical application, you should be very careful, because in your particular case, the promises are very weak.

0:14:20.8 SC: And it seems though like a, almost like a simple philosophy of science, that science is not logic. It's not something where we deduce with absolute certainty a result. We have some examples, like you said, we have hypotheses based on those, and our aspiration should be to be approximately correct as often as we can.

0:14:41.4 LV: Yes, exactly. So actually in this earlier book I do indulge in some philosophical speculation. And I think the distinction I make there is that some tasks we do are theoryful and some are theoryless. By theoryful I mean something where we have a good theory, for example like physics, where we have a theory, we believe we know what's going on, and our predictions follow this theory. But most of what we do in everyday life is kind of theoryless. We don't know the rules. When we make up nice flowing sentences, we don't know exactly how we do it. But the point is that nevertheless we may have some high skill in being able to do this, and we get this skill from this learning process of many examples, exactly as ChatGPT does: it can make not just continuous sentences but really flowing prose. But we know we can't characterize what that means. So the take I have is that much of what we do is theoryless, but it doesn't mean that it's not effective or predictable on average or useful, because this learning process is, in some sense, robust and useful.

0:16:00.9 SC: It makes me think, speaking of philosophy, of David Hume and his worries about induction, saying that you can't know anything with certainty, even if every single example you've seen is that way. And in some sense, it seems like in at least this computer science context, you can formalize the fact that, despite the fact that anything could happen, you have reason to believe that probably you have a model that is going to give you a pretty high probability of getting it right.

0:16:30.5 LV: Well, exactly. So in fact, when you were asking about the 1980s, so when I was getting into this field I was reading the philosophy and what the philosophers called the problem of induction.

0:16:43.5 SC: Yeah.

0:16:43.5 LV: Which probably goes back to the Greeks, and I think the philosophers' efforts kind of faded out, because they didn't quite know what to do with it. And I do think that computer science has solved it. They solved it in the sense that it's given it one meaning which is understandable. Of course philosophers sometimes use the word more generally. But I think computer science has given a meaning to this which, in the domain in which it's meant, is kind of solved. Yes, I think philosophy is important here.

0:17:20.6 SC: And you said this already, but I do wanna highlight it, the question of the efficiency of the calculation and the computational complexity, because philosophers, again, and I sometimes masquerade as a philosopher myself, but sometimes they imagine that you're Laplace's demon and you have infinite calculational capacity. But in the real world, it matters if something takes N steps or N squared or E to the n steps. And as I understand it, one of the nice things about PAC learning probably approximately correct, is that you can show that it is doable. It's efficient in some quantifiable sense.

0:18:00.8 LV: Sure. So I think it's a description of some real world phenomenon. And it does explain that some things we can learn effortlessly, like children learn the meaning of words in their language and fairly reliably. So that's something easy to learn somehow, although we don't know why, but there are also things which are hard to learn. So figuring out the physical laws of the universe just by looking at the stars isn't so obvious. Somehow people had to work very hard at working that out. So not everything is easy to learn, but obviously we live off things which are easy to learn.

0:18:37.0 SC: And so there are some problems that are sort of intractable, I mean, you're the computer scientist here, do we have a clean division into problems that are efficiently solvable and problems that look pretty intractable to us?

0:18:52.5 LV: Well, in computation that's the field of complexity. But even if you're restricted to learning, what's easy to learn and what's hard to learn? Well, the obvious thing to say is that cryptography is really the flip side of learning. So in cryptography, you encode messages, and someone is listening in on many of your encoded messages. You don't want them to be able to decode your future messages. Okay. So cryptographic functions have to be things which are hard to learn. That's their design. And so those are examples...

0:19:30.4 SC: I see.

0:19:31.3 LV: Of hard-to-learn functions, things we believe are hard to learn, and we actually use them every day, and it's important that they be hard to learn. Yeah. So the spectrum of easy-to-learn and hard-to-learn, at least the extremes are pretty clear. We use the easy ones, we use the hard ones for cryptography, and then there's something in the middle which we don't understand so well.

0:19:51.3 SC: And I guess that's actually very useful because it highlights the sense in which you're using the word learning. It's not just memorizing information, getting more knowledge. That is not what you mean by learning.

0:20:04.5 LV: Yes. So in this sense of learning, you need many examples, and you generalize from many examples. You abstract something from many examples so that in the future you can act, you can classify a new example. So you get labeled examples, pictures of elephants or not elephants, labeled as elephants or not elephants. In the future someone gives you a picture, and you have to label it as an elephant or not.

0:20:29.3 SC: So you are trying to invent a rule that you will then test maybe against future data. In practice, how constrained is that generalizing process? Do real computer scientists imagine that there is a predetermined set of possible rules that we're sifting through, or are the computers really using their imagination somehow?

0:20:53.3 LV: Well, they're using a learning algorithm. So it depends. You choose what learning algorithm you want to use. So if you use a deep neural net, then the rule will be a deep neural net, which you learn. Or if you want to do something simpler, like a decision tree, it will be a decision tree. So the learning algorithm produces a rule from the class of rules it's able to represent.
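
To make that point concrete, here is a minimal sketch (assuming scikit-learn is available; the synthetic dataset simply stands in for labeled examples like "elephant / not elephant"): the same training data fed to two learning algorithms produces two different classes of rules.

```python
# Minimal sketch: same labeled examples, two learning algorithms, two rule classes.
# Assumes scikit-learn is installed; make_classification stands in for real labeled data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Rule class 1: a decision tree.
tree = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
# Rule class 2: a small neural network.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X_train, y_train)

print("decision-tree rule, accuracy on unseen examples:", tree.score(X_test, y_test))
print("neural-net rule,    accuracy on unseen examples:", net.score(X_test, y_test))
```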

0:21:19.9 SC: Right. Okay. And so this is stuff that you started thinking about and writing about in the '80s, I guess. And my impression, correct me if I'm wrong, is it's now kinda background knowledge for everyone who is doing AI. This PAC learning is a good way of thinking about what is going on right now.

0:21:38.3 LV: I believe so. I think it does capture the main phenomenon, which happens when you train a big learning algorithm. So for example the fact that it gets better and better with more data, and it gets better and better with some algebraic curve, that's what the experiments show. So. Yeah. So I think it is...

0:22:02.6 SC: I know there seems to be debate these days about scaling and how much it will affect the future intelligence abilities of AI models. There clearly have been enormous advances very recently with large language models, but they also still make mistakes. And am I right to think that there's like a camp that says, "All of those mistakes are gonna go away as we get more and more data," and another one that says, "No, some mistakes are kinda systematic?"

0:22:38.4 LV: Well, I'm not sure; maybe they should both say the same thing, that the mistakes will go down, but the effort needed to make them go down more and more will be bigger and bigger. So that's just following this approach of having one big learning box: we understand what it does, and we also understand its limitations. So in fact, from the '90s onwards, I was looking for ways in which we can build on this and do different architectures than just a single box.

0:23:09.9 SC: Okay. Could you give us...

0:23:11.7 LV: So I think the future is still learning, still using the same phenomenon of learning boxes, but probably in a system you'd have many boxes, and you'd have some sort of reasoning capability, you'd have some way of chaining together the conclusions they've reached to simulate something more like our reasoning capability.

0:23:33.5 SC: So when you say you have many boxes, you don't just have many copies of the same kind of box, you have different kinds of learning approaches that are gonna collaborate with each other.

0:23:44.5 LV: Well, not necessarily. Maybe the different boxes are trained on different data.

0:23:47.7 SC: Ah. Okay.

0:23:48.3 LV: They'll be trained on different data, so the different boxes may recognize different concepts, or maybe recognize different words in the English language.

0:24:00.4 SC: And does this bear on the question of whether or not, let's stick with large language models because they're so in the news right now, whether these models understand things, whether they know things, or is it a different kinda concept we should be using?

0:24:16.7 LV: Well, okay. So my answer is that these large language models are clearly trained for one thing, which is predicting the next syllable, the next token, as they call it. That's what they're trained for, they're very good at that, they're hard to beat. And any other attribute you give them is your intuition. And people seem to be very generous to large language models in all the things they intuit. But if you intuit other capabilities, like reasoning, I think it's very hard to prove those capabilities, and I don't think they've been proved. Certainly, since such a large effort has been put into these large language models, people should explore whether maybe, by luck, they're doing something else in addition.

0:25:13.4 LV: But we don't know. And I think it's quite likely that they're not. They're very good at very smooth prose, experts at figuring out the next thing. And obviously, the quality of any of these learning systems depends crucially on the data they're given. Everything depends on the data set. And certainly people who produce these large language models put an enormous effort into having a very good data set, human trainers, this and that. It wouldn't work well otherwise. So besides the learning algorithms, the other secret is the data they're trained on.
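
As a toy illustration of the one thing Valiant says these models are trained for, next-token prediction: the sketch below invents a tiny vocabulary and hypothetical scores (not taken from any real model) just to show that the "prediction" is a probability distribution over what comes next.

```python
# Toy next-token prediction: a model assigns a score (logit) to every token in its
# vocabulary, and the prediction is a probability distribution over continuations.
import numpy as np

vocab = ["the", "elephant", "ran", "quickly", "."]
logits = np.array([0.1, 0.3, 2.5, 1.2, 0.2])   # hypothetical scores after seeing "the elephant"

probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax turns scores into probabilities

next_token = vocab[int(np.argmax(probs))]       # greedy choice: pick the most likely continuation
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```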

0:25:57.9 SC: Well, I'm pleased to hear you say that. It's very similar to things that I said in a recent solo podcast myself, where I tried to make the point that what is impressive to me about large language models is not that they are thinking like human beings are, that's not what they're trained to do and what we should expect them to do, but they can sound like they're thinking like human beings are. And that is a fascinating fact that maybe we should work to understand better.

0:26:24.3 LV: Yes. A small psychological comment: maybe our standards are low, and we're impressed by anyone who speaks smoothly and can chain together five sentences coherently. That usually impresses us. But maybe it's easier to do than we think if you're given a billion examples.

0:26:47.1 SC: So just staying in this AI world for a while, before we generalize in a bit. Are there clear paths, besides getting more and more data, to pushing the algorithms in the direction of cognition or reasoning or thought or something like that? Can we do the same kind of thing, but with a different twist, to make it more like what we recognize as thinking?

0:27:13.2 LV: Well, okay, so now you're going into an area where opinions differ.

0:27:18.0 SC: Good.

0:27:20.2 LV: So clearly in AI, initially there was this big effort in just basing it on reasoning. So people tried to use classical logic to do the reasoning. That wasn't robust enough. It was hard to put in knowledge and have it be robust. So the problem with using classical logic and somehow marrying it with machine learning is that the assumptions are rather different. Classical logic is very deterministic, unforgiving of errors, unforgiving of inconsistencies. Whereas machine learning is very forgiving of errors, inconsistencies, and everything else. So the approach I've been trying to pursue is to marry learning and reasoning, but distort the reasoning process to be compatible with learning. So you need some reasoning which forgives errors, and things are...

0:28:22.1 SC: I see.

0:28:23.3 LV: Correct under a certain probability. If A implies B and B implies C, then A implies C would only be true with high probability.

0:28:28.7 SC: Right.

0:28:28.9 LV: If the assumptions are true, that kind of thing. So anyway, I think that's the right way in which AI could go if we care more about reasoning. So for example, at the moment, even if you are impressed with large language models, you probably wouldn't wake up in the morning and decide what to do depending on what your large language model recommends to you. You wouldn't do that. But if you wanted to make them a bit more reliable, a bit more authoritative, then I think that's the kind of thing you'd have to do.

0:29:04.2 LV: You'd have to make systems which conform to what we can reasonably think of as reasoning. And it is possible to do certain kinds of reasoning on knowledge which is learned. So I think it's a possible avenue. It's likely that it'll be done for more limited domains of knowledge initially. So the fact that large language models can talk about everything is impressive. But if you really wanted to reason on top of that, then computationally it would probably become totally prohibitive. But there are ways for AI to progress where some reasoning capabilities are integrated with the learning.
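
One way to make the earlier point about error-forgiving chaining precise: if each learned rule is only guaranteed to hold except on a small fraction of cases, then under one natural PAC-style reading the error budgets simply add when you chain rules. This is a sketch of the standard union-bound argument, not a statement taken from the book.

```latex
% If the rule A \Rightarrow B fails on at most an \epsilon_1 fraction of cases,
% and B \Rightarrow C fails on at most an \epsilon_2 fraction, then outside the
% union of those two bad sets both rules hold, so the chained conclusion satisfies
\Pr[\,A \Rightarrow C \text{ fails}\,] \;\le\; \epsilon_1 + \epsilon_2 .
```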

0:29:55.0 SC: I recently found myself on a panel discussion on AI in physics, and there are clearly very obvious applications when you have these huge data sets that physics gets from particle physics or cosmology or whatever, and you want to use a model to find features in those data sets. That's an obvious application. But as a theoretical physicist who wants to invent new concepts to describe the world, that's harder for us. And a friend of mine, fellow physicist Jesse Fowler from MIT, suggested that what we need is large language models, but not in the space of tokens, in the space of concepts, so that they can sort of string together concepts in new ways and be creative theoretical physicists that way. Is that a pipe dream, or is that something like a hot research project these days?

0:30:50.2 LV: Well, I think, so I'm not quite sure why he talked about large language models. I mean, if you're saying machine learning, machine learning on concepts in theoretical physics, I understand. It doesn't have to be this string like thing of large language models. I don't think it has to be presented as text and sentences and paragraphs.

0:31:11.7 SC: Sure.

0:31:12.0 LV: So certainly using machine learning to predict some new law of physics from lots of data, in principle that's something one can pursue. I don't know how successful it's gonna be, but it depends very much on how you represent the knowledge. So yes, I think it's something to pursue for sure.

0:31:42.7 SC: Okay. We've gotten the AI maybe out of our system. We'll come back to it, I'm sure. But I do love the fact that in your books, you are willing to go beyond computer science and draw larger lessons and think about wider problems, and in a very legitimate way, in the sense that, maybe this works: here's a hypothesis, let's go test it kind of thing. So I guess one inroad here is, does nature use probably approximately correct learning? Or is that a way of thinking about information processing in the natural world, not just in computers or brains?

0:32:24.5 LV: So by nature, you mean beyond computers and beyond brains as well?

0:32:29.6 SC: Yes. I'm thinking of biology and evolution and life and things like that. Early life.

0:32:33.3 LV: Well, I think so. I've pursued that, certainly in Darwinian evolution; I've worked on that as well. In fact, in the book you referred to earlier, Probably Approximately Correct, I do have a chapter on that. So one can formulate Darwinian evolution as a learning process, where basically there's no one who labels elephants and not elephants, but the labeling is survival. Different things happen, and some survive, some not. And that's the feedback from the world. So in my opinion, this is a phenomenon of evolution, and I think it's a basic phenomenon of our cognition. So in fact, the basic question you're asking is: going from computer science to the natural world, what am I doing here? I think I'm really going the other way, that...

0:33:39.2 SC: Okay.

0:33:39.4 LV: So as a general way of looking at it, the idea that what humans can do can also be done by machines is something which Alan Turing already discussed, and for good scientific reasons the Turing thesis said, "Yes, everything humans do, machines can do also." But I think what held back AI for a long time is that we couldn't identify what we actually did. What was it that we have to simulate by machine? So it was a lack of understanding of ourselves. And then we tried: we reason, so we tried logic, and it didn't work so well. But learning from examples is something humans do, and that's worked out better.

0:34:25.5 LV: There's a theory there, and also in practice it works. That's what everyone uses. So, in computer science generally, I think everything we ask computers to do, we got the idea from humans; all the ideas come from humans having done it already in some form. And so I think in pushing AI forward, we really have to understand ourselves better. What is it that we do? Okay, so we learn from examples. What else do we do? So trying to put that down on paper, I think, will help us understand ourselves and give us something to simulate, some goals for computers.

0:35:10.4 SC: Yeah. I...

0:35:10.9 LV: So I think the two go very naturally together. I think humans and...

0:35:14.9 SC: Yes. That does make sense. But I like this connection to Darwin and natural selection, because I don't really think I've thought of it this way. When you say the word learning, I think about going to school, listening to lectures, doing homework, things that happen in our brains. But you're saying, if you conceptualize learning as coming up with the model, with the algorithm that generalizes from the inputs it's had, and your goal is reproductive fitness, then what nature is doing, whether you like it or not, is tuning the genome to solve this problem by this probably approximately correct kind of paradigm.

0:35:58.0 LV: Yes. So the learning algorithm there is kind of the mutation algorithm and whatever goes into that. The feedback it learns from is survival, and it's trying to fit the world, to be good at fitting the world.
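
A toy sketch of that framing, purely illustrative: random variation plus survival-as-feedback behaves like a learning loop. The trait, fitness function, and parameters below are invented for the example; this is not Valiant's evolvability model.

```python
# Toy illustration of evolution viewed as learning: variation plus survival feedback.
import random

TARGET = 0.8                                    # an environmental optimum the population tracks

def fitness(trait):
    return -abs(trait - TARGET)                 # survival feedback: closer to the optimum is fitter

population = [random.random() for _ in range(50)]
for generation in range(100):
    # selection: only the fitter half survives to reproduce
    survivors = sorted(population, key=fitness, reverse=True)[:25]
    # variation: each survivor leaves two mutated offspring
    population = [t + random.gauss(0, 0.05) for t in survivors for _ in range(2)]

print("mean trait after selection:", round(sum(population) / len(population), 3))
```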

0:36:15.5 SC: And does it fit into our understanding of the efficiency? Is Darwinian evolution a good algorithm for solving that problem? Or is it kind of sloppy?

0:36:25.0 LV: Well, the problem is that Darwin didn't tell us the details. He didn't. And we still don't know the details. So one theory is that the mutations in the genome are uniform random noise. But we know it's not exactly that, and we don't know whether it's somehow a clever version.

0:36:49.0 SC: Right.

0:36:49.5 LV: A clever variant of that which helps evolution. So in my opinion the answer must be yes, this must be what nature is doing. But even a kind of pencil-and-paper explanation of what's going on, that if you do a mutation, how much does it change the expression level of your genes, and to mutate from this to that, what's needed, how many examples do you need, that kind of thing hasn't been done. More work needs to be done there.

0:37:29.2 SC: Yeah.

0:37:29.6 LV: But I think there's scope, even for a pencil-and-paper plausibility analysis, that evolution can work as fast as it has, because I don't think there's any scientific explanation of why evolution has succeeded as fast as it has. That's still unsolved science, I think.

0:37:48.1 SC: Right. Good. There's lots of unsolved problems, but.

0:37:50.0 LV: Yes.

0:37:51.5 SC: Okay. We're gonna fearlessly leap ahead to human beings. Human beings, I think our audience will all agree, can learn; we have that capacity. And you have a new book out called The Importance of Being Educable: A New Theory of Human Uniqueness, and you're kind of shifting from learning, which is a fact, to educability, where maybe there's a gradient or a spectrum of abilities.

0:38:24.6 LV: Well, yes. So I build on learning. I assume we can learn; that's something for which a theory has worked well. And as I said, I regard learning as a capability which you can define fairly precisely, and I think humans do it, and machines do it well. And so the basic question I ask is, well, what else do we do which is fundamental? And the way the question is asked in the book, which I think yields the answer, is this: in the course of evolution, at some point humans emerged, at some point we became capable of doing various things, like civilization, and it happened fairly suddenly. So maybe the changes, being kind of sudden, weren't that great. Maybe some composite capability suddenly came together. But what capability do we have...

0:39:28.6 LV: Which can account for our civilization? And of course similar questions have been asked ever since Darwin by any number of people, and the usual problem is that if you pick one feature, there's no feature which is totally unique to humans. Everything you define, some animal can do. And so this is what the book is about. So this is the educability I define. It's composed of three basic things. One is learning from experience, which is exactly the same as learning from examples, PAC learning. Even lowly animals can do that; animals 100 million years ago could have already adapted, so if the food was towards the light, they would go to the food. But once you can do that, then you should also be able to chain together what you've learned in different contexts.

0:40:25.5 LV: And this is what I alluded to, what I hope computer systems would do more of: some chaining of different things you've learned. And that's a very basic kind of reasoning. And so here the question is, how can you put that also on some principled basis, that if you chain together two pieces of knowledge, why are you so sure that you don't get nonsense? Maybe they're inconsistent in some hidden way; you've learned them in different contexts. So once you go away from classical logic, the guarantees of chaining things together get lost. So you need some basis for chaining. And then the third aspect of educability is a very human one: not having to learn from experience, or chain from experience, but being able to just take from someone else their experience.

0:41:19.8 SC: Yeah.

0:41:22.0 LV: So they tell you how they do things, they tell you their theory of physics, they tell you their theory of politics, and you just put it into your brain, and then you can apply it the next minute. So this is taking in theories explicitly given.

0:41:37.7 SC: Right. So the difference is, we can train a dog to do tricks, but basically it's learning from examples: you get rewarded when you do this, you don't get rewarded when you do that. As far as I know, there are no cases where I can just speak English to tell the dog to do something it has never experienced before and it goes and does it. So this is your candidate for something that really makes human beings different.

0:42:05.5 LV: Well, I think even that distinction is difficult to make. I think the candidate is the integration of these three things.

0:42:11.8 SC: Okay. Fair enough.

0:42:12.8 LV: On top of each other. I mean, there are cases, for example with apes, where you can sometimes, by physical demonstration, give them a complicated task to do, to retrieve food from some tube, et cetera, et cetera. So it's almost like giving instructions, which they can remember and repeat. So again, being able to give lengthy instructions once and have an animal repeat them isn't totally unique to humans. But we're certainly much better at this. We sit in lecture rooms and listen to podcasts and take stuff in, which animals don't.

0:42:52.7 SC: And so it's the integration of these three things. Two of the things are easy to say: one is learning from experience, the other is being taught. The conjoining or combining of different models is a little bit more slippery in my brain; maybe you can help flesh it out a bit. Do we need a symbolic representation of two different theories in order to ask if they're consistent, or is it more implicit somehow?

0:43:20.7 LV: No, this is simple in some sense. It's like, think of your mind's eye. When you plan something, you plan how to go to dinner this evening. You imagine a situation, use some knowledge to predict what will happen if you do X. And then once X has happened, you get a new situation, and then you use some new knowledge to tell you what you should do next. So planning is an example where, in your mind, you chain together pieces of knowledge and you run the world forward. And also with reasoning, very simple reasoning: something happens, you have some knowledge of what's going to happen next, and some other knowledge tells you what's going to happen after that. So this chaining is something we do all the time, for planning and reasoning. And this is what we need; it seems an essential part of being able to operate. It's not just having a single neural net, applying the single neural net and saying, "Yes, excellent."

0:44:21.1 SC: But it's related to a hypothesis that Adam Bulley talked about on my podcast earlier. He and others are psychologists thinking about mental time travel, the ability to put yourself into a future situation hypothetically. I just think of it as imagination, but it's very close to what you're saying. It's sort of working out a set of things that haven't actually happened, but you have the capacity to say what would happen, given my theories.

0:44:54.6 LV: Yes, because we can take in other people's theories. And we almost don't care whether they're true or not. We don't know whether they're true or not. But we can do this mental processing with ease. So you can watch a movie, whether it's fact or fiction, you almost don't care. But you can process it. We've got this great facility to understand what's going on, to draw implications of what's going on. And whether it's fact or some fantasy or some future thing which hasn't happened, we don't even care.

0:45:28.1 SC: Right. Good. And so this net capacity to do these three things, this is educability?

0:45:37.7 LV: Yes. In my view. That's my definition of educability.

0:45:42.0 SC: Sure. And one of your mottos or slogans is that educability is perhaps more important than intelligence, which we tend to talk about all the time.

0:45:51.8 LV: Sure. So I think the main downside of intelligence is that no one can define it. Of course, people have complained that we give importance to intelligence, we test people with intelligence tests, and this has consequences. And we don't even know what we're testing for, where the questions come from. So I think it's very unfortunate that the notion of intelligence has become so important, because it's not explicitly defined. I've explicitly defined educability; intelligence has no such definition. In fact, Charles Spearman, who I think first tried to deal with the notion of IQ using statistics, defined it as an implicit statistical notion. So basically, I think for him general intelligence came from a study of how children in schools did in different subjects, and it turned out that the children who were good at one thing were likely to be good at something else as well. Performance in different subjects in school was correlated, and then he almost defined intelligence to be some sort of core of that correlation. But it's very implicit, and we still don't know what the definition is. So the main downside of intelligence is that we don't know what it is, and when people try to define it, they say it comes in 10 different varieties, and people disagree. So I think we should go and look for more useful notions.

[laughter]

0:47:38.3 SC: Well, I guess that's the next question. It's nice to be able to define educability, but that falls short of convincing us that it is the important feature that has enabled civilization, as you talk about in the book.

0:47:55.2 LV: Well, it's got some of the important components. Certainly people discuss how humans are unique as far as our culture: we've got enormous culture, and we can hand culture on so easily. One person can lecture to a room, or someone can write a book and a million people, if they read the book, get it the next day. Other animals have no access to any such phenomenon. So the spread of human culture is certainly based on being able to transfer explicit theories. That part is essential, and it's part of this computational process. Sure, for any model, its value depends on how useful it turns out to be. That we don't know yet. But it's a candidate.

0:48:50.6 SC: Well when would you or could we pinpoint a moment of historical time when we say, "Oh okay, human beings have figured out how to educate themselves," or is it more gradual?

0:49:04.9 LV: You mean in the past?

0:49:05.9 SC: Yeah.

0:49:06.1 LV: You mean or?

0:49:07.2 SC: Yeah historically was it hunter-gatherers or did it come along with agriculture?

0:49:12.9 LV: Well, okay, so when did we get this capability? I don't need to speculate about this, but I can. I think this could have come even before humans. So I think it's possible that we had it 300,000 years ago, that some predecessor species already evolved this, and it just took a very long time for it to become useful to us. It's like a snowballing effect: you need more and more knowledge to share among humans before it becomes useful. And there's no evidence of any mutation having spread through the whole human population in the last tens of thousands of years. So there's no biological explanation of a new capability arising in the recent times of agriculture and things like that. So my guess is that the capability is much, much earlier.

0:50:11.0 SC: But I guess, you're a college professor, as am I, you've had students: the capacity to be educated differs from person to person. It's not just that humans have it; it's a skill or a capacity that we can improve on, we can make better, we can make bigger.

0:50:32.4 LV: Well, yeah. So in almost any aspect of life, individuals differ in performance if you give them some test. So in the book I do raise this question of whether educability can be enhanced by some process... And I don't know the answer to that. So I concentrate on the fact that this educability, I think, is something common to all humans. But we may have different levels of it, and whether it can be enhanced, I don't know. Certainly, if the concept has any meaning, then it should be measurable. There should be some way of saying, yes, we have so much of it. And I suggest that research could be done along these lines, where people try to test educability. The main feature of that, which in one sense is very simple, is that when you exercise your educability, you're gaining new information. So if you have a one-hour educability test, you should not be testing for anything which the person knew beforehand. You should avoid any skill; you shouldn't be testing for native skills. You should be testing for information acquired in that one hour. So it's slightly different from what people do currently.

0:52:02.1 SC: Of course. Yes. It's the opposite of what people do currently.

0:52:07.3 LV: Yes, exactly.

0:52:08.2 SC: But also, it's very hard, because maybe you're giving somebody some new information and asking how quickly they can learn it, or asking how quickly they can generalize it. But it suffers from the same worries that large language models do. Like, how do you know that the test isn't tainted? How do you know that this person hasn't thought about similar things before?

0:52:33.0 LV: Yes, for sure. It's difficult to do. I suppose the questions would have to be so designed that there's a likelihood... so maybe there's some artificiality in the questions, or the subject matter is so obscure that you don't expect the person to know about it. Yes, that's the difficulty of the design. But that's what education is, right?

0:52:54.6 SC: Yeah.

0:52:55.0 LV: You're trying to impart... And as you were saying, if you want to measure that, then somehow you have to solve that problem.

0:53:00.7 SC: And you at least asked the rhetorical question, should we care about this property of educability, for example, in the leaders that we choose? So do you think that we should undergo a shift from valorizing intelligence to valorizing educability more?

0:53:20.2 LV: I suppose so, if only because I still don't know what people mean by intelligence. So I think that's the question. But for leaders, I think this makes a lot of sense, because as we see, when people become leaders, they come in with so much knowledge, but new things happen, and we really want them to be able to use the new things. It's not enough that when they come into their leadership position they know everything worth knowing. New things happen, and we want them to be able to pick them up and use them.

0:54:02.5 SC: I guess, there is a current worry about things like misinformation, or just the fact that we get so much input from the world, from the internet and sifting through it and paying attention to things becomes a crucial skill. Is that something that we get new insights on by thinking about educability as a central concept?

0:54:24.7 LV: Well, the insight I got from thinking about this, which I hadn't appreciated fully before, is that when I describe this educability model, it seems a very powerful way in which we can learn information and process it. But somehow there's nothing in it which is good at testing the information we've received. Especially in the third mode, if someone tells us a theory, there's nothing in my model which gives us the capability to check whether this new theory is correct.

0:55:01.9 LV: Okay. So we're very easy prey to false theories, to conspiracy theories. If someone tells us their political theory, how are we to know whether it's good for us or not? We're not sure; maybe we relate it to our previous knowledge, but. I think we're very bad at evaluating theories other people give us, because I think it's just inherently difficult to do. I think there's no way it can be done. So now that there's this deluge of information, I think this human weakness may become more and more dangerous for us. The important point for me is that this weakness is inherent in humans, and we have to deal with it at the human level. So it's not just a question of deep fakes and new tricks for fooling us; we could have been fooled by older methods as well. Somehow we have to recognize that we're easily fooled and be educated to understand that. It's not just a question of the new technology. I think we have to deal with the inherent weakness.

0:56:12.2 SC: I did have a couple of recent conversations that made the point about the social aspects of how human beings learn and pass knowledge down. We are more trusting of other human beings than other species are of their fellow beings. And that's enabled us to sort of learn faster because we have teachers that we trust and believe and things like that. And maybe there's a dark side of that, which makes us a little bit too willing to accept what certain favored people say.

0:56:46.4 LV: Yes. So, certainly public education is based entirely on trusting the teacher. If you go to a physics course, you have to trust everything... There's no way the student can verify everything they've learned. So...

0:56:56.3 SC: No.

0:56:57.3 LV: You have to trust. But still, I think we're not totally trusting. So psychologists do do experiments where they compare what happens if children are given information by two different people, and children do have preferences for whom to believe: do they believe their parents, or someone who seems to know more about the subject? So we are born with some strategies for dealing with this complicated world, but obviously the strategies aren't that powerful. But the question of whom to trust is critical, yes.

0:57:33.3 SC: Yeah. Does, again, this lens of educability, does it suggest better ways to educate people or to have a school system to focus learning? Have we been, maybe the answer is no, but have we been led astray by thinking too much about intelligence and not enough about educability?

0:57:55.6 LV: Well, I think more research could be done. So I don't know the answer to your question, but I think this formulation does certainly ask new questions. The most obvious thing to say, I think, is that people have talked about the science of education, but still, I don't think that education is pursued from a very scientific point of view. It's very much a best-practices kind of thing. Well, there's some background science which you could build an education system on top of, but it's not used that much. So I think there's a lot of scope for developing more of a scientific basis for education by further research. Whereas this notion of intelligence, I don't think, has provided us with very much.

0:58:54.9 SC: By scientific basis, are you thinking of an empirical testing of what works and what doesn't, or a theoretical superstructure to explain why certain things work?

0:59:05.8 LV: Well, I think it's kind of both, in that in the end you do have to test whether something works, but you also want some assurance that this working transfers to other situations. Often experiments are done in one situation, some new education technique works, but it doesn't really transfer. So I think, as you say, some theoretical superstructure would be useful to help us understand why there's some chance of some approach working. So I think it's a combination.

0:59:34.8 SC: Maybe an easier question, not very easy, but an easier one than reforming the school system, is just reforming our individual selves. Is it possible for a person to improve their own educability, to learn how to be educated better? Is that just a matter of becoming a better thinker, a better scientist, or is there more to it than that?

0:59:58.0 LV: Well, I don't know. I think that that should be a subject of research. But if one can measure educability, then you can ask the question more rigorously, that if you could do some educability test and see whether something you do for yourself improves your educability. So if you don't measure it, then you're not clear what it means again. So again, so I don't know, but I think it does raise questions, which I think seem to be new.

1:00:33.4 SC: Well, you have a... I was, as usual in the podcast preparation, I can skim through the book of the person who's coming on, but I can't read every word. You have a chapter entitled Education as a Model, educability, I think, as a model of education. Could you explain what that means? That sounds very interesting and important.

1:00:52.7 LV: Yeah. Okay. So part of the book is a justification of this mixing of computation and human behavior. I think this is kind of a philosophical question of where's the science in computer science? So, like, Turing had this Turing machine. So what's the science? Is it just a mathematical definition and that's the end of it, or is there something more? And what I try to explain is that with computational models, the Turing machine is clearly the best example, there is a kind of new way of trying to get to knowledge, where you have a definition which has some properties of robustness: for example, if you make a variant of it, it's very good if it still captures the same notion. If you define some notion of computability, you don't want the meaning to change if you change some little part of the definition.

1:02:01.2 SC: Yeah.

1:02:01.2 LV: You want it to be robust. So there are some such characterizations. In this chapter, I try to suggest that educability satisfies some of these notions, that one should have some confidence that this model has some robustness: if you try to express similar things tomorrow, you'd probably get a model which is kind of the same or similar. So it's trying to justify why this is a scientific approach. Some people would say the only scientific approach is to do experiments. So if I'm saying anything about humans in the title, I should do experiments on humans, but I'm not doing experiments on humans. So what am I doing? That's what I'm trying to answer.

1:02:49.7 SC: Good. And maybe that sort of brings us full circle to the AI goings-on these days. There's been a lot of excitement over the last year or so. We've asked, is it important for leaders to be educable? Can we improve our own educability? What does it mean for a computer to be educable? Is it exactly the same meaning? Is it these three things that you listed? And if so, can we, or should we, or are we aimed at giving computers those capacities?

1:03:27.0 LV: Yeah, well, I suppose the point of my book is that the aims should be about the same. But again, having written this down, what one realizes is that the difficulty with being educable is that someone has to decide the content of the education. So just being educable doesn't produce useful human beings if what they're educated with is just bad stuff.

1:03:54.6 SC: Okay.

1:03:55.3 LV: And the same with computers. The difficulty with the current pure learning systems is, obviously, the training set. But if you make them educable, then there are even more decisions you have to make about what knowledge you give them, because depending on what knowledge you give them, the results will be different. So being educable has its great dangers as well. You can educate machines very easily, and also humans very easily, to do things you don't want done.

1:04:28.6 SC: Well, near the end of the podcast, I always encourage the guests to let their hair down a little bit. You just touched on these looming issues that a lot of people are super concerned about, about AI risks, whether literally existential risks to the planet and the species, or smaller-scale political and social risks. You're someone who has made enormous contributions to helping make machines seem intelligent. Are you worried that it's going to get out of control? Or is that something that we'll just keep monitoring and tweak along the way?

1:05:08.4 LV: Well, I think the one thing I would say is that the most extreme fears, of this singularity, I don't see or support, because these arguments for a singularity are usually based on some sort of mysticism: that machines will become superintelligent in a way we don't understand, and they'll take over. My view is more that machines will be more capable along lines we understand, learning, reasoning, taking our theories. But at least they'll do things, they'll do processes, we understand. And these are the same processes we do, I believe. So we've got some scientific basis for understanding what they do. So then AI is just like any other kind of dangerous science, like chemistry, where bad things can happen. But if we understand enough, and we know what we know and what we don't know, we can have some control of what goes wrong.

1:06:15.3 LV: It's something like the pharmaceutical industry: when some new drugs are released, there are very thorough tests. One can't predict ahead of time whether they'll succeed or not; one can't predict whether the tests will really be totally foolproof; maybe mistakes are made. But AI is a similar kind of thing: we understand enough, we'll be cautious, we'll take common-sense precautions. But the idea that somehow they'll take over without us letting them take over, I think, is a misplaced fear.

1:06:54.9 SC: Good. So putting aside that misplaced fear, and this will be the final question, what do you think will be the biggest impact on human lives from the fact that AI is making such advances?

1:07:08.2 LV: Well, I think it's going to be more like a mixed economy, where some things are done by machines, some by humans. And we'll just have to get used to that. How it will evolve in detail, of course, no one knows. Computers will get into all aspects of our lives. But we'll just have to get used to the idea that many of the things we do, computers will do, and we shouldn't be upset by that.

1:07:40.3 SC: We shouldn't be upset by that. That's always good advice.

[laughter]

1:07:43.7 SC: To control what we control and live with what we can't. So, Leslie Valiant, thanks so much for being on the Mindscape Podcast.

1:07:49.3 LV: Okay. Thank you.

2 thoughts on “272 | Leslie Valiant on Learning and Educability in Computers and People”

  1. Just a thought: The honey bee waggle dance (see: Social signal learning of the waggle dance in honey bees, Dong et al., Science 2023) employs a complex language with identifiable structure and is manifest in different dialects. What is more, it appears to qualify as a nascent example of educability, exhibiting at least the first and third elements of Prof. Valiant's definition of educability. As for the second element, chaining together of learned elements (or exhibiting planning?), perhaps that is a question that might be put to Prof. Dong.

    Thanks for the interesting discussion and perspective.

  2. Can computers become intelligent eventually? What is intelligence anyway? What are the abilities of an intelligent entity? Here are a few: To have experiences and be capable of reflecting on those experiences.
    To have insight into the meaning of observations. To understand what it is doing. To be surprised by its findings.

