To say that event A causes event B is to make a claim not only about our actual world, but about other possible worlds -- in worlds where A didn't happen but everything else was the same, B would not have happened. This leads to an obvious difficulty if we want to infer causes from sets of data -- we generally only have data about the actual world. Happily, there are ways around this difficulty, and the study of causal relations is of central importance in modern social science and artificial intelligence research. Judea Pearl has been the leader of the "causal revolution," and we talk about what that means and what questions remain unanswered.
Support Mindscape on Patreon.
Judea Pearl received a Ph.D. in electrical engineering from the Polytechnic Institute of Brooklyn. He is currently a professor of computer science and statistics and director of the Cognitive Systems Laboratory at UCLA. He is a founding editor of the Journal of Causal Inference. Among his awards are the Lakatos Award in the philosophy of science, the Allen Newell Award from the Association for Computing Machinery, the Benjamin Franklin Medal, the Rumelhart Prize from the Cognitive Science Society, the ACM Turing Award, and the Grenander Prize from the American Mathematical Society. He is the co-author (with Dana Mackenzie) of The Book of Why: The New Science of Cause and Effect.
0:00:00.0 Sean Carroll: Hello, everyone, and welcome to the Mindscape Podcast. I'm your host, Sean Carroll. As we go through life, one of the things that we're inevitably going to do, all the time, is to assign credit or blame for things that happen in the world, either to people or to other events that are happening. We have effects, things that happen, and we have the causes for those effects, the reasons why those things happen. This idea of a structure of reality based on causes and effects and their relationships is perfectly obvious. I mean, it's something that is completely evident to us ever since we were little kids. The ancients talked about it. Aristotle famously kind of organized a whole categorization of causes and different kinds of causes and their effects, but like many such ideas, when you think about it a little bit more deeply, it becomes tricky. What exactly is going on? If I say, "I got sick because of a virus."
0:00:54.8 SC: What do I really mean? There's a kind of simple answer, which is, if it weren't for the virus, I wouldn't have gotten sick, if the virus weren't there. But you try to implement that in some systematic way, and what you find is it's much trickier than that. For example, what if you had gotten the virus, but you were also vaccinated, so therefore you were protected against it, or what if you didn't get the virus, but you got something else, so you got sick anyway. Furthermore, that kind of reasoning isn't limited to just the virus. I mean, Darwin's Theory of Evolution is responsible for viruses in the first place, in some sense, so did you get sick because of Darwin's Theory of Evolution? Did you get sick because space-time is four-dimensional, without which maybe there wouldn't be such a thing as viruses? It's hard to pin down exactly what's going on.
0:01:39.2 SC: And this difficulty is not just for physicists or philosophers or other kind of scientists, it's becoming increasingly important in artificial intelligence research, because computers don't have this immediate, obvious feeling that there are causes and effects in the world like human beings do. So we really do need to get at the guts of what's going on, when we talk about causes and effects. And modern scientists and philosophers and mathematicians and computer scientists have done this. No one has been more influential than today's guest, Judea Pearl. He's done foundational research on understanding what is meant by causes and effects, and he's written about it, he has a popular book, from a few years ago, 2018, with Dana Mackenzie, called The Book of Why: The New Science of Cause and Effect. So you can read about it there, but you can also get it here on this podcast.
0:02:25.8 SC: We're gonna talk about exactly this set of questions. Just to give you a little bit of a hint, the idea is we think about probabilities of things happening. And even if there's definite things that happen, even if it's not about randomness, maybe you don't know what's happening. So for a statistician, if you say, "There are people, a set of things called people, and some of those people drink alcohol and some teetotal, some don't drink alcohol". So what that will mean is, there's a fraction of the people who drink and that's saying that the probability that a randomly selected person will drink is a known quantity, so there's a probability involved, even if everything is perfectly deterministic. And then also, for people, there's a probability that they own a cat or own a dog or have no pets at all, or both, or whatever, and then you can ask questions about, "Well, okay, given that you drink or you don't drink, what is more likely, do you own a cat, a dog or no pet at all?"
0:03:15.0 SC: And then you can say, "What causes what? Are people who drink more likely to own pets?" Or, "If you own a cat, does that force you to drink?" That's the kind of question that this new science of causality is designed to answer. I'm not gonna give away all the ways that it happens, but again, crucially important for Computer Science and Artificial Intelligence, also for areas like medicine. You wanna know what medical intervention is giving some effect in the patients. For politics or economics, what policy changes lead to what effects. This idea of cause and effect and getting it right pervades how we think about the world. As a physicist, of course, there's a whole other dialogue to have about the fact that in Newton's Laws of Motion, there are no causes and effects, so how do you recover them at the macroscopic level? We get into that a little bit and many more interesting ideas, so let's go.
[music]
0:04:24.0 SC: Judea Pearl, welcome to the Mindscape Podcast.
0:04:26.9 Judea Pearl: Glad to be here.
0:04:28.3 SC: So, causality is one of my favorite topics to think about, both as a human being and as an academic researcher, so this is gonna be a great thrill for me to talk to the world's master. Let's try to get on the table how the typical person out there should think about causality. We're both right now in Los Angeles, as we're recording this, we'll hear people say things like, "I was late because there was a traffic jam on the 405." Attributing a cause to the fact that they are late, namely, that there was a traffic jam on the 405. So I guess the first question is, does that make sense? Are these good ways of thinking? Is that causal language useful?
0:05:09.2 JP: Oh, that's the best way, because that's the way people talk. And to distinguish my profession or my hobby from yours, I'm interested in capturing the way people think and not the way nature is constructed. So, this is because I am in the circle of AI people, and we have a certain mission, we wanna capture how you and I think, so that a robot can communicate with us in a natural way, regardless of how the molecules move.
0:05:47.6 SC: And I think it's a great fact that AI helps us understand, not only the world, but how we think, 'cause we take so many things for granted and the computers don't.
0:05:57.9 JP: Absolutely, and that is a real test. And that's why I'm accused, many times, of not paying attention to the great philosophers, to what Kant said and Hegel and Aristotle. Well, [chuckle] they didn't have the pleasure of building a robot that behaves like us, and they didn't have a metaphor of thinking, nothing for... Yeah.
0:06:25.3 SC: So, explain more what you mean by that, the metaphor of thinking.
0:06:29.1 JP: Well, Descartes had a metaphor: we have gears in our mind and they turn, and that's what makes us deduce one thing from another. Okay?
0:06:41.4 SC: Right.
0:06:42.0 JP: But, why? He needed to have a metaphor because he was familiar with gears, he wasn't familiar with neurons, he wasn't familiar with even logic circuits. He had only one metaphor for a deductive machine, or at least a machine that has output based on input, and that was the gear system that [0:07:09.1] ____ maybe invented, right? So, he put it together. Once you have a... I call it a laboratory or a playground, you have to have a playground for your own ideas, so you can take them apart and try different combinations. Philosophers did not have a playground for ideas about thinking.
0:07:33.2 SC: So they...
0:07:33.7 JP: And we have.
0:07:34.5 SC: We have, the computers are forcing us.
0:07:39.1 JP: And that is why I don't feel I can learn much from philosophers.
0:07:43.8 SC: Good, perfectly fair. Now, I will mention Aristotle once and not because I learn a lot from his theories of causation, but because he did sort of try to divide up different kinds of causes, and I think he went too far into things that we don't even call causes. But let me just distinguish between the kind of thing I said, I was late because there was a traffic jam, versus something like, why is the sky blue? Well, it's blue because short wavelengths of light scatter off of air preferentially, but that's not an event, right? The traffic jam is an event in space-time, the properties of the air are properties that are more or less permanent, are those the same kind of cause-effect relationships from your point of view? Or do you distinguish between those?
0:08:31.1 JP: We distinguish between them.
0:08:32.6 SC: Okay.
0:08:33.5 JP: One is called the actual cause, and I think a philosopher called it token. Token versus a, I forgot the other one, token...
0:08:47.9 SC: Type.
0:08:48.9 JP: One is based on variable, and the other one is based on event.
0:08:53.4 SC: Right.
0:08:53.9 JP: If I say, I was late because of the traffic jam, I'm talking about one specific event at one specific time, and one individual, that's me, and in one situation; that's token. Here, one event caused another one, and the contrast to that is variable-based causation, that careless driving causes accidents, okay? Which means the variable, the driving type, okay, which has many values depending on how you drive, tends to cause a higher risk of accidents. The philosophers used the example of drinking hemlock causes death, versus Socrates died because he drank this hemlock. That's the difference: singular versus general. That's how they use it, yeah.
0:10:01.0 SC: Right, okay. And, do you feel...
0:10:02.7 JP: So, we have different names for them and we have different algorithms for identifying each one.
0:10:08.3 SC: Good. And, do you think that causality is something that is fundamental in nature, or is it something that is helpful to us human beings to describe what's going on in nature? Is it more emergent or is it built into the fabric of reality?
0:10:27.1 JP: There's nothing, there's no cause and effect in physics, as you know, because all the equations of physics are symmetrical in time. And they're built around algebra. Algebra is tied to a connective called equality, and equality is symmetric. So, if F = ma, then a = F/m. Physics doesn't distinguish between the two. However, as an emergent property, we perceive certain things as directional; we say that a rooster crow does not cause the sunrise, but rather the other way around, even though the crow occurs earlier and is highly, highly correlated.
0:11:20.4 SC: I will be forced sometimes to ask questions that I think I know the answer to, just because it will help the audience a lot, so don't be surprised.
0:11:29.8 JP: No, no. I like those because it give me a chance to repeat myself.
0:11:33.7 SC: That's good. We'll both be doing that, okay. I like this distinction that you're drawing, and I just wanna emphasize how very profound it is. There is this giant revolution with Newton and Descartes and Galileo in constructing mathematical theories of physics. And you're right, the most commonly appearing symbol in a mathematical equation, or set of mathematical equations, is the equal sign. And there's no arrow on it, and so in some sense you're doing something absolutely audacious, or at least going back to a previous time when these audacious things were commonplace, where you're saying, I wanna know not just equal signs, but arrows, which is the cause and which is the effect.
0:12:17.0 JP: Yeah, correct. I'm glad that you share with me the astonishment. What Galileo did is he chose algebra and said nature speaks algebra, which wasn't clear at the time, and it enabled him to do so many things that weren't done before, just 50 years after the invention of algebra by [0:12:44.3] ____ and others, right? Okay. I think the students today should appreciate this revolution. And now draw a parallel to what Sewall Wright did. Sewall Wright was a geneticist in the 1920s who got sick and tired of working with equations and said, "It doesn't represent what I want. I want to have an assignment operator. I put an arrow." He didn't think about it as assignment, but we in computer science know that there is a difference between assignment and equality. I take the content of register A and assign it to be the content of register B, okay? It's a different operator, and it's asymmetric. And he was the first to put down a symbol for this; the symbol was an arrow, and he built these path diagrams. Everybody attacked him for that, but at least he said it represents what he wants. That's why I admire him for having the audacity to put a new symbol for something that he understood is needed.
0:14:08.0 SC: And I'm trying to understand. Alright, we're done with the questions that I think I know the answers to. Now, we're already moving on to questions I don't know the answers to. What is the relationship of this new way of thinking, which again it's an old way of thinking, right? Aristotle would've been perfectly happy with these arrows but we sort of got rid of them and we're bringing them back. What is the relationship between this and the idea of counterfactuals or possible worlds? A very simple minded guess as to what causes are, is that A causes B if had A not happened, B would not have happened. And so, already you're talking about a whole different universe where different things happened.
0:14:49.3 JP: Right. And that was a glitch of Hume. He used almost this phrase: had A not happened, B would not have... Yeah. What the relation is, is that we have a calculus of counterfactuals, very simple. Which means, we take path diagrams the way that Sewall Wright put them down. And we can define what a counterfactual is for every two variables or every two events; we can assign a truth value to every counterfactual you can think of based on the path diagram. So there we have a calculus, plus we have an understanding of what you need to build those path diagrams. And it's built on one relationship: 'listens to.' Everything is built on knowledge. You have to combine data with knowledge. So somebody has to build those path diagrams. What do you need to think of when you decide whether to put an arrow or not to put an arrow between A and B? That is the question. What comes to your mind? And the only primitive we need is the primitive of 'listens to,' okay?
0:16:08.3 SC: Listens to.
0:16:09.9 JP: The barometer deflection listens to the atmospheric pressure. The rooster listens to the glow in the sky. That's the only relationship; it's a very primitive one. And it's a very natural one, because I believe this is the most rudimentary; you cannot ask for a more rudimentary and simple relationship for a scientist or for a robot to think about.
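[A minimal sketch of the "listens to" primitive in code. The two-variable model, the function names, and the toy numbers below are illustrative assumptions, not anything from the conversation; the point is only that an arrow is an assignment, computed in one direction, rather than a symmetric equation.]

```python
import random

# A tiny structural causal model: each variable "listens to" its parents.
# The arrow pressure -> barometer is encoded as an assignment, not an equality:
# the barometer is computed FROM the pressure, never the other way around.

def sample_pressure():
    # Exogenous: in this toy model, the atmospheric pressure listens to nothing.
    return random.gauss(1000.0, 10.0)   # hPa; arbitrary toy numbers

def sample_barometer(pressure):
    # The barometer listens to the pressure (plus a little instrument noise).
    return 0.1 * pressure + random.gauss(0.0, 0.2)

pressure = sample_pressure()
barometer = sample_barometer(pressure)
print(pressure, barometer)
```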
0:16:38.3 SC: The barometer example is a very good one. We think that, the barometer is telling us the atmospheric pressure and if all we had was the data, if all we knew was what the pressure was and what the barometer reading was, we're back in Galileo land, and there's an equal sign and there's no arrow going either way. And what you're saying is there's something extra that that is missing, it's not just a correlation. There is a clear fact of the matter that the pressure is causing the barometer, not the other way around. And that's all we would like to understand.
0:17:11.0 JP: Correct, yeah. And we only become aware of it when we have to program a stupid robot. Because the stupid robot, if it only looked at the equation, would try to move the barometer and hope to prevent the rain tomorrow.
0:17:27.3 SC: Right, good. And let me also just give the audience a visual here, because you and I have in mind these diagrams; you've already referred to Sewall Wright's diagrams, and you and your collaborators have built these into a wonderful tool. So, these diagrams represent what? Let me let you say what they are.
0:17:45.3 JP: They're a collection of judgments about who listens to whom. I'll put an arrow between the barometer and the atmospheric pressure if I think the barometer listens to the atmospheric pressure, and I would not put an arrow between the barometer deflection and the price of beans in China tomorrow.
[laughter]
0:18:13.6 SC: Okay.
0:18:14.3 JP: Even though there could be indirect connection between the two.
0:18:18.8 SC: Good. So we have in mind, a bunch of facts or a bunch of things in the world that could take on different values, right? The barometer could have different readings, the pressure could be.
0:18:29.8 JP: We call them variables.
0:18:30.5 SC: Yeah. Variables. Right. Good.
0:18:31.8 JP: But it's man-made; a variable is a man-made entity.
0:18:36.0 SC: And then we put all of these variables in circles and we draw arrows connecting ones where we think that one thing listens to another thing. Okay. And I do have one philosophy question about the counterfactuals. We wanna say, if A hadn't happened then B wouldn't happen. I mean, all the robot knows is the world. All it knows is the data, right? What gives us the license to talk about what would have happened in a different world, where all we have is what did happen in our world?
0:19:09.5 JP: Okay. Assuming that you are willing to make assumptions in the form of who listens to whom. When you feed the robot this collection of assumptions in the form of a path diagram, that's enough; from then on the robot can reason counterfactually, because all the knowledge about counterfactuals is contained in that diagram.
0:19:36.0 SC: So, the game is to use things we observe about the world to construct this kind of diagram, telling us what listens to what, and then we can deduce what would happen counterfactually.
0:19:49.4 JP: Absolutely. That's the whole point. Yeah.
0:19:51.8 SC: Good.
0:19:52.5 JP: So, this is a parsimonious representation of counterfactuals, of a super-exponentially large number of counterfactuals.
0:20:04.7 SC: Sure. Sure.
0:20:05.4 JP: Yeah. And that's something, by the way, if you compare it to what philosophers did with closest-world semantics.
0:20:16.1 SC: Yeah.
0:20:16.6 JP: Like David Lewis, right? He had the idea that A is counterfactually related to B if B is true in all the worlds which are closest to A, something like that. Right. So, as a philosopher, he didn't care about computer representation or mental representation. How would you ever write down, how does a mind represent, all the infinite relationships of which world is closer to which world given another world? Okay, this is an enormously large set. But we in computer science cannot deal with super-exponential storage. Two things: first of all, as psychologists, we must agree that we have to face the problem of representation. If you have a theory and the theory does not allow for parsimonious representation, scrap the theory. It cannot work in our mind, right? And the other consideration is practical. We cannot feed the robot a super-exponentially large memory.
0:21:36.1 SC: That's a very good point. And I guess we physicists/philosophers are guilty of thinking like Lewis sometimes, just letting ourselves imagine arbitrarily complicated, different situations, but you're making the point that if what we care about is teaching a robot, our understanding had better be simple in the sense that there's only a tiny number of rules and choices that need to be implemented to say something about what happens next.
0:22:06.2 JP: And something else enforces it, and that is the fact that we form a consensus. We human beings, society.
0:22:15.9 SC: Yeah.
0:22:16.5 JP: We do form a consensus about counterfactuals. How is it possible? If each one of us had a different notion of which world is closest to which, we wouldn't form a consensus.
0:22:31.2 SC: Yeah. And it makes me wonder, okay. So, why can we do that? Much like why can we talk a language of causality at all, if it's not to be found in physics. And I think that ultimately physics is where our description of the world bottoms out. Are there special features of physics that allow us to talk about these ideas at the higher emergent levels?
0:22:58.4 JP: I'm not sure I understand the question. Yes, we have to ask why. What is it about the mechanism and the motion of the molecules that makes us believe that the barometer is a result of the atmospheric pressure and not the other way around? Okay. Now, Herbert Simon had some conjectures about it; he said it's probably a consideration of power and energy. The sun doesn't care about the rooster, okay? Because of the enormous difference in mass, in energy, involved here, we'd rather give the sun the power of influencing the rooster and not the other way around. And time progression is also important, right? So time and energy and mass all give us clues about directionality.
0:24:01.4 SC: Good, good. Okay. That is very helpful. But now let's roll up our sleeves a little bit and get into the nitty gritty of how this works. So we have a diagram, we imagine a diagram between all sorts of different variables and what listens to what, but just because B listens to A doesn't necessarily mean that A is the cause of what's happening in B. It's more subtle than that, right?
0:24:24.2 JP: Oh yeah. Because B is listening to many other things. Some of them are influenced by A negatively. So yeah, it's a combination of listenings, for which we have algorithms that unpack them and give you answers to questions on three levels of the reasoning hierarchy, and that is the organization in which I like to think.
0:24:52.6 SC: Okay.
0:24:53.0 JP: Is the ability to answer questions of a certain type.
0:24:56.2 SC: So sorry, what are the three levels of the reasoning hierarchy? That sounds important. [laughter]
0:25:00.7 JP: I thought everybody knows by today, okay.
[laughter]
0:25:03.2 SC: Well, they should know by your book but let's assume they haven't yet.
[laughter]
0:25:10.3 JP: Okay. The lowest level is statistics. I hope you don't have many statisticians among your audience because they get so extremely insulted when you put them on the lowest level.
0:25:20.6 SC: Lowest level, I know.
0:25:23.6 JP: Yeah. It is just when you look at one event, then you infer the likelihood that another one occurs, will occur or had occurred. It doesn't matter yeah.
0:25:36.2 SC: Sure.
0:25:36.7 JP: But it's still the association between events; this is correlation, this is statistics, and this is, by the way, also machine learning, unfortunately, okay? Or at least 99% of machine learning. So, this is what we get, and they're on the lowest level. For that, we don't even need a diagram. Actually, for a parsimonious representation of conditional probabilities we need the diagram too, but it's a purely probabilistic diagram. We call it a Bayesian network in my language, okay? So, that's level one. Level two is action. What if I do that? And the reason it's totally different from the first one is because you're talking about changing the probability space, you're talking about a new environment... Things have changed in the world. I don't wait for the sprinkler to be turned on, I apply my muscle and make sure that the sprinkler is on. So looking at the sprinkler, if the sprinkler is on, I can infer it must be summer. But if I turn the sprinkler on, I can no longer infer that it must be summer, right?
0:27:01.5 SC: Yeah.
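[A small simulation of the sprinkler example just described. The probabilities are invented for illustration; the point is that conditioning on seeing the sprinkler on shifts our belief about the season, while forcing the sprinkler on, the do-operation, does not, because the intervention severs the arrow from season to sprinkler.]

```python
import random

def season():
    return "summer" if random.random() < 0.5 else "winter"

def sprinkler(s):
    # The sprinkler listens to the season (toy probabilities).
    p_on = 0.8 if s == "summer" else 0.1
    return random.random() < p_on

N = 100_000

# Level 1: seeing. Among worlds where the sprinkler happens to be on,
# what fraction are summer?  Estimates P(summer | sprinkler = on).
seen = [(s, sprinkler(s)) for s in (season() for _ in range(N))]
on = [s for s, spr in seen if spr]
print("P(summer | see sprinkler on)  ~", sum(s == "summer" for s in on) / len(on))

# Level 2: doing. We override the sprinkler's mechanism (graph surgery):
# the season is still drawn from its own distribution, the sprinkler is
# simply forced on, so it tells us nothing about the season.
done = [(season(), True) for _ in range(N)]
print("P(summer | do(sprinkler on)) ~",
      sum(s == "summer" for s, _ in done) / N)
```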
0:27:01.5 JP: Okay, so that is a simple example of why it changes the world. And the world of intervention, this is something that we can learn about from randomized experiments, and that was the greatness of Fisher, Ronald Fisher, to introduce the randomized experiment into statistics; statistics before Fisher was all about correlation. And that changed the practice of statistics, not the philosophy, because Fisher refused to talk counterfactuals, so he couldn't prove that his randomized experiment gives you what you really want. You see, what the farmer really wanted, at his station in the agricultural environment in England, was to know whether to use fertilizer A or fertilizer B and what the yield will be if I do one or the other. The farmer did not care about randomization, but Fisher convinced the community that if you randomize, you get rid of all the other factors, and what you have is an answer for the farmer.
0:28:24.7 JP: He couldn't prove it; Neyman could prove it at the time, but Fisher was feuding with Neyman and he refused to use his notation. So, without the language of mathematics, he was able to convince the community that the randomized experiment gives you an answer to your question, which fertilizer should I use. But now we are going to the third level, and this is counterfactuals, or understanding, or explanation: unit level, retrospection and imagination. It's the highest level of reasoning that I can think of, perhaps I'm missing a fourth level, but yeah... This is what we mean by explanation, why things happened the way they did. Was it the aspirin that removed my headache, or other factors? Or as you mentioned, it's event-based, it has to do with the individual in a particular situation: one event happened and another one happened, was the first one the cause of the other one? And you cannot answer that even in a randomized experiment.
0:29:49.5 SC: Right, so I guess the understanding, I'm getting confused between the second level of action and the third level of counterfactuals... Why aren't counterfactuals in the second level, I mean the action is, if we did this, what would happen, yeah?
0:30:05.7 JP: In the future, right? There's no contradiction.
0:30:10.5 SC: Good.
0:30:11.5 JP: There's no contradiction about me going to take an aspirin, now, I want to know if my headache will go away. But if I say I did take the aspirin, my headache did go away. What if I didn't? Now I have a contradiction between what was actually observed event that occurred and one that I hypothesize. What if I didn't take the aspirin? Yeah.
0:30:36.8 SC: Good. So now I do understand, so the second level is moving forward in time, and if I do this, what will happen next? Whereas the third level lets us go, had I not done something, how would things be different?
0:30:48.0 JP: Right. Yeah. It's undoing events that took place.
0:30:50.5 SC: Good, okay.
0:30:51.5 JP: Yeah.
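[The aspirin counterfactual can be sketched with the standard three-step recipe from Pearl's structural account of counterfactuals: abduction, action, and prediction. The toy model, the priors, and the variable names below are assumptions made for illustration only.]

```python
from itertools import product

# Toy structural model (invented for illustration, not from the episode):
#   headache_gone = (aspirin AND drug_works) OR natural_recovery
P_DRUG_WORKS = 0.9   # prior that the drug is effective for me
P_NATURAL    = 0.3   # prior that the headache resolves on its own

def headache_gone(aspirin, drug_works, natural):
    return (aspirin and drug_works) or natural

# Observed (factual) world: I did NOT take aspirin and the headache stayed.
obs_aspirin, obs_gone = 0, 0

# Step 1 -- abduction: keep only exogenous settings consistent with the
# evidence, weighted by their prior probability.
posterior = {}
for drug_works, natural in product([0, 1], repeat=2):
    prior = (P_DRUG_WORKS if drug_works else 1 - P_DRUG_WORKS) * \
            (P_NATURAL if natural else 1 - P_NATURAL)
    if headache_gone(obs_aspirin, drug_works, natural) == obs_gone:
        posterior[(drug_works, natural)] = prior
total = sum(posterior.values())

# Steps 2 and 3 -- action and prediction: force aspirin = 1 (graph surgery)
# and propagate the abduced exogenous settings through the model.
p_counterfactual = sum(w for (dw, nat), w in posterior.items()
                       if headache_gone(1, dw, nat)) / total
print("P(headache would have gone, had I taken the aspirin) =",
      round(p_counterfactual, 3))
```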
0:30:53.5 SC: Let's look at one of the classic examples, which is always used in these discussions, which is: does smoking cause cancer? We know the answer now, but back in the '60s there was a debate, and one of the ideas was that there might be something else, some genetic effect or something like that, that could cause cancer, and the question is, how can you tell?
0:31:12.0 JP: I mean the answer is you can't unless you make some... No you can't, unless you make an experiment.
0:31:19.8 SC: Sure.
0:31:20.5 JP: You could, but it's unethical and probably practically impossible to force a guy to smoke two packs a day, even though he's not inclined to do that.
[laughter]
0:31:37.0 SC: Probably bad.
[laughter]
0:31:39.5 JP: Yeah. So it's hard to do, although conceptually it is doable. We cannot answer the question without a randomized experiment, but at least we have one technique to answer it, the randomized experiment, yeah?
0:31:52.2 SC: Yeah.
0:31:52.5 JP: You see... Okay, good. At the time, given that we cannot run the randomized experiment, it developed into a fierce controversy, an argument between the pro-tobacco and the anti-tobacco camps, [0:32:14.6] ____ and then, Fisher was a heavy smoker...
[laughter]
0:32:19.2 JP: And he argued for... [chuckle] He argued, you cannot rule out the possibility that there is a genetic factor that makes people crave nicotine and puts you at a cancer risk. And indeed, we cannot rule it out, except we can bring to bear some knowledge about plausibility in the world. And the way it was resolved is by thinking, how strong would that tobacco gene have to be in order to account for the observations? It turned out it had to be quite strong; it had to have such a strength that the presence or non-presence of that gene would make you eight times more likely to smoke than not smoke. And that was just implausible. On the basis of that, the whole legal battle there was resolved by appealing to plausibility.
0:33:25.0 SC: Wow, okay.
0:33:26.1 JP: Yeah.
0:33:27.0 SC: But let's put aside ethics and human beings and what we're allowed to do and things like that, and just wonder intellectually about this question, because it's just a paradigm for other kinds of questions, I mean, the two possibilities are smoking causes cancer or some genetic factor causes both, smoking and cancer. And if I understand the move that you and your collaborators want to make, it's to say the difference is that if we force you to smoke, you will probably get cancer, [chuckle] and therefore it doesn't matter.
0:34:02.1 JP: Yeah. The difference... No. The difference is shown in the model very nicely. You have a model, and the difference will be in predicting the effect of action. If I force you to smoke, I can predict the likelihood of cancer; if I force you to refrain from smoking, then I can predict the likelihood of cancer under those circumstances. However, there's another element here. If you bring to bear knowledge in the form of a diagram, then I can do more than that, and I can say, "Perhaps you should adjust for gender, or adjust for family history, or adjust for addiction in the family." And so the diagram also tells you what factors you must adjust for in order to get an answer without an experiment. So now we are talking about replacing the experiment, which many times cannot be done, by a piece of knowledge, the diagram, which is the collection of who listens to whom, and it tells you now what factors you should adjust for if you want to get the answer, replacing that randomized experiment.
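[A rough sketch of what "adjusting for" a factor looks like in practice, assuming a single confounder. The data-generating numbers are invented; the adjustment used is the standard back-door formula, summing P(cancer | smoking, z) weighted by P(z) over the confounder z, and the result is compared against a simulated experiment.]

```python
import random

random.seed(0)

def draw(p):            # biased coin flip
    return random.random() < p

def world(force_smoke=None):
    # Toy data-generating process (invented numbers): a "gene" confounds
    # smoking and cancer, and smoking itself also raises the cancer risk.
    gene = draw(0.3)
    smoke = draw(0.8 if gene else 0.2) if force_smoke is None else force_smoke
    cancer = draw(0.05 + 0.10 * smoke + 0.15 * gene)
    return gene, smoke, cancer

N = 200_000
data = [world() for _ in range(N)]

def p_cancer_given(smoke_val, gene_val=None):
    rows = [c for g, s, c in data
            if s == smoke_val and (gene_val is None or g == gene_val)]
    return sum(rows) / len(rows)

p_gene = sum(g for g, _, _ in data) / N

# Naive (level-1) contrast: confounded by the gene.
naive = p_cancer_given(True) - p_cancer_given(False)

# Back-door adjustment: sum_z P(cancer | smoke, z) P(z), using the gene as z.
def adjusted(smoke_val):
    return (p_cancer_given(smoke_val, True) * p_gene +
            p_cancer_given(smoke_val, False) * (1 - p_gene))
adj = adjusted(True) - adjusted(False)

# Ground truth by actually intervening in the simulator (a stand-in for the
# randomized experiment we usually cannot run).
do = (sum(world(True)[2] for _ in range(N)) -
      sum(world(False)[2] for _ in range(N))) / N

print(f"naive {naive:.3f}  adjusted {adj:.3f}  experimental {do:.3f}")
```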
0:35:37.9 SC: Good. And at the level of this new calculus that you wanna talk about, rather than just equal signs, we have arrows now. You invented an operation in this world of diagrams, called the do-operator, that sort of implements...
0:35:51.0 JP: The do-operator.
0:35:51.4 SC: This idea. So tell us what a do-operator is; that's D-O, as in, "I'm gonna do it."
0:35:55.5 JP: Well, it's very simple. The do-operator just simulates on the diagram what an action would be in the real world. For instance, if I want to say I turn the sprinkler on, that you can simulate on the diagram. Right now, the sprinkler is enslaved to the climate, to the season, 'cause I connected it to an automatic controller. If I do sprinkler-on, I subject the sprinkler to a new master, that's my muscle, and I dislodge it from the influence of all previous influencers. I can do it on a diagram. I'll remove the arrows from its previous masters and I subject it to a new master, which is my muscle, on the diagram, and I set the value of the sprinkler to on, boolean 1, bing. I have a new diagram, I can solve it. And that's the do-operator; it's simply a simulation of action.
0:37:07.0 SC: I remember when Simon DeDeo... I don't know if you know Simon, but he was a previous guest on the podcast, and he mentioned in conversation that something that he wanted to do was an implementation of Judea Pearl's do-operator. And I had never heard of this concept before, but instantly, I had the impression this was something very, very, very important. [chuckle] So I ran out, learned about it, and it's clear that it's gonna be crucial to how we're thinking about artificial intelligence and complicated science questions going forward.
0:37:40.2 JP: I'm very happy with the do-operator.
[chuckle]
0:37:43.5 JP: Yeah. But it's so simple.
0:37:45.5 SC: It's so simple. That's good. That's not bad.
0:37:47.2 JP: All these statisticians get irritated by the do-operator.
[laughter]
0:37:52.0 JP: You know why? Because they realized that they should have invented it, 500 years before Pearl and they didn't.
0:38:05.0 SC: Well, it's simple in implementation. But there's something subtle here. Let me just sort of say it in my own words to see if I'm right. Another example has exactly the same structure as what you've been saying, but my favorite example is windshield wipers on cars being on, and people having their umbrellas up, right? In the data, whenever people have their umbrellas up, the windshield wipers are going on cars. And when they're not, they're not. And so there's a correlation there in the data. And this is why you say the statisticians, this is what they do. They go, "Look, there's a correlation." But what you're saying is you can implement in your diagram, do umbrella.
[chuckle]
0:38:47.6 SC: So, walk outside on a sunny day, and put up your umbrella and see if all the windshield wipers start moving on cars. And they don't. And that's the sort of physical implementation of what you can do in a Bayesian diagram.
0:39:00.9 JP: But this will not convince a statistician.
[laughter]
0:39:05.1 JP: I'll tell you why. Because the do-operator doesn't exist in probability theory. Okay?
0:39:12.7 SC: Yeah.
0:39:13.2 JP: Okay. So, it's not an operator of probability theory. Where does it exist? On the diagram. And where does the diagram come from? It's a piece of knowledge that is brought in from where? From outside the data.
0:39:27.1 SC: Exactly. No, I think that...
0:39:27.1 JP: And that is what statisticians resist: we don't put in opinion.
[laughter]
0:39:35.4 JP: That was it. That was the manifesto of the Royal Statistical Society in 1834: we are not gonna publish anything which has to do with opinion. Data, data and only data.
0:39:54.5 SC: Right.
0:39:55.4 JP: Yes.
0:39:56.0 SC: But it's not a matter of opinion. I think I'm totally on your side here. But it is not in the data either, right? It's counterfactual data. It's the question if I walk outside and put on my umbrella, what would happen? And so you need to go, beyond the data, but you can by doing experiment, right?
0:40:17.0 JP: Yes. One way is by doing an experiment. As far as the windshield wipers, yes, you can do it by experiment.
0:40:24.9 SC: Right. It's a better example than smoking for that reason. We don't have to hurt anybody [chuckle] by making them smoke.
0:40:30.7 JP: Yeah. Statisticians will be happy with it. Do the experiment.
0:40:34.8 SC: Explain a little bit how this helps us tackle classic conundrums of causality like the firing squad, right? When you have a firing squad where there's 10 people shooting at a death penalty victim, and one bullet hits first, and you say that causes the person to die. But if your definition of cause was, had that not happened, the effect wouldn't have happened, that's wrong because the next bullet would've come and hit them, right? So, how does this help us understand cases like that?
0:41:08.7 JP: That is the difference between a necessary cause and a sufficient cause. So we can compute how sufficient one rifleman's bullet was, as opposed to another rifleman's bullet. And that's why we put the squad there.
[chuckle]
0:41:30.1 JP: So no one could be blamed as an individual. Blame is about necessary cause. We are saying, if it wasn't for your bullet, the guy would be alive. So the responsibility is divided here equally, not even equally. I think if you compute it including all kinds of noise, for instance all kinds of trigger-happy guys and things like that, the probability will be minimal for each rifleman as a responsible, necessary cause of the death. But at least we have a calculus to compute it.
0:42:16.8 SC: That's right.
0:42:17.4 JP: The degree to which your bullet was a necessary cause of the death. And vice versa, now we can also talk about sufficient cause and compute that. And the combination of the two plays a role in responsibility. It's still not part of standard court procedure to compute the necessary cause... Not yet, not yet.
[chuckle]
0:42:44.3 JP: But I think, as we are advancing now with AI, the legal profession will listen to us. Because they are dealing now with the very critical issue of fairness: to what degree the algorithm was unfair to a gender, to women, or to minority groups in their requests for loans and so forth. So the ideas of responsibility and sufficient and necessary cause play a very critical role. They will have to listen to those philosophical definitions, though I call them computer science definitions.
0:43:35.0 SC: Sure.
0:43:35.5 JP: [laughter] Yes. And listen to them and implement them in some procedure. They already did, because according to the court of law, the "but for" test is a standard criterion. You don't pay compensation unless you apply the "but for" criterion. That is, the victim would be alive but for the actions of the defendant.
0:44:07.7 SC: And is that... Sorry, is that compatible with your definition of causality or do you think...
0:44:10.0 JP: Oh, absolutely. "But for" has no meaning in the colloquial conversations among lawyers unless you put it on a firm scientific basis. And the algorithm is the definition of "but for." It's a necessary cause, the degree to which the action of the defendant is necessary for the death of the victim. And it has to be greater than 50%, according to the court of law, before the guy is declared guilty, or before he is forced to pay compensation.
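[A toy version of the firing-squad reasoning, with just two riflemen and a deterministic model of my own construction, showing how the "but for" (necessity) question and the sufficiency question come apart.]

```python
def dead(shot_a, shot_b):
    # Deterministic toy model: either bullet is enough to kill.
    return shot_a or shot_b

# Factual world: the whole squad fires (here just two riflemen, A and B).
factual = dead(True, True)                          # the prisoner dies

# "But for" / necessity of A's bullet: undo A's shot, keep everything else.
necessary_a = factual and not dead(False, True)     # False: B's bullet still kills

# Sufficiency of A's bullet: would A's shot alone have produced the death?
sufficient_a = dead(True, False)                    # True

print("A's bullet necessary: ", necessary_a)        # False -> no 'but for' liability
print("A's bullet sufficient:", sufficient_a)       # True  -> but it was sufficient
```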
0:44:55.1 SC: Okay, so let me just try to summarize 'cause I think that we didn't quite end up with the definition of causality yet, and maybe there isn't one, but I think it...
0:45:02.3 JP: No, no, no. I like it because... I like the fact that we didn't conclude that because it has different shades. You have direct cause, indirect cause, necessary cause, sufficient cause, necessary and sufficient. They come with all kinds. You want to quantify all those. So I'm glad we didn't come up with a single one.
0:45:24.8 SC: Well, it's not a single one, but there is an insight which I think is crucial, and you've said it, but I just wanna say it again, 'cause sometimes we say lots of things and it's hard to take away the message. At least I have this idea that you have this Bayesian network, this graph of all the probabilities of all these things that can happen, and then all these arrows from one thing to another if one thing listens to the other. I like that formulation. And then there's this extra statement that you need to go a little bit beyond the data you have, to say: if I put a do on one of the variables and just force it to do something, without propagating backward in the graph, without saying why it's doing that, I'm just forcing it to do it, I'm going outside and putting up my umbrella, turning on my sprinkler. If that leads to some effects down the chain, then that action is a cause of those effects. Yes?
0:46:21.6 JP: Correct. Yeah.
0:46:22.9 SC: Good. And one of the... And again, you said this again but I just wanna sort of rub it in 'cause it is so profound. For the purposes of learning and robots and AI, it's a lesson that the data or a comprehensive set of data might not be enough. You have to go play, you have to do some experiments to really learn why things are happening.
0:46:46.7 JP: And that's why you find toddlers and babies are constantly playing around in the crib, and they are not pacified, and they're restless, until they get to the state of understanding why this toy makes noise and that toy doesn't make noise, okay. This restlessness is the craving for the knowledge that is tabulated in the diagram.
0:47:18.9 SC: So, babies in the crib are drawing Bayesian diagrams?
0:47:20.0 JP: Well, no. I'm serious about it. A baby in the crib is striving to construct a causal diagram for the crib world.
0:47:32.2 SC: I like that. I'm just gonna sit in silence and contemplate that, 'cause they don't know it but that's what they're doing. Just like Euclid didn't know he was using the metric.
0:47:38.8 JP: No, no, no. I'm sorry. They're born with the craving.
0:47:41.9 SC: Okay, good.
0:47:45.9 JP: I'm serious about it. Which means it explains why babies remain restless regardless of whether you reward them for the right action or not. They are reward-neutral, and they have this curiosity to find out how things work regardless of the payoff, as opposed to monkeys and other animals, which are driven by reward and not by curiosity.
0:48:14.4 SC: Okay, so this raises a whole bunch of questions about at what point in evolution did we become motivated to just learn the causal network rather than just get rewards.
0:48:23.7 JP: Absolutely, yeah. I leave it to anthropologists, but I pose to them the question in more concrete terms: at what point in evolution did this transition occur, or what kind of computational facilities did we acquire that enabled us to do that kind of thing? I believe it was the invention of the counterfactual. And if you read the first chapter of The Book of Why, there is the Harari hypothesis that the artist was able to construct things that have no physical reality, that could not happen in physical reality, like the lion man: the head of a person with the body... Sorry, the body of a person with the head of a lion, put together. This ability to construct things which do not exist in reality but exist in one's imagination, that was the key cognitive transition in evolution. It enabled Homo sapiens to dominate the planet.
0:49:45.6 SC: So, I will mention something that you maybe have not heard of, much earlier in evolution; it wouldn't be doing what you say, but it's the first step toward doing what you say. Malcolm MacIver, who's a neuroscientist at Northwestern, was a previous guest on the show, and he studies fish climbing onto land for the first time. Okay. And he makes the claim that if you're under water, you're swimming along at meters per second but you can only see meters in front of you, and all of the evolutionary optimization is to react instantly to whatever you see. But when you're on land, now you can see forever, you can see to the horizon, and there's a new modality that opens up, namely seeing something far away and contemplating different hypothetical responses to it. You have time to do that. And he makes predictions on the basis of this theory about the development of brains and bones and sensory organs in the fish as they climb onto land, and I get it. I don't know if it's true or not, I don't have the qualifications, but in some sense that would be the birth of imagination, but it's only imagination within the template of what you already know is possible. But you could see how evolution would, in the long term, develop that up into something much more creative.
0:51:02.9 JP: Yeah I can see that. I haven't heard about this experiment... Interesting, I'm surprised that fish can see outside the water on land. [laughter]
0:51:12.9 SC: Well in fact, as soon as they start peeking up onto land, their eyes move, they evolve, so their eyes, you know, become frog-like and peek up above their head instead of being on the side, so they can see better.
0:51:26.5 JP: Interesting.
0:51:27.3 SC: It is, it is. Okay, good. So babies are constructing Bayesian networks. But again, I wanted to re-ask a question that came in at the very beginning, but now we're more sophisticated, so we can ask it again. When we're drawing these arrows, we do it on the basis of data, even though it's supposed to be something that says more than the data are telling us; ideally, all we have is the data with which to construct them. How objective is that? Can we write down a methodology for saying, here's a bunch of data, therefore here are the arrows you should draw, or is that completely coming in from our judgment or something like that?
0:52:11.6 JP: The easy answer is, it completely comes from our judgment. However, our judgment has also evolved; it contains a condensation, a compilation of ideas, tradition, knowledge that came to us by social evolution as well as by biological evolution. So our judgment is also based on a stream of data that took billions of years to evolve and to impact us. So the controversy in data science is whether we should build an Einstein from an amoeba by simulating the stream of data that our ancestors received from the time that we were amoebas until they became Einstein, which is essentially what data science is today: learn everything from data, because that's all we have. Or the alternative is, our ancestors already worked for us and have compiled a bunch of knowledge that we call plausible, the plausibility of who listens to whom, okay? It is already compiled, and we know that the sun does not listen to the rooster, so let's use it.
0:53:49.2 JP: Both sides have arguments on their side, because after all, everything we know originally came from some sense data, but the question is, can we afford to work zillions of years to replicate it? And will we ever replicate the way we evolved? Because the knowledge that we have is subject to incidents such as meteor rains, and that's something we cannot duplicate. Anyhow, that's a philosophical question, and I'm for using the compiled knowledge that we already have. Also, the argument is like that: suppose you are successful in discovering the causal graph from pure data, and by the way, there is a bunch of activity called causal discovery, which is based on the idea of what the graph should look like in order to be compatible with the data; let's rule out all the incompatible ones. I'm leaving it to one side now. It has picked up momentum in the past few years. But even if you are successful in learning the causal graph from data, you still have to learn how to use it.
0:55:21.0 JP: And that's why it's important to keep it in mind, to know how to use it, and to remember that you have to communicate with a human being, the end user, who is enslaved to this structure. So you have to be compatible with the way the human being has structured his or her causal graph, in order to build trust between the computer and the user.
0:55:50.0 SC: Now, I like that because I think that in a lot of philosophy of science, for example, people pretend that we should aspire to be some objective receiver of data and develop hypotheses, but in fact, we carry around with us models of the world from the start, the manifest image or whatever you want to call it. And I think we underemphasize the importance of that built-in starting point in reasoning.
0:56:17.4 JP: Absolutely. As I say in a lot of my chapters, the physicists write equations, but they talk cause and effect in the cafeteria.
0:56:28.7 SC: Very true, very true. Speaking of which, I do want to talk about physics a little bit, because here's how I say it sometimes. The question that you and your friends are trying to ask is, "Does smoking cause cancer or is there some other variable that causes both smoking and cancer?" You're not trying to answer the question, does cancer cause smoking? You've decided ahead of time the cancer doesn't cause smoking because like you said, it's heavier or whatever. We know from the structure of the world that it's plausible, that cancer listens to smoking and it's implausible the other way around. I wanna derive that though on the basis of the laws of physics and in particular, like you said that might be too ambitious as a general rule, but I do wanna derive the fact that the causes come before the effects in time. I wanna derive the fact that the arrows have to point toward the future. Are you optimistic that that is something that can be derived on the basis of our physical understanding of the world, or do you think it's just gonna have to be taken for granted?
0:57:30.8 JP: You want to derive the time directionality from causes, as opposed to... Normally we think that causes must be constrained by the flow of time, by temporal precedence.
0:57:46.5 SC: No, sorry. I think I misspoke. I want to derive the fact that causes precede effects from the arrow of time. But by the arrow of time, I mean the increase of entropy since the Big Bang to today.
0:58:02.7 JP: That is definition of time?
0:58:04.5 SC: That's the arrow of time.
0:58:06.7 JP: The arrow of time is an increase of entropy?
0:58:08.8 SC: Yes. That's right.
0:58:12.0 JP: I cannot help you.
0:58:12.8 SC: Okay good.
0:58:13.3 JP: I don't know whether... It's a nice challenge. I can maybe... If you convince me it's worth doing, I'll be happy to immerse myself in that question.
0:58:28.6 SC: I need to write a paper here and I'm halfway done yeah.
0:58:31.4 JP: But I'm not there.
0:58:33.7 SC: Good. I mean will just...
0:58:35.3 JP: I know that temporal precedence constrains the direction of cause and effect, but it's not sufficient. It's not sufficient.
0:58:44.4 SC: Right. Well, I...
0:58:45.6 JP: As we saw for instance in the rooster crow, that it comes before the sunrise, but still we say it's not the cause.
0:58:55.0 SC: That's right, that's right. That I completely agree with. I'll just add one more claim to that, and you can think about whether or not it's relevant, which is the following: if I know the state of the world macroscopically right now... So what I mean by that is, I don't know where every atom is, etcetera, but I know where the people are and where the planets are. I can use, or even better, if I have a probability distribution over the macroscopic state of the world, I claim that from that plus the laws of physics, I can predict the probability distribution in the future. I can just use the laws of physics to move forward in time. But from that and the laws of physics I can't retrodict the past, all by themselves. I need an extra assumption, which is the low-entropy boundary condition near the Big Bang. And I think that's what's going to break the symmetry between going forward and going backward, that ability to predict the correct probability distribution of the world just based on current macroscopic probability data.
0:59:56.0 JP: Okay. Why do we go to the microscopic, where we cannot think too well? [chuckle] You're talking about the molecules, and there are already zillions of them. Let's talk about something which is simpler, okay? Let's talk about billiard balls, a billiard table, and you have this nice triangle of the balls sitting there and... What do you call this leading ball?
1:00:23.0 SC: The cue ball.
1:00:24.0 JP: What's the name?
1:00:26.6 SC: The cue ball.
1:00:28.0 JP: The cue, the cue. Okay. The cue ball comes, it hits the triangle, everybody disperses and comes and hits the walls of the table. Now we run it backward. We take a movie. We run it backward. Okay? Can you tell with the bare eye whether you ran the movie forward or backward?
1:00:51.4 SC: Yes. [laughter]
1:00:53.4 JP: By what? There's no entropy here.
1:00:56.7 SC: Well that is initial...
1:00:58.4 JP: It's simply F = ma for every billiard ball.
1:01:03.0 SC: True, but you chose to begin with a low entropy configuration in the triangle. Yeah?
1:01:08.1 JP: Why is the triangle lower entropy than the state of the balls a second later, where each one of them bounces? Why is it low entropy?
1:01:20.5 SC: Well, you have some...
1:01:21.5 JP: I don't think it's...
1:01:22.3 SC: It is. It is coarse graining.
1:01:25.6 JP: No. It's just the fact that you have a name for the nicely arranged balls when they are in triangle, and you don't have a name for their state a minute later. Just a matter of a name, which is subjective.
1:01:41.8 SC: No, I think it's actually... I think there is something objective about the measure on the space of ways to be in the triangle versus scattered around them.
1:01:50.4 JP: Just because it's simple.
1:01:52.4 SC: Yeah. [chuckle]
1:01:53.0 JP: But if I remember my thermodynamics correctly, it is not a progression from an ordered state to disorder, but the natural escape from a narrow region in phase space to a wider region.
1:02:13.2 SC: Exactly yes. That's right.
1:02:14.8 JP: Okay. So, the order is a subjective perception. We have a name for the triangle. We don't have a name for the state of the balls a minute later. We don't have it. It's just... We have to keep on saying ball number one has momentum 23 and so on. It doesn't have a single name. So we are biased by our language.
1:02:42.1 SC: I don't think it is just our language. I think that there is something about how we observe the system that lets us coarse-grain in some ways and not others. When we say, "All the cream separated from all the coffee in a cup of coffee is low entropy, versus when they're all mixed together it's high entropy," that's 'cause we can see the difference pretty immediately.
1:03:03.1 JP: [chuckle] Am I saying we are biased, immediately? Yes.
1:03:09.0 SC: Okay, anyway, it's my responsibility to work further on this, I just wondered if you had any strong opinions about it from the start. But I wanna get back to this idea that human beings have this baked-in manifest image, 'cause this is really where it becomes important for the AI, right? I mean, the robots don't have any pre-existing image of the world. I just had Gary Marcus on the podcast, he was the one who connected us here, and Gary has been very strongly arguing that there is a roadblock to deep learning being too successful if it just does correlations between different things in the world; you need some structure, you need some common sense. I'm betting that you agree, and causality is gonna be part of that common sense, but how do we do it, how do we actually teach the robot what the... How the arrows go between all the parts of our little network?
1:04:00.8 JP: Okay, let me first answer: I agree. Not only do I agree, but I have to supplement it by saying that it is not an opinion, it's a theorem.
1:04:13.5 SC: Okay.
1:04:14.1 JP: It's a theorem that there is a limit. Certain tasks you cannot do if you don't have this set of assumptions, okay? So it is a mathematical constraint on the ability of a robot to do certain tasks. Okay, now we go to, "How do we get this model of the world into the robot?" If you have the time, you can just feed it the diagram, plus equip it with the techniques to enrich the diagram, enrich the diagram by thinking conjecturally: what experiments do I have to conduct in the future in order to answer a certain question, what additional variables do I wish I could purchase so that I can observe them and enrich the diagram. So this is what we mean by an automatic scientist, a scientist that can design the next experiment so that it can answer the question which currently cannot be answered on the basis of the existing diagram.
1:05:31.2 JP: So the diagram must be vulnerable to, number one, refutation from the data, and, number two, to enrichment. So that is the blue-sky idea of an automated scientist, which I could elaborate on, but it's all built on the force of curiosity. We strive to obtain a state of deep understanding, and deep understanding is having the ability to answer questions at all three levels of the hierarchy. That gives us a sense of being in control and having an understanding of a domain, of crossing the street, of rain, of games and so on.
1:06:23.2 SC: And I have not... I don't have any fish in this pan or whatever the metaphor is here, dogs in this fight, but I'm betting that the reaction to that philosophy from a lot of people who are working in contemporary deep learning is, "No, that's not what we do. We just let the computer learn everything it can, do all the thought experiments it can do, collect all the data it can, and the computer will figure out what the patterns are."
1:06:51.7 JP: No, for that we have a theorem, it's impossible.
1:06:56.8 SC: Is there someone's name attached to that theorem, or whose theorem is that?
1:07:01.7 JP: That's the causal hierarchy.
1:07:03.4 SC: Okay.
1:07:03.7 JP: It's the ladder of causation: you cannot go from level i to level i+1 unless you have information or assumptions from level i+1 or higher.
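A minimal numerical sketch of that claim (again my own toy example, not anything from the conversation): in a model where a confounder Z drives both X and Y, the level-one quantity E[Y | X=1] - E[Y | X=0], computed purely from observational data, does not match the level-two quantity E[Y | do(X=1)] - E[Y | do(X=0)]. You only recover the interventional answer by adding a level-two assumption, here the diagram that licenses back-door adjustment on Z.

```python
# Toy structural causal model (illustrative only): Z -> X, Z -> Y, X -> Y.
# The true causal effect of X on Y is 1.0, but the observational contrast is biased.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

Z = rng.binomial(1, 0.5, n)                    # confounder
X = rng.binomial(1, 0.2 + 0.6 * Z)             # X depends on Z
Y = 1.0 * X + 2.0 * Z + rng.normal(0, 1, n)    # Y depends on X and Z

# Level 1 (association): E[Y | X=1] - E[Y | X=0], biased upward by Z (about 2.2 here).
naive = Y[X == 1].mean() - Y[X == 0].mean()

# Level 2 (intervention): simulate do(X=1) and do(X=0) by severing X's dependence on Z.
Y_do1 = 1.0 * 1 + 2.0 * Z + rng.normal(0, 1, n)
Y_do0 = 1.0 * 0 + 2.0 * Z + rng.normal(0, 1, n)
true_effect = Y_do1.mean() - Y_do0.mean()      # about 1.0

# With the diagram as an assumption, back-door adjustment on Z recovers the causal
# effect from the same observational data (also about 1.0).
adjusted = sum(
    (Y[(X == 1) & (Z == z)].mean() - Y[(X == 0) & (Z == z)].mean()) * (Z == z).mean()
    for z in (0, 1)
)

print(f"naive observational contrast: {naive:.2f}")
print(f"interventional (do) effect:   {true_effect:.2f}")
print(f"back-door adjusted estimate:  {adjusted:.2f}")
```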
1:07:17.4 SC: Okay, good. But I do get the impression, and again, correct me if I'm wrong, that this perspective is sort of a plucky minority in the field; it has not swept the consensus view.
1:07:31.1 JP: It has not swept the machine learning people or the deep learning people. It's the same way that it took 20 years to sweep the statistical community until they accepted the do-operator.
1:07:51.9 SC: But okay.
1:07:52.4 JP: Even that it's still resisted.
1:07:54.6 SC: Yeah.
1:07:54.9 JP: I still have an island of resistance there.
1:07:58.8 SC: You'll win. I think you'll win that one, I can safely predict that.
1:08:01.5 JP: I know that I'm gonna win.
[laughter]
1:08:05.1 SC: But okay, I mean, even if we're on board, even if we're on the train, it does sound hard.
1:08:09.0 JP: It's certainly unfair for me to fight against this huge industry called machine learning, okay? Because I know that I have mathematics on my side, and they don't have this certainty. So I think it's unfair: I'm gonna win.
1:08:25.9 SC: You're gonna win. That's okay, it's okay to know you're gonna win ahead of time, there are worse positions to be in. But even if you're going to win, it still leaves us with hard problems about what to tell the computer: what is the diagram, what are the causal relationships that matter, or even, what's the stuff out there in the universe? Do you tell it that there are tables, and chairs, and people, and cars, rather than letting it learn that? How advanced is that program of sort of formalizing our common-sense intuition about the causal structure of the world?
1:09:00.2 JP: I have neglected to talk about object-property relationships. I've been very narrow in what I'm doing; I'm narrow-minded. So I only dealt with cause-effect relationships, and I have neglected many other things which come to bear in natural language, in vision, in interpretation, and so on, okay? But I think what we have learned in the causal corner can be a role model for other areas.
1:09:34.0 SC: Okay, yeah.
1:09:34.6 JP: We are now working with propositional calculus; we have to expand into predicate calculus. All kinds of things need to be done, and it will be done eventually. Yeah. We have to teach the robots about object-property relationships, chairs and tables and their functions. I haven't done anything in this area.
1:10:00.5 SC: But I wanna...
1:10:01.7 JP: Except [1:10:01.7] ____.
1:10:02.4 SC: When you say that, just to be clear, because I think that some people might hear you say we need to teach the robot about object-property relations, and they'll say, "Well, sure." But the important part of that phrase is that we need to teach it, as opposed to letting it figure it out.
[laughter]
1:10:18.8 SC: We can't wait for it to figure it out.
1:10:20.6 JP: Okay. Some things we need to teach; others, we can let the robot figure out. At least in my corner, I have theorems which tell us what you must teach and what you can let the robot figure out by itself.
1:10:36.5 SC: Right.
1:10:37.2 JP: I don't have those theorems applied to natural language and vision.
1:10:42.2 SC: Okay. And then I guess the final thing I just wanted to touch on, which you've already brought up, but is very exciting but also confusing to me, is the whole set of applications to the social sciences and even to law or moral philosophy. Right? When we talk about right and wrong, blame and responsibility, punishment and reward, we're always assuming some causal structure, right? You are responsible for this happening. Do you expect that a more sophisticated nuanced idea of how cause and effect structures work is going to have an effect downstream on how we think about these puzzles?
1:11:23.3 JP: Yes. I have some expectations, some excitement. I can see how we can build social intelligence on top of environmental intelligence. So far, I was talking about a robot learning about managing a domain or understanding a domain, a disease domain, and so on. But now we can build on top of that the idea that robots can have a model of another robot, or of itself. If it has a blueprint of its own software, then it can reason about what made me do what I did. And then you can program compassion out of that, saying, "I understand why you did what you did, because you are like me."
[laughter]
1:12:20.0 JP: If I were in your situation, I would do the same thing. But are you aware of this and this? So, all these relationships, awareness, compassion, "I understand you," "trust me," all of them involve a robot having a model of another robot. And once we have it, you know, we are gonna have a nice conversation...
[laughter]
1:12:43.1 JP: With our apprentice robot.
1:12:50.1 SC: Are you optimistic that artificial intelligence will reach human levels of intelligence at some point?
1:12:55.3 JP: I'm absolutely sure. How can one be absolutely sure on the basis of conjecture? Only on the basis that I don't see any impediment.
1:13:06.8 SC: Right. But is it a sooner rather than later kind of question? Is this something we need to kind of contemplate?
1:13:12.4 JP: I refuse to answer.
1:13:13.6 SC: Okay. Fine.
1:13:15.4 JP: No, I don't have the imagination.
1:13:19.1 SC: Yeah. Okay.
1:13:19.5 JP: That other people have. Okay.
1:13:21.2 SC: I'm with you.
1:13:21.7 JP: That [1:13:22.2] ____.
[laughter]
1:13:24.6 JP: I don't have it.
1:13:25.8 SC: Well, so you say that, but what you really mean is you have too much imagination because you can imagine many different possible things and it's hard to tell which is gonna be true, right?
1:13:34.3 JP: Correct. Yeah. Correct. Yeah.
1:13:36.9 SC: Yeah. And then, okay, I'll just close with sort of a statement that you can reflect on if you want. As we were having this conversation, I realized the following weird thing: I write books for sort of broad audiences on physics and other things, and over and over again, in all of my books, whether I like it or not, it helps to start with the idea that there was Aristotle, who imputed natures and goals to objects in the world, right? Fire wants to rise up, rocks want to fall down. It was a teleological view, and causes and effects were front and center. He had this taxonomy of cause-effect relationships. And I say we got rid of all that.
[laughter]
1:14:27.1 SC: Galileo and Newton came along, and we replaced this "this happens because of that" language with a language of patterns, right? The equals sign in your mathematical representation. And that's been very helpful. All of modern physics is like this: it doesn't have any direction of time, there's no direction of causality. It's just, this is happening and this is happening and this is happening. But of course, like you say, in the cafeteria, all of us physicists talk about causes and effects all the time. And so it is crucially important to me to understand how to recover our ordinary everyday understanding of causality and goals and teleology in a way that is compatible with that underlying view of fundamental physics. And so I guess all I'm saying is I'm glad to see you're doing it.
[laughter]
1:15:15.4 JP: Yeah. I'm not sure I'm capable of undertaking this major, monumental goal that you mentioned.
[laughter]
1:15:25.0 JP: Okay. But I'm having fun just capturing the way we think, the way you and I think, and I get tremendous satisfaction from seeing myself replicated, amplified, on a computer.
[laughter]
1:15:45.5 JP: And I get a better understanding of myself.
1:15:50.1 SC: Well, you know...
1:15:50.7 JP: Why did I have this intuition? Oh, because of so and so. I have a playground for myself.
1:15:57.7 SC: I mean, maybe this is a paradigm. Maybe we should all have gigantic huge aspirational goals and work to make progress on them in little tiny pieces, step by step.
[laughter]
1:16:08.6 JP: I agree with you.
1:16:10.3 SC: Yes. Alright. That's like a wonderful little lesson for us all, and a good place to stop. So, Judea Pearl, thanks so much for being on the Mindscape podcast.
1:16:19.2 JP: Thank you, Sean. It's great having me, having you, having me on your show.
[laughter]
1:16:25.8 SC: Alright.
Illuminating!! Fantastic area of investigation. Dr. Lauren Ross (@ProfLaurenRoss) works in this field. She is on Twitter.
Thank you! Huge fan of Pearl, cannot wait to listen carefully.
Great conversation.
Good to see the podcast evolve and pick great new guests, following ideas and people mentioned in past episodes.
Thanks for the great job, Sean!!!
This guy is a real mind mechanic, showing that there is no need to invent a lot of mumbo-jumbo garbage to understand how the mind works. Besides that, he is so full of life, curiosity, humor, and intelligence.
I would love to have Dr. Graziano on the show as well, the man who finally explained how our consciousness works (Dr. Pearl touched on this when talking about a robot with a model of itself).
I read “The Book of Why” in 2018. Brilliant, illuminating, and with exercises to get the rudimentary conceptions in diagrammatic form. Immensely useful in computers. I lament the limits of Bayesian theory. Causal language in context, in a familiar, parochial setting, carries with it meaning, import, and situational context that conveys a whole raft of culture and thought. Reducing cause and effect to a few arrows is horribly reductive. When I look at YouTube’s selections, headlines and disinformation, Bayesian logic predicting the next texted or spoken word, I lament the loss of all that is left out.
Pearl is a genius, and his work applied to computation is a revelation.
Will it someday be possible to create a robot/computer that is intelligent and aware of its surroundings and itself, in the same or a similar way that a human is? And if it is, should we attempt to do so? A lot of “what if” questions, but something to think about seriously as exponential advances in computer science are taking place.
The article linked below, “Descartes’ robot daughter and the zombie problem” by Sally Adee (May 30, 2018), captures, in a somewhat lighthearted fashion, the essence of this conundrum.
https://www.lastwordonnothing.com/2018/05/30/what-descartes-robot-daughter-can-tell-us-about-the-zombie-problem/
Sean, do you think he might have got you with the billiard-table entropy example? I suspect he’s right. Every arrangement of billiard balls is as probable as any other. A triangular arrangement only seems unlikely because it has some symmetry that we readily perceive. But entropy (at least in its simplest form) does not say anything about the occurrence of symmetries in the microstates. Usually, microstates are lumped together because they have the same energy or volume, etc., but not because they have the same symmetry. No? So maybe Judea Pearl is right and you cannot strictly derive that cause comes before effect from entropy. Maybe what we call cause and effect is just the way our brain structures what happens in time (and not much more).