Mathematics is often thought of as the pinnacle of crisp precision: the square of the hypotenuse of a right triangle isn’t “roughly” the sum of the squares of the other two sides, it’s exactly that. But we live in a world of messy imprecision, and increasingly we need sophisticated techniques to quantify and deal with approximate statistical relations rather than perfect ones. Modern mathematicians have noticed, and are taking up the challenge. Tai-Danae Bradley is a mathematician who employs very high-level ideas — category theory, topology, quantum probability theory — to analyze real-world phenomena like the structure of natural-language speech. We explore a number of cool ideas and what kinds of places they are leading us to.
Support Mindscape on Patreon.
Tai-Danae Bradley received her Ph.D. in mathematics from the CUNY Graduate Center. She is currently a research mathematician at Alphabet, visiting research professor of mathematics at The Master’s University, and executive director of the Math3ma Institute. She hosts an explanatory mathematics blog, Math3ma. She is the co-author of the graduate-level textbook Topology: A Categorical Approach.
- Web site
- Google Scholar publications
- Video introduction to “At the Interface of Algebra and Statistics” Ph.D. thesis
0:00:00.1 Sean Carroll: Hello, everyone, and welcome to the Mindscape Podcast. I’m your host, Sean Carroll. A lot of people, I think, when they think about math, when they hear the words math and doing math, they think of what a mathematician would call calculating, right? Performing some calculation, whether it’s something simple, like calculating the tip on a restaurant bill, something maybe more complicated, like calculating the orbits of planets and stars and galaxies and so forth, but that’s actually very, very little of what professional mathematicians do. We use math in that way, but professional mathematicians are engineers of concepts. They invent new kinds of concepts, they put them together to prove theorems, and they draw connections between concepts that you might not have known about before.
0:00:43.2 SC: So today’s guest, Tai-Danae Bradley, works at a very interesting intersection of algebra and information. Algebra being, in the mathematician sense, a kind of study of structures in mathematical objects, so ways you can multiply things together, ways you can combine things, and also ways you can take things apart that relate to each other and then build them back into the original whole. And then information in the sense of what relates to what, what do you know about one thing by learning about another thing. So here’s an example that you might not think of as originally mathematics, but that is absolutely closely related to this stew of algebra and information: Language. The language you’re hearing right now, the set of words that you’re listening to has a structure in it, right? Because when certain words pop up, other words are more likely to pop up right next to them.
0:01:39.1 SC: And of course, you can take a completely mindless approach to understanding this. You can just dump it all in a computer, right? Dump a huge text in a computer and ask, what are the statistics of this? What words appear next to what other words? But you can also think like a mathematician. You can say, well, why is it like that? What are the structures underlying this kind of thing? And it’s kind of an interesting mathematical move to introduce the idea of statistics and probability into this kind of study, because certain words in language appear next to each other more frequently, but it’s not absolute, right? It is sort of a relative connection between these things, and that’s not natural always for mathematicians to study.
0:02:20.7 SC: So Tai-Danae really works at this fascinating confluence of algebra and information statistics, and that has led her to think about one of my favorite topics, which is entropy. So we talk in this podcast about… Well, let me tell you ahead of time, it’s going to sound simpler than it is. She really is very good at explaining things in a way that kind of doesn’t scare you in terms of mathematical formalism and jargon and all that stuff, but you can read the papers behind it. They’re all real math papers that involve all of the formalism and jargon that you want, but we’re going to be talking about category theory and topology and probability theory, and a little bit of emergence and entropy and other things that I’m very interested in. For those of you who like this kind of thing, I’ll also encourage you to check out… Tai-Danae has a website called Math3ma, except instead of the E in ema, it’s a 3. So that’s M-A-T-H-3-M-A dot com, where she does a series of very nice expository blog posts, videos, things like that. Really, a wonderful ability to make these very high-powered mathematical concepts seem a little bit less intimidating than they might otherwise be. So with that, let’s go.
[music]
0:03:51.8 SC: Tai-Danae Bradley, welcome to the Mindscape Podcast.
0:03:54.6 Tai-Danae Bradley: Thank you, Sean, for having me. It’s great to be here.
0:03:56.8 SC: So we’ve agreed this is going to be a challenge. We’re bringing you on here ’cause you have a certain genius for explaining these difficult things, but that’s okay. And it’s actually very good for me, because I do a lot of work in trying to explain difficult concepts in physics to people, and it’s always good to be humbled a little bit. When I’m reading your paper, I’m like, “This is what it’s like for other people to read my papers,” and I don’t know what any of the words mean. So let’s just start with some sort of setting-the-stage kind of stuff.
0:04:27.8 TB: Yeah.
0:04:28.0 SC: A big part of your work involves algebra. Algebra and statistics was right there in the title of your thesis. So what do you mean when you use the word algebra? It’s probably not like solving quadratic equations.
0:04:41.3 TB: That’s right, that’s right. So when I say the word algebra, I mean it in a less technical, less potentially scary sense than folks might think.
0:04:53.1 SC: Okay.
0:04:53.9 TB: So in the title of my thesis, At the Interface of Algebra and Statistics, I really just meant algebra in the basic sense of sticking things together. So you mentioned quadratic equation, quadratic, that makes us think of X squared or something. I have two numbers and I… One number I multiply it by itself, or I have two things and I can multiply them and get a third thing that’s of the same flavor or the same type as the two that I started with. Maybe there’s… Maybe they’re numbers, but as one progresses in math, you learn that you can multiply things that aren’t numbers, and you can make sense of that.
0:05:31.9 SC: That’s where it gets hairy. Yes, I know.
[chuckle]
0:05:34.7 TB: Yeah, so I just meant… When I say algebra there, I just mean in the sense that I have things and I want to have some notion of multiplication or concatenation or some type of way that I can combine these things to get something larger. So very basic, not really technical quite yet. That’s what I mean.
0:05:55.0 SC: Great, so we’re composing things together, but just to put a little meat on those bones here, what are some of these things? So if I multiply two numbers together, that’s an algebraic kind of thing, or two variables. What are other kinds of things that I can combine together?
0:06:09.1 TB: Yeah, so in the context of my thesis, the things that I’m really having in the back of my mind are words or expressions in a language, in a natural language, actually.
0:06:21.6 SC: Okay. [chuckle]
0:06:23.0 TB: So think for a second about something like… I guess the example I often give is, red is a word…
0:06:32.3 SC: Yup, there you go.
0:06:33.5 TB: And fire truck is a word, and now I want to combine them so that I get a new expression in the English language. So I’m going to define a multiplication, but let’s call it stick those things together, red, fire truck, stick them together, you get red fire truck.
0:06:52.9 SC: Right.
0:06:53.3 TB: Or if I want to be more precise, I just really mean concatenation.
0:06:56.4 SC: Okay.
0:06:57.3 TB: So, that’s one example that maybe you might not think of, it doesn’t feel like math, we learned [chuckle] about red fire trucks when we were, I don’t know, two years old looking through picture books, but I’m sort of secretly or not so secretly suggesting that there’s algebraic structure even in something like language.
0:07:16.5 SC: Okay, but this is good, because when I take red and fire truck and concatenate them, I get a phrase rather than a single word. So, part of the algebraic magic is that when we do this generalization of multiplying, we need not get the same kind of thing that we started with.
0:07:35.6 TB: That’s right, that’s right. So you may… That’s exactly right. So then you kinda ask, how do you keep track of that? You sort of feel that not only do you have these things called words, but then there’s other things called expressions, which are like multiple words, and maybe you want to keep track of those, you know? Maybe you have all English expressions of length one, that’s what we call words, you know?
0:07:58.4 SC: Yup.
0:07:58.5 TB: Dog, cat, [chuckle] breakfast. Oh, that’s a compound word. Okay, anyway, but then you have the set of all English expressions of length two, red fire truck, hot dog, tasty breakfast, blah, blah, blah, and then you can look at those of length three, and then maybe you might want to say, how does the algebraic structure pair well with the fact that I have different lengths involved? But there are actually tools in algebra that deal with that, but maybe I don’t want to throw in technical words so quickly at the beginning, but I’ll just say…
0:08:31.4 SC: Yeah. Go ahead.
0:08:32.8 TB: Yeah, this already gives a mathematician… You know, you’re getting excited already. [chuckle] Yes, there’s lots of things that we can think about, and math has lots of answers, so.
[chuckle]
0:08:40.8 SC: And just to be clear, this example of concatenating words, this is not just a metaphor, you’re saying this really is something we can think of as an algebra.
0:08:50.8 TB: Yes, that’s right.
0:08:52.6 SC: And… Go ahead.
0:08:54.5 TB: Yeah, I’ll say, in case there are mathematicians listening: when we say an algebra, there is a very, very specific, strict definition in the sense of abstract algebra. One can make sense of that, true. But to be more precise, I might mean something like… Am I allowed to use technical words? I might say something like a monoid, really.
0:09:18.6 SC: A monoid?
0:09:19.0 TB: That’s just like a gentler, friendlier atmosphere in which you can talk about multiplying things together. An algebra in the technical sense in math kind of comes with some extra stuff, but the basic mathematical framework where you have a bunch of things, like a set or a bag of marbles or a bag of words, and you want to have a notion of multiplying them together in some sense, something like that, we call that a monoid, so that’s kind of what I mean.
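To make the monoid idea concrete, here is a minimal sketch in Python (an illustration of the general idea, not anything from the episode; the names are made up): expressions form a monoid under concatenation, with the empty expression as the identity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expression:
    """An element of the free monoid on English words."""
    words: tuple

    def __mul__(self, other):
        # "Multiplication" here is just concatenation of word sequences.
        return Expression(self.words + other.words)

IDENTITY = Expression(())  # the empty expression: IDENTITY * x == x == x * IDENTITY

red = Expression(("red",))
fire_truck = Expression(("fire truck",))  # treated as a single word, as in the episode

phrase = red * fire_truck
print(phrase.words)       # ('red', 'fire truck')
print(len(phrase.words))  # 2 -- the "length two" grading discussed above
```

Concatenation is associative and the empty expression is a two-sided identity, which is exactly the monoid structure; there are no inverses, which is why a monoid rather than a group is the natural home here.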
0:09:47.8 SC: Yeah, I think that going forward, we’re allowed to talk about the technical terms because we want to be able to give the listeners some takeaways, like, [chuckle] if they ever see this somewhere else.
0:09:56.3 TB: Great.
0:09:57.0 SC: We were talking about infinity groupoids with Emily Riehl before, so you know, we shouldn’t feel bad about that.
0:10:01.3 TB: Ah, so I can say monoid.
[chuckle]
0:10:04.5 SC: No problem. I mean, it is in the first few minutes, but okay, it’s all going to be downhill from here. [chuckle] So, does this way of thinking about language let us take into account the fact that, for example, certain concatenations of words make sentences and certain other ones don’t? Can we put the rules of grammar into our monoid?
0:10:25.9 TB: Right, right. So you’d like something else. That’s exactly right, there’s this observation that algebraic structure is there, but it’s not really everything, it’s not really all that there is to language. Because you make this observation, not everything goes together, you can’t multiply everything and get something that’s legal or something that makes sense. So, now some folks might think, well, maybe the additional structure you need is some type of grammar or syntax or something, but in my thesis, there’s a perspective that’s taken, that is… Actually, there are statistics in language, because red fire truck is something that maybe appears quite frequently in the English language, and it might appear more frequently than… The other example I like to give is red idea. Did I wake up this morning and have a red idea? Like, what does that even mean? That’s not a thing that people say. So, there’s actually an observation you can make, or not an observation, but maybe a hypothesis or something you might want to stand firm on, and that is that statistics actually can maybe serve as a proxy for grammar.
0:11:42.0 SC: I see.
0:11:42.5 TB: So that is the extra bit of mathematical structure that’s missing from language, so there’s algebra, but then what else? Like, how do you know what really goes together and what’s valid in your language? Yeah, you need to know that in English, at least, adjectives precede nouns, red fire truck, but maybe in a different language, it’s reversed or something like that. But how do you know? Well, you can kind of look at the landscape, look at what’s out there, what has every human said from the dawn of time in English? And you can just count the number of times that red fire truck has appeared, and you can see that that number is bigger than the number of times red idea has appeared. And maybe the totality of that algebraic structure, what goes with what, together with these statistics that you observe about it, tells you what’s the deal with your language. What are valid expressions in your language and what’s not? And one way to see that is you can look at the statistics of what has been said.
0:12:38.5 SC: And it is already an interesting kind of shift of perspective to have mathematicians thinking this way, because people on the street would say, “Well, grammar is a set of rules: these words work together and these words don’t.” And what I’m getting from what you’re saying is, let’s imagine all possible sequences of words and assign to them some number about the frequency with which they appear, and implicitly, all the rules of grammar are reflected in those numbers.
0:13:08.3 TB: That’s right. That’s right.
0:13:09.3 SC: And this might not be the computationally most simple way to express the language, but maybe the same information is there.
0:13:15.6 TB: Yeah. Exactly, and you might ask why. Number one, yes, this sounds difficult, because what does it mean to have probabilities attached to all expressions ever? What about the expressions that haven’t been spoken yet, for example? But actually, this idea comes from something that actually learns grammatically correct sentences very well just given this input, algebra and statistics, and that is these large language models that you see doing really well in the world of artificial intelligence today. In the world of machine learning, you can show your computer a bunch of examples of correct English text, let it read all of Reddit or something, and it just sees what goes with what, with what frequencies.
0:14:03.8 TB: And you don’t need to give it any type of grammatical input. You don’t need to say, “Okay, dogs are nouns, fluffy is an adjective, da da da,” you just let it see what goes with what, and you can… It learns some probability distribution on texts, and when you sample from that, you can see it generates coherent English texts. You can read a blog post online that discusses one day robots are going to overtake the world, but that’s okay because robots are really friendly, and then at the end, you realize a robot wrote it, or one of these large language models wrote it.
0:14:38.0 TB: But it’s sort of deceiving, because it sounds exactly like something a human wrote, it’s coherent, the syntax is correct, it’s using the semantics in all the right ways, but it had no grammatical input, all it had was the algebraic and statistical information. So then the question from the math viewpoint is, wow, somewhere in this neural network, structure is being learned. What is that mathematical structure? If I just have algebra and statistics, what’s happening so that I can get a correct reproduction of what goes with what in my language? Something is in there, and the large language models of today are like the experiment that illustrates that. And so as a mathematician, you can look back and say, wow, what’s really going on under the hood? What’s that beautiful math that it’s learning that we’re seeing evidence of?
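As a toy illustration of “just algebra and statistics” (a sketch of the general idea, not of any real language model), here is a bigram model in Python: count which words follow which in a corpus, then sample from those counts to generate text. No grammar is supplied anywhere; whatever word-order regularities appear are read directly off the counts.

```python
import random
from collections import Counter, defaultdict

corpus = ("the red fire truck sped down the road "
          "the red fire truck stopped at the light").split()

# The "statistics of concatenation": how often does word b follow word a?
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def sample_next(word):
    counts = follows[word]
    if not counts:
        return None  # no observed continuation for this word
    return random.choices(list(counts), weights=list(counts.values()))[0]

word, output = "the", ["the"]
for _ in range(6):
    word = sample_next(word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))  # e.g. "the red fire truck sped down the"
```

Real language models are vastly more sophisticated, but the input is the same in kind: what goes with what, with what frequencies.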
0:15:30.5 SC: Right. And so one of those things is… I guess the thing for me to say is, let’s continue on our exploration of the word algebra. I get the fact that you’re combining things, that makes sense. In the world of just multiplying numbers together, there’s sort of an irreversibility. If I say 2×5 is 10… No, that’s a terrible example. If I say, 2×6 is 12. I have 12, but I don’t know how I got there. I’ve lost where they are. Does the algebraic point of view assume that we can remember what we combined to get the final answer?
0:16:09.4 TB: If I could try to rephrase this question.
0:16:13.2 SC: Please.
0:16:15.6 TB: If I have some larger expression, maybe, or there’s a meaning attached to some larger expression. Maybe, we’d like to know, does the algebraic framework we’re in allow us to kind of zoom in on whatever individual pieces went in to make that composite whole. If I could rephrase the question to understand it, is that…
0:16:36.7 SC: Exactly. Now, that’s good. Yes.
0:16:38.9 TB: Yes. So I love this question because the answer is, yes. And this is really…
0:16:44.4 SC: Well, it’s not random. I did actually look at your thesis. [chuckle]
0:16:48.9 TB: The answer to this actually is really at the heart of the thesis, but it might take a little bit of…
0:16:57.5 SC: Please. We’re not in a hurry.
0:16:58.3 TB: Time to explain. Okay, hope the listeners are ready to… Okay. Let’s see, one way I could say this is… To explain it, I have to inch out of the world of algebra just a little bit…
0:17:13.8 SC: Sure.
0:17:14.2 TB: So the title of the thesis is At The Interface Of Algebra and Statistics, so that kind of gives us the feeling that we’re really mingling these two worlds. If I want to talk about something, I have a composite whole, but how do I get information about the little bits that go into making that larger thing? There’s a way to do this, and let me for a second talk about one way to do this that I think people will be familiar with from the world of probability, and that feels maybe like totally left field, but then we’ll bring it back and see how that relates to algebra.
0:17:49.8 SC: Sounds good.
0:17:53.0 TB: It’s just one word. If I say this word, I think lots of people will be familiar with it, marginal probability, so if you…
0:17:57.7 SC: Let’s assume that we’re not necessarily… We’ve heard that word before, but we can’t remember what it means.
0:18:03.1 TB: Okay, okay, so let’s say that… I don’t know. We’re doing two things. I don’t know what they are. We’re rolling a die or pulling from a deck of cards, something like that, and maybe you have a die in your left hand and you have a pack of cards in your right, and you just kind of do both. And you can ask, “What’s the probability that I roll a 3, and I pick a red… Four of hearts or something like that?” Okay, so you kind of have this joint probability distribution, that’s what it’s called. You have two things, a die and a pack of cards. You can roll or you can draw, and you have a probability for doing both at the same time, but you might want to know… Well, forget the cards. I forgot how to play cards or I lost my deck, whatever, I’m only interested in sort of the probabilities I get for rolling my die.
0:19:00.9 TB: That would be called marginal probability. So if you start with knowledge of doing both things jointly, how can you take that knowledge to just focus in on one of those two events, you know, rolling a certain number on your die. So there’s a formula for that. So if you know the probability for rolling each number and pulling each card kind of together, you can take those numbers and add them in some way. You can get what’s called marginal probabilities, which are just the probabilities for rolling a certain number on your die.
0:19:32.7 SC: So you just add up all the individual probabilities in that set and then you get the marginal probability.
0:19:37.7 TB: Exactly, exactly. And that’s the key. You start with something big, like two events, a die and a card, but you just kind of want to zoom in on one of those, like the die. So what you do, just like you said, you add over all of the possible events that you could have had with the cards, all the cards you could have drawn. So you ignore that. That operation of adding is kind of like forgetting. You add all of them up and now you have discarded that information. So in other words, if I roll a 4 with probability, I don’t know, 23%, let’s just say, that number 23% doesn’t tell me anything about what cards I could have drawn when I rolled the number of whatever I just said. Four? Is that what I said?
0:20:22.6 SC: Yeah.
0:20:22.7 TB: Okay, so that’s important. And this is… This notion of what we call marginal probability, it’s like a basic thing, I guess. You learn it, I don’t know, at some point in your intro to probability course and so there’s a formula, and that formula again says, if you start with this joint system, you can get knowledge about one of those systems, but kind of at the cost of forgetting all information, losing all access to the second one. Alright, now it turns out that if you… Well, let me say it this way. It turns out that there’s another way to compute marginal probability such that you actually remember or retain information about the deck of cards or about that other system that you say, “I don’t really care about that.” But rather than tossing it in the trash can, there’s another way to compute marginal probability that actually gives you easy access to it. Maybe you just toss it on your desk instead of throwing it in the bin or something.
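In symbols, the usual marginalization just described is P(die = d) = Σ over cards c of P(die = d, card = c). A quick sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((6, 52))  # joint distribution over (die face, card)
joint /= joint.sum()         # normalize so all 6 * 52 entries sum to 1

# Marginalize out the cards: for each die face, sum over all 52 card outcomes.
die_marginal = joint.sum(axis=1)

print(die_marginal)        # six probabilities, one per face
print(die_marginal.sum())  # 1.0 -- but all card information has been forgotten
```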
0:21:26.1 SC: Okay.
0:21:27.1 TB: Okay, now, this… It’s essentially a passage from… I’ll say sets. I have a set of options for the die. One, two, three, four, five, six. I have a set of options for the cards, there’s 52 elements in that set. This is like classical probability theory. Normal probability theory works with sets. But this other way that I’m describing, which I might call marginal probability with memory, you sort of start with your sets, but you actually move into the world of linear algebra. So rather than sets, you work with vector spaces and rather than probability distributions, you work with the linear algebraic version of a probability distribution. Now, at this point, I’m kind of making things more difficult because it’s like, what’s the linear algebraic version of a probability distribution? Is that a thing? So what I really want to say, and I’m hesitant to ’cause I’m speaking to a physicist, but what I really want to say is the quantum version of a probability distribution. And you know what that is. [laughter]
0:22:37.3 SC: I do know what that is, but go ahead.
0:22:39.2 TB: But yeah, I am afraid to use the word quantum in case… I don’t want people to run away in fear too quickly. So what that really is, just to keep it simple, what’s the linear algebraic version of a probability distribution? It’s just a matrix. Let’s just think of it as a matrix of numbers. And you might want to ask that matrix to satisfy properties that feel like a probability distribution. For example, you might say, oh, I want the trace of that matrix, the sum of all entries on the diagonal, to add up to 1. It kinda feels like a probability distribution. Probabilities add up to 1, so you kind of make restrictions on such a matrix. So there’s a way to do this. When you do this kind of in the correct way, there might be more than one way to think of a probability distribution as a matrix or as what I was calling the quantum version of one, or a technical word is a density operator, if folks are wondering.
0:23:38.3 TB: So there might be several ways to realize a probability distribution as one of these density operators, but if you do it in kind of what I’m thinking as the correct way or a way that is very advantageous for this discussion, then because now we’re in the analogous world of linear algebra, you can do something that is completely analogous to marginalizing. So you can actually start with a matrix that kind of gives you information about your joint events, rolling a die and drawing a card. That information is stored as a matrix. Now, to marginalize that, previously we said, yeah, kind of some overall probabilities where the cards could be whatever they wanted to be, but now I have a matrix, so what’s the analogous thing of that for matrices? So that has a name. That’s called the partial trace, if folks are wondering.
0:24:30.0 SC: Okay. [laughter]
0:24:30.2 TB: So you can compute this and you get another matrix, it’s called a reduced density matrix, but it’s very much just like the linear algebraic version of a marginal probability distribution. And what’s interesting there is that if you were to look at the diagonal of this reduced density, this newer version of a marginal distribution, you would actually find on the diagonal precisely the original marginal probability distribution, say, of your rolling your die. Okay, that’s on the diagonal of this matrix, but what you’ll also find if you do this passage in kind of an advantageous way, is that there are, generally speaking, non-zero off-diagonal entries on that reduced density, and those non-zero numbers tell you information about the deck of cards that was there.
0:25:26.4 TB: It might say something like, hey, every time you rolled a 1 and a 3, mysteriously you also always drew a Jack or something like that. And that’s what that off diagonal, the (1, 3) entry of your matrix, tells you. It sort of tells you how many cards were in common, given these two numbers that you could roll on your die. So that’s interesting ’cause that’s the information you don’t have access to when you marginalize in the usual way.
0:25:54.9 SC: So let me see if I got it. So we certainly don’t mind the word quantum being thrown around here, it’s not the first podcast to do that. Just to be clear, we’re sort of using mathematical tricks from quantum mechanics, we’re not imagining that our deck of cards or our die are truly quantum in any way, they’re pretty darn classical, we’re just trying to use some tricks. And so the trick is that in conventional probability, we’d imagine all the events that could happen and we would assign a number to them, the probability, and so we’d imagine like it’s just a list of numbers, and now you’re saying, let’s promote that from the start to a matrix, to instead of N numbers, an N by N array of numbers, and so there’s more room for interesting things to go on because you have both the diagonal part of the matrix, which is your old-fashioned set of probabilities, and more info is lurking there on your off diagonal.
0:26:51.3 SC: And keeping this as your storehouse for probability information lets you both compute the marginal probability, but then also keep some info about what was going on in the thing you are now ignoring.
0:27:04.0 TB: Exactly, that’s exactly right.
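Here is a small numerical sketch of that construction as I understand it from the conversation (toy dimensions, made-up numbers): pack the square roots of the joint probabilities into a unit vector, form the rank-one density matrix, and take the partial trace. The diagonal of the reduced density matrix is the ordinary marginal; the off-diagonal entries retain information about the traced-out system.

```python
import numpy as np

# Toy joint distribution: a 2-sided "die" and 3 "cards" (made-up numbers summing to 1).
p = np.array([[0.2, 0.1, 0.0],
              [0.0, 0.3, 0.4]])

psi = np.sqrt(p).reshape(-1)           # unit vector with entries sqrt(p[i, j])
rho = np.outer(psi, psi)               # rank-one density matrix with trace 1

# Partial trace over the "card" factor: view rho on (die x card) x (die x card)
# and sum over matching card indices.
rho4 = rho.reshape(2, 3, 2, 3)
rho_die = np.einsum('ijkj->ik', rho4)  # reduced density matrix for the die

print(np.diag(rho_die))  # [0.3 0.7] -- exactly the ordinary marginal probabilities
print(rho_die[0, 1])     # ~0.173 -- overlap of the card distributions for the two
                         # die outcomes: leftover information about the cards
```

The off-diagonal entry here is the sum over cards c of sqrt(p(1, c) p(2, c)): large when the two die outcomes tend to come with the same cards, zero when they never share a card. That is the information “tossed on your desk” rather than in the bin.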
0:27:05.9 SC: And is it all the info or just some of it?
0:27:09.1 TB: Yeah, so great question. It turns out to be all of the info.
0:27:14.9 SC: Really?
0:27:15.7 TB: In a way that can be made very precise. This kind of gets a little bit technical, though, and I have to think way back into page 87 or something. [chuckle] But there’s a way that… You can actually show that the marginals, these things on the diagonal together, both of them, by the way, you can marginalize out the cards or you can marginalize out the die, okay, so you kind of have to have both of those, together with the off diagonals.
0:27:48.5 SC: Okay.
0:27:48.9 TB: Is sufficient to recover the joint. That happens to be true, yeah.
0:27:52.2 SC: That makes a lot more sense to me, because I thought if you just marginalize one, you’re just losing information, but you’re saying, I’m keeping the matrix associated with marginalizing either one. Yeah, that makes sense. Okay, but let’s back up and be a little bit philosophical here. I think that we’ve done an admirable job of making sense so far, so let’s stop doing that. [chuckle] The words that you use when you describe some of these things, like taking pieces and then composing them, and then asking how to reconstruct the whole from the parts, these have counterparts in philosophy as well as in physics, in realms of things like mereology.
0:28:33.5 SC: Do you know, have you ever heard the word mereology before? That’s what philosophers call the relationship between wholes and parts, so how you break up a whole into parts, and I’m very interested in it because of questions of emergence of different higher level descriptions from lower level descriptions. So is… And here’s an answer I don’t… Here’s a question I really don’t know the answer to, not a leading question. Is there some sense in which these techniques you’re talking about are helping us decide either the best way to take a bunch of little things and combine them into a big thing, or given a big thing, the best way to divide it into a set of little things.
0:29:13.8 TB: That is a great question. Let’s see what… Can I say anything helpful about this? I think it would be interesting depending on how far or how deep one can explore this mathematics. I think that there’d be a very good chance that the math will have something very interesting to offer to the philosophers with those kinds of questions and vice versa. I think this is exactly the real estate for that, but I don’t have… I don’t have good quotes or something that come to mind where I can say, oh, yeah, in philosophy, there’s this idea by this person, and in fact, we have a theorem that says exactly what he said, but in formal terms.
0:30:03.5 TB: That would be great. But I do get the feeling, yes, that this kind of direction, taking sort of a non-traditional perspective on algebra and probability in this way, maybe can lead you down avenues that haven’t yet been explored, so that you can see sort of these more philosophical questions from a different viewpoint that maybe is more illuminating.
0:30:30.7 SC: Yeah, no, I think that’s exactly right. And like I said, I didn’t know the answer to the question when I asked you, so it was not a leading question. But there’s two examples in my mind that I can’t help but just say out loud to give you and the audience food for thought. One is, I just did a podcast with Anil Seth, who is a neuroscientist, but he also writes papers about emergence, and so he’s literally dealing with… He and his co-author Lionel Barnett have a time series of random variables, and they want to know… How do you coarse grain? How do you combine different variables into a smaller number of variables so that you still get a predictive theory? What is the information you can ignore and still be Markovian, in the sense that the state now predicts the state next?
0:31:19.6 SC: So that’s one. This is an open problem, it’s very, very simple to state, and it’s just like, how do you divide stuff up? And this is very interesting, for example, to people who do origin of life research. What made an individual? What made a cell back then? How did you divide it up? And the other example is, I just wrote a paper with my former student Ashmeet Singh, called Quantum Mereology. So that’s a word you know and a word that is new. How do you take a quantum system and divide it up into the system you care about and its environment? So how do you take Hilbert space, this big vector space, and factorize it into tensor product factors so that one of them acts like a classical system and the other one acts like the environment? I’ve nothing wise to say other than everything you’ve been talking about sounds relevant, and I should try to understand it better than I do.
0:32:11.2 TB: Yes, I have to listen to that podcast and read your paper now. [chuckle] It sounds great.
0:32:16.7 SC: Good. I’ve given you homework, I like to do that, but…
0:32:20.1 TB: Good.
0:32:20.6 SC: Okay, so enough with the open-ended questions, let’s move back to what you do know something about. You bring up this thing called the Yoneda Lemma, am I pronouncing it correctly?
0:32:31.3 TB: Yes, exactly.
0:32:31.4 SC: And it’s very funny because you say, like, this is… On the one hand, it sounds completely trivial, and on the other hand, it’s the most important thing in the world, and it deals very much with this question we’ve been talking about, about objects and sub-groups of objects. So can you explain to the person on the street what the Yoneda Lemma actually says?
0:32:50.3 TB: Yes, yes. So the Yoneda Lemma is a very well-known theorem. Okay, it’s called a Lemma, but really it’s a theorem, in category theory, which is a modern branch of math that’s quite abstract.
0:33:07.5 SC: Right.
0:33:07.9 TB: So maybe we’ll leave it there, but if we want to talk more about that happy to say more.
0:33:12.8 SC: Do we want to… Do we want to say what category theory is first? That’s okay, I was going to ask you that at some point.
0:33:18.6 TB: Sure, sure. So category theory, there are, I think, many ways that one might say what it is, or analogies to try to explain it. I like to think of category theory as a branch of math that kind of provides a template for other branches of math. So quite often what you’ll find in mathematics is that ideas or constructions are kind of repeated over in different fields or different branches, so you might build something out of legos in your world, maybe that’s the world of topology or something, and you have your little topological building blocks and you construct a space. I don’t know, I called it lego building blocks, but whatever, you construct something.
0:34:07.1 TB: And then you look at your neighbor, 16 blocks down in the neighborhood of group theory, and lo and behold, they’re building something that looks just like yours, but they call it one thing because they live in that cul-de-sac and they have accents over there and they have a culture and blah, blah. But you call it something else because you live in your cul-de-sac, and you call it whatever you call it, and you use your accents, blah, blah, blah. But if you were to strip away all of those sort of details, you really find that the two folks are doing the same thing.
0:34:40.5 TB: And category theory is kind of a common language that mathematicians can use to say, oh, here’s a thing. When you specialize it into the world of group theory, they usually call it such and such; when you specialize it into the world of topology, they give it this name, but actually both folks are really doing the same thing. So it’s kind of like a template where you kind of… I guess maybe template, and another analogy I like to use is Mad Libs, these stories where you have a basic narrative, but some of the words are missing, like fill in: “My best [noun] went to the [noun] on a [adjective] day.” My friend went to the grocery store on a lovely day or whatever, but if you substitute different words, you get a different story, but the overall narrative is the same.
0:35:32.0 SC: Oh, okay, that’s very nice. Yeah.
0:35:33.5 TB: So, yeah, I like to think of category theory as like that, kind of… It’s like the Mad Libs for mathematics. You kind of have a similar narrative in many mathematical fields, but depending on the words that you choose to fill in to that narrative, maybe you get a different story, but the overall feel is the same.
0:35:51.1 SC: So just to be, I mean, maybe a little bit more concrete. I think what you mean is that there are things like multiplying or taking the inverse of that appear in very, very different mathematical contexts, and category theory gives you a common language that covers them all.
0:36:08.1 TB: That’s right. That’s right. Or I think maybe a more concrete example for folks, ’cause I was very vague, but when are two things the same? You know, if I have two sets, two bags of marbles. Are they the same? And what does that even mean? What considerations should I have when I’m talking about sets? What’s interesting about a set? Nothing, really, it’s so boring. There’s nothing going on, it’s just like, dots, blah, blah, blah. What can you say? Nothing. All you can do is count. How many things are in my set? So if two things are the same, well, then, how do I know if two things are the same? Well, essentially, if they have the same number of elements in them.
0:36:45.8 TB: If I have two bags of marbles and they both have 53 marbles, for all intents and purposes, maybe they’re the same. But what if your set has an extra structure? We’ve been talking about multiplying things, so what if I can “multiply” elements in my set, what if my set is the set of real numbers? I can certainly multiply them, so that’s additional structure that’s there. So maybe… You know, you mentioned inverses, so maybe what I really have is as a group actually, maybe my set was actually a group all along, but what does it mean for two groups to be the same?
0:37:17.9 TB: I have a way to multiply elements, each element has an inverse, maybe I have a multiplicative identity, like a half times 2 is 1, so maybe I have a thing that behaves like a 1, so what should sameness mean there? Well, category theory helps you make sense of that, and you can kind of see when you have the answer to that, what is the proper notion of sameness. It turns out that you subsume the notion of sameness in all of these fields. Maybe in group theory it’s called a group isomorphism. Maybe in the world of topology, it’s called a homeomorphism. In differential geometry that has a name. In each of these fields you have different names, but in the eyes of category theory, this notion of sameness, you just call it an isomorphism.
0:38:05.5 SC: Got it.
0:38:07.2 TB: So maybe when you’re an undergrad, you’re like, “Oh, I have all of these vocab words to memorize, blah, blah, blah, this is so confusing,” but then category theory is like, “Nah, it’s just one concept.”
0:38:17.0 SC: More category theory in high school math. That’s what we need. Okay, and so then, this is good, now we live in category theory and the Yoneda Lemma.
0:38:25.9 TB: Okay, so the Yoneda Lemma, let me just say what it means in English, not the technical sense, and what it means in English, I think can be very relatable and something that we can all maybe agree on. The Yoneda Lemma says if you want to understand an object, a mathematical object, like a group or a space, or a set, the Yoneda Lemma says that all of the information about that object is contained in the totality of relationships that object has with all other objects in its environment. Or a little more technical, a mathematical object is completely determined up to isomorphism by all of its relationships.
0:39:12.7 TB: So let me give… Let me say this in a different way. You can learn a lot about a person by seeing how they interact with other people, right? Like if you just sit outside, maybe we can sit outside in coffee shops these days and you can people watch and when you see this person, oh, they’re like talking to the guy at the flower shop, that’s cool. Maybe they are buying flowers for a friend or something, or you look on social media and see who such and such follows on Twitter, or what do you do on Friday nights? All of these things, if you observe a person and see how they interact with other people, you can glean a lot of information about that person, so how does someone relate to other folks of their kind.
0:39:51.5 TB: And so the point is, this theorem says the same is true in mathematics, except it’s not just that you can learn a lot, but you can learn everything, everything you need to know about a “mathematical person,” i.e., an object, like a group, whatever. All of that information is contained in how that object relates. So what does it mean to relate? Well, this gets to words, to ideas like: how do I go from one group to another? How do I go from one vector space to another? These are things like linear transformations or group homomorphisms or continuous functions. A relationship there is really like an arrow from your object to something else.
0:40:38.2 SC: Right.
0:40:39.2 TB: Where that arrow is a function that maybe preserves whatever structure that’s of interest in that category that you’re in.
0:40:46.0 SC: So when you say that, the example that comes to mind, and I’m not sure if I’m over-interpreting or not, is if I have some space, some manifold or whatever. I remember my delight when I learned that the set of points in this space is the same as the set of maps from one point into the space.
0:41:09.3 TB: Yes, yes, yes.
0:41:12.2 SC: So is that an example or is that the essence of the Yoneda Lemma? Or is there more richness there that I’m not immediately perceiving?
0:41:21.0 TB: Okay, so that is intimately related to the Yoneda Lemma. The Yoneda Lemma says… Let’s see, it says a little bit more, if you’re thinking of manifolds, but suppose we were to forget all of the structure of a manifold, like, okay, you have these charts… But let’s just think about it. You mentioned points, so let’s just think of it as points for now, and kind of forget the topology. In that case, I think a corollary of the Yoneda Lemma or one of them, it says exactly what you said, the set of all points are the same thing as functions from the one point set into your set. In that case, for sets, it’s quite, it’s like, oh, that’s underwhelming, because there’s not that much structure, and so you might want to ask for more, and so the Yoneda kinda gives you… The Yoneda Lemma kind of gives you more in that when you…
0:42:09.1 SC: I totally get it. You cleared up my confusion there, because what you’re saying is, of course, I have more than a set, I have a group, I can multiply things, or you know, some topology or whatever, and none of that is captured in this mapping points into it. But it is captured by its relationships to other things.
0:42:26.7 TB: Exactly, exactly. So that’s exactly what I want to say. Thank you. The Yoneda Lemma says it’s not enough just to look at functions from the point into your manifold. But because the point is a manifold, it’s not a very interesting one. So, look at all of the other ones.
0:42:44.0 SC: Yeah, the circles… And the whatever.
0:42:44.1 TB: Exactly, all of them. So when you consider all of these other manifolds into yours, that gives you all of the information you need to know. So it doesn’t just tell you its cardinality, but you might get other like, oh, its smoothness or something like that.
0:42:56.6 SC: Sure. Okay, but now, now it went from being trivial to being overwhelming. How in the world do I get useful work out of knowing that the set of relationships between my object and every other conceivable object tells me everything there is to know about it? I mean, how do I make use of that? What do I do with this… with this information?
0:43:20.4 TB: Yeah. Yeah, so I like to answer this question back in the context of language.
0:43:25.3 SC: Please, sure, go ahead.
0:43:26.5 TB: Which we were thinking about earlier. So, there’s a… I mentioned we were talking about philosophers and quotes earlier, I couldn’t think of one then, but here’s a quote from a linguist, so not quite the same.
0:43:36.3 SC: Good.
0:43:37.3 TB: But I like this quote, which is very much like the Yoneda Lemma, but in the context of language. So there’s a linguist, John Firth, I think in a 1957 paper, he says, “You shall know a word by the company it keeps.”
0:43:52.0 SC: Nice. Yes.
0:43:55.2 TB: I like that. Yeah, you shall know a word by the company it keeps. So what’s the meaning of fire truck? Well, it’s kind of like all of the contexts in which the word fire truck appears in the English language. Okay, so fire trucks are like maybe red. I don’t know, sometimes they are yellow, I guess. Maybe they speed quickly. From all of the expressions in which the word fire truck appears, you can kind of glean information about that word, but the Yoneda Lemma, if we were to take the spirit of the Yoneda Lemma, borrowing from the category theory, you kind of recover John Firth’s quote, oh, everything I need to know about this word, the meaning of the word fire truck, is contained in the network of ways that word fits into the language.
0:44:38.4 SC: Right.
0:44:40.4 TB: Okay, and how is this useful? Well, it goes back to sort of this experimental motivation we were talking about earlier. It’s really interesting that large language models really only have access to the context in which a word appears, they just sort of see what goes with what in the language together with the statistics, and somehow from that can glean semantic information about words. How do you know that? Well, because they can generate coherent pieces of text that look like a human wrote them. So something in there, you’re using the context of a word, which both the Yoneda Lemma and linguists like John Firth might say that’s a pretty good proxy for the meaning of that word.
0:45:26.1 SC: Yeah.
0:45:29.9 TB: Somehow that’s being used by a machine to then learn something about the language to the extent that it can produce good texts. So I think there’s the usefulness, it’s being used actually as we speak somehow, but then the math question is, oh, what really is that, okay, there’s the Yoneda Lemma, but if I know this algebraic and statistical structure, maybe with help from category theory, one question might be, oh, can I model this, can I represent this? Maybe in the precise sense of representation theory, there’s matrices, talked about those, those are nice to work with. In a way that captures this information, you know, that’s sort of theoretically clear.
0:46:13.9 TB: Sometimes neural networks… Not sometimes, but neural networks, I think are notorious for being kind of black boxes, you know, you look inside and it’s like, oh, 180 billion parameters. What is happening? Something? What? We don’t know. But maybe if you take a little bit of inspiration from category theory, maybe from quantum physics, linear algebra, maybe these tools can kind of be combined in a way that’s clarifying sort of from an interpretability perspective, but also mathematically. So that’s one long-winded way of the usefulness I see coming down the line.
0:46:47.2 SC: Well, Let me push on the maybe that sneaked in to your sentence right there.
0:46:50.7 TB: Yeah.
0:46:52.4 SC: Because, it’s very interesting to think of these deep learning models and the successes we’ve had in language generation and things like that, and separately, it’s fascinating to think about category theory and these things you said about probabilities and quantum mathematically-inspired versions of probabilities. How… Where are we on the spectrum from hopeful aspiration to concrete implementation in actually using these mathematical ideas to help us with building… Not even necessarily building, understanding or doing something with artificial intelligence or deep learning?
0:47:33.9 TB: Yeah, so on the spectrum of theory to actual, hey, can we see an experiment, you know, maybe at this moment, this day in 2021, I think my collaborators and I might be… We’re inching towards the experimental side of things and actually have some stuff going, but really working on the theory for the moment.
0:47:57.3 SC: Okay.
0:47:58.8 TB: I think maybe as we progress and do a good job of explaining the ideas in a way that’s easier to understand than reading a technical paper, maybe that might even encourage progress on the experimental side even more as these ideas become accessible.
0:48:13.7 SC: Sure.
0:48:16.3 TB: But it’s certainly in progress. Absolutely.
0:48:18.3 SC: Okay, interesting. So then the other thing I want to press on is, I mean, you sneaked in what to many people would be a massive metaphysical claim here, that maybe you want to defend or maybe you’re just like, well, let’s just see how it goes, namely that all of the semantics is in the statistics of which words appear together, like when you say, okay, so a red fire truck appears a lot and a red idea doesn’t appear that much. And what you were getting at is the idea that if we knew all of the answers to the questions, how often do these combinations appear, that’s all there is to know, in some sense, about the language, and some people are going to say no, no, no, when you say the word fire truck, that means something, it means something out there in the world.
0:49:05.8 SC: So I think that the perspective that you’re sort of coming down on, and which is actually one I’m quite sympathetic to, is probably something like pragmatism in the philosophical tradition, William James and people like that, right, the meaning of a word is its use. But I do think there are other people who would say no, no, the meaning is the fire truck out there. And do you care about this distinction?
0:49:31.3 TB: Right. Right. So yeah, thank you for bringing this up before I get, you know, doused with fire or something. So yes, I am completely aware that folks may say, what? You can’t make this claim that all semantic information is contained in just the statistics. Of course, I’m aware of this. So let me… Yes. Let me give this caveat. I acknowledge that there could be even more mathematical information. You know, we started off by saying, hey, there’s algebra, so cool. Wait a second, that’s not enough. There’s also statistics. Now, I’m at this point where I’m saying, hey, algebra and statistics, that gives you… Urgh, kind of everything.
0:50:12.1 SC: Everything.
0:50:12.2 TB: A la Yoneda, but yes, maybe there could be something else that we’re not quite capturing, so I want to acknowledge that, yes, and that’s exciting. That’s great. More math to be done. However, quite an awful lot of semantic information is learned, and I am not making that up or saying I stand my ground, I’m saying, look out into the world of language modeling, there are companies being built around the success of how much information you can learn based on just algebra and statistics, so kind of for now on my mathematical journey, I’m like, eh, yeah, it’s quite a lot, let me figure out this math too.
0:50:50.5 SC: Sure.
0:50:52.4 TB: Then down the road, I’m sure there’s so much more, a lot more to do, but yeah, let me not be burned at the stake for this.
0:50:58.1 SC: Well, no, no, these are controversial questions, we don’t know the answer, that’s why I’m just sort of… It’s fun to draw these connections, but if I am… Part of my job as podcast host is to be skeptical and play the devil’s advocate. So if I’m… The worry about the language parsing application of artificial intelligence is that if you go outside the domain of the training set, a human being would understand how to extrapolate outside, whereas the computer cannot. So I tried to come up with a good example here, and I’m not sure I succeeded, but here’s my example: If I put an elephant on a cardboard airplane, will it fly?
0:51:48.0 SC: Because the point being that the language model might… That’s a sentence no one has ever said before, right, or a question no one has ever asked. And the language model might say, well, airplanes fly, so the answer is yes, but I said it was a cardboard airplane with an elephant on it, and the language model might not know that that might make it difficult. Do you have any… Is this a reasonable ask for AI to be able to answer questions like that?
0:52:12.1 TB: So from a hopeful viewpoint, I would like to say yes, and let me give… So I love that example, I like that a lot. So let me give you an example, which I think may show this kind of thinking, this ability to generalize beyond just the data set. And I’m going to botch this, so apologies to the author of this article, but I think last year or semi recently there was an article in the MIT Technology Review about one of these well-known large language models. And it was given a prompt, it was asked a question, I can’t remember quite exactly, so I’m super paraphrasing, but it was basically like, we’re in a room of someone’s house, maybe in the living room, and in the dining room our friends are over there about to have dinner, but we have to bring in an extra table ’cause there’s lots of people.
0:53:05.5 TB: So we have a table, so how do we bring it from the living room into the dining room through a doorway, like how are we going to fit it, okay, how do we get this table to fit through the doorway, and sort of let the language model answer that question. So this is kind of analogous to your question of if I have an elephant on this cardboard plane, will it fly. So, it was so funny because this large language model… I mean, what would we do as a person? What would we do? I have a table, I want to bring it into the other room, it doesn’t fit through the door, so what would we do? Maybe just rotate it so the legs aren’t jamming into the door or something.
0:53:41.8 TB: This language model in this article, it’s like, get a hand saw and chop it in two, and then bring each piece individually through the door.
0:53:51.7 SC: It’s not wrong.
0:53:52.3 TB: And it was funny because the author of the article kind of presented this to showcase, look how silly this thing is, it really can’t think like a human, like how dumb. But actually, it’s showing you that it learned something about the Earth, it learned that if you have a table and you can cut it in half, it’s smaller, and then maybe it can fit through the door. So I think that… That’s kind of thinking outside of the box. I wouldn’t have said that if someone asked me, so maybe there is a lot in there in these kind of billions of parameters, some things being learned, but it’s still kind of early stages, but examples like that make me think, oh, maybe it really can generalize in ways that we were suspicious of.
0:54:38.3 SC: No, I think that’s perfectly fair. I am sympathetic both to the worry that there’s a certain manifest-image, commonsensical view of the world that it’s harder to train the AI on, and therefore it will get into trouble when we ask it to leave its domain, but I’m also sympathetic to the idea that in some sense, there’s no spooky essences in the world, there’s only all the relationships between things. So in principle, if we could teach the computer all of those things, it would know everything there is to know. I’m not sure. Okay.
0:55:11.8 SC: But I think that we have failed to be sufficiently inscrutable here. I think that everything we’ve said is too easy to understand, so I want to move on to your more recent work on entropy, ’cause I like the word entropy. Here’s the mistake I made, I’ll admit it to the audience: I read that you had a paper and the word entropy was in the title, and I thought that I’d be able to understand it, and that didn’t happen, but that’s why I have you on the podcast, so you can help me out here. Tell us how you as a mathematician think about the word entropy, and then we can put it to work and talk about operads and simplices.
0:55:48.0 TB: Yes, yes. So when I think of the word entropy, I really have in mind for that paper Shannon entropy, and so I think there’s a very easy way to think of it, and it’s kind of the elementary way that I think of it. So if you have a probability distribution, let’s just say on a finite set, earlier we talked about rolling a die, for example, but whatever, you have some little list of probabilities. And I think of the entropy of that list of probabilities or that distribution as sort of the amount of surprise that’s contained in it, or the amount of uncertainty.
0:56:31.5 TB: So I think the easiest way to think of it as the following. So if you have an event that you know for certain is going to happen, I really love coffee, and with probability 1, I will have a cup every morning when I wake up. So if you know that about me, if I tell you, hey, I had coffee this morning, you are absolutely not surprised.
0:56:58.3 SC: Didn’t learn anything.
0:56:58.4 TB: Yeah, didn't learn any information. And it turns out that when you look at the definition of entropy, the actual formula, you get a number, and that number is zero. So it's like, how surprised were you? Zero surprised. So entropy is a number associated to a probability distribution, and it's zero or positive, and as that number increases, it tells you how surprised you should feel, or how much information you gained, based on the distribution of these events. So I think that's kind of a friendly way to think of it.
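For readers who want the actual formula: Shannon entropy assigns to a distribution $p = (p_1, \dots, p_n)$ the number

$$H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i,$$

with the convention $0 \log 0 = 0$. In the certain-coffee case, some $p_i = 1$, every term vanishes, and $H(p) = 0$.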
0:57:34.0 SC: Yeah. So just to be clear, so we’re back in the world of probability distributions.
0:57:38.2 TB: Yes.
0:57:38.4 SC: And probability distributions are sets of numbers that are all non-negative and add to 1, right? That's what a probability distribution is. And so for each probability distribution there is an entropy that you can calculate. Is it safe to say you think of it as how spread out the probability distribution is? If it's all peaked, the entropy is low, and if it's all spread out, the entropy is high?
0:58:00.4 TB: Yeah, yeah, exactly. So if you have a distribution where basically all of the events have the same probability... Like if you have N events, let's say N is 5.
0:58:10.4 SC: Yeah. [chuckle]
0:58:10.5 TB: I have five things and they all have probability one-fifth. Then in that case, they’re all spread out evenly and the entropy is as high as it can be.
0:58:19.2 SC: Good.
0:58:20.5 TB: And that's kind of... You have an equal amount of surprise: oh, it could have been any one of the five, and it was this one. Wow. But yes, if it's peaked, concentrated on one outcome with probability 1, then the entropy is zero. No surprise. I had coffee this morning, big deal.
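Just to make the peaked-versus-spread-out picture concrete, here is a minimal Python sketch (an illustration for this page, not anything from the paper):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a finite probability distribution."""
    # Convention: 0 * log(0) = 0, so zero-probability outcomes are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([1.0]))                           # 0.0 bits: coffee every morning, no surprise
print(shannon_entropy([0.2] * 5))                       # ~2.32 bits = log2(5), the maximum on five outcomes
print(shannon_entropy([0.96, 0.01, 0.01, 0.01, 0.01]))  # ~0.32 bits: peaked, little surprise
```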
0:58:34.2 SC: Right. And so in your paper, you link this information-theoretic notion of entropy to issues in algebra and topology, which is all very cool, and I want to get into that. But is there a big-picture point that you're trying to get at? What is the goal here? We've been studying entropy ever since Boltzmann and Gibbs back in the 1800s. What are we trying to understand by relating it to these issues in algebra and topology?
0:59:03.3 TB: Yeah. So I guess there are a couple of ways I could answer this from a math perspective. Let me see. The first one that comes to mind. So entropy assigns a number to a probability distribution, as you said. So you can think of it as a function.
0:59:21.3 SC: Yeah.
0:59:21.4 TB: If I think of the set of all probability distributions on five elements, entropy assigns to each one of them a number, which is like the information content of that distribution. So you might want to characterize that function. I might have other functions that go from probability distributions to numbers. Do they all convey information content? Do they all have the flavor of entropy? So a mathematician might want to look for a characterization of entropy. If I have a function that spits out a number on a probability distribution, how do I know that I can interpret that number as some amount of information conveyed, or something that behaves like Shannon entropy? Are there properties that entropy satisfies? Can I list them? Is that list sufficient to tell me, if I walk down the street and run into another function: wait, let me pull out my checklist...
1:00:12.0 TB: Let's see, you satisfy these properties; hey, you're also entropy. Right? You might want to know about that, and that's a nice organizational tool. So one thing that my paper does is give one of these characterizations of entropy, but it does it by using tools from algebra and topology. And then you might say, well, why would you bring in that heavy-duty stuff? We were happy over here with just lists of numbers; why in the world go to algebra and topology? I think it's a very exciting result when you can connect fields that do not feel related on the surface. For me, that's very exciting, and it might suggest, hey, maybe there's something deeper going on. We thought about entropy in this one way, but if we can look at it from a topological perspective or an algebraic perspective, maybe that can give us new insights or new intuitions.
1:01:08.9 TB: So that was another motivation for pursuing a project like this, because it felt like it connected things that shouldn't really feel connected. But there it is, the math is there, and that's interesting, I think.
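For readers who want a concrete example of such a checklist: one classical result along these lines is Faddeev's characterization (this is not the characterization in the paper, which uses topological and algebraic tools, but it shows the flavor). Roughly stated: if a sequence of functions $F_n$, one for each length of probability distribution, is continuous, symmetric in its arguments, normalized so that $F_2(1/2, 1/2) = 1$, and satisfies the grouping rule

$$F_n(p_1, \dots, p_n) = F_{n-1}(p_1 + p_2, p_3, \dots, p_n) + (p_1 + p_2)\, F_2\!\left(\tfrac{p_1}{p_1 + p_2}, \tfrac{p_2}{p_1 + p_2}\right),$$

then each $F_n$ is Shannon entropy.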
1:01:22.7 SC: No, that's exactly what got me. I thought I knew entropy; why is it being connected to all these things? And so one of the ideas that I think is worth trying to explain is the idea of a simplex. Because I know what a simplex is. I took topology courses when I was in graduate school; I learned about homology and cohomology and things like that. And you just say a probability distribution is a simplex, and I never thought about that. [laughter] Why don't you explain what is going on there, if that's possible?
1:01:55.1 TB: Sure, sure. So let me give a definition, and I'm going to make this really easy for everyone; let's just make this a definition. So let's say that an N-simplex... Okay, so pick a natural number N: 1, 2, 3, blah, blah. An N-simplex is the set of all probability distributions on N elements. So in other words, let's just think when N is 1: I have one element, and I need a non-negative number associated to that element that adds up to 1. Okay, well, I only have one option: 1.
1:02:35.0 SC: Better be 1, right?
1:02:35.8 TB: So that’s boring. That’s my coffee scenario. I woke up this morning, I had coffee, probability 1. Okay, so let’s think of N as 2 now. If N is 2 a… Oh, I think there’s some… Okay, so this is kind of annoying. I made a little bit of an indexing mistake.
1:02:55.3 SC: Yeah.
1:02:55.5 TB: I think technically, if we were to look in a textbook or on Wikipedia, an N-simplex is a distribution on N plus 1 points.
1:03:05.5 SC: I think that’s exactly right and it’s exactly that technicality that was confusing you. But do you…
1:03:09.3 TB: Yeah, and…
1:03:09.8 SC: Okay. That’s okay.
1:03:11.1 TB: That’s annoying. Okay.
1:03:12.5 SC: They’ll roll with us, the audience is with us this far.
1:03:15.4 TB: Okay, so a 0-simplex is a distribution on one element, and that's my coffee in the morning: 1 point.
1:03:21.6 SC: That’s fine.
1:03:21.7 TB: Okay, fine. A 1-simplex is the totality of all probability distributions on two points. So those are the pairs of numbers, one half and one half or whatever, that are non-negative and add up to 1. All pairs of real numbers satisfying that, taken together, are called the 1-simplex.
1:03:50.2 SC: And the reason, by the way, that you made this indexing mistake, which is the most natural thing in the world, is because in a probability distribution we know the numbers have to add to 1. So if there are N numbers, you only need to tell me N minus 1 of them, and I know what the other one is, right? That's what's going on. So if you have two variables, as soon as you give me the probability of one of them, I know the probability of the other one right away, so I don't even need to calculate it.
1:04:15.8 TB: That’s right, that’s right. That’s exactly right.
1:04:16.9 SC: Don’t need to specify it.
1:04:19.9 TB: Yeah, and that's why, if I'm thinking of a 1-simplex, it's usually drawn as a line.
1:04:25.9 SC: Yeah. It’s a one-dimensional thing.
1:04:27.7 TB: It’s a one-dimensional thing, and that’s because if you kind of chop your line into two bits and you know the length of the first one, and you know both of them must add up to 1, you know the length of the second one. So that’s kind of it. Okay, so thank you, we have this indexing thing going on, but this is exactly why I still am [1:04:43.8] ____ this.
1:04:43.9 SC: And the 1-simplex then is just the interval of numbers from 0 to 1.
1:04:49.1 TB: The 1-simplex which is…
1:04:52.6 SC: Because it can’t be less than 0. No negative numbers are allowed in a probability, and it can’t be more than 1.
1:04:58.2 TB: Yeah, yeah, exactly. You can think of it as the unit interval, which is a line. That's right. Okay, so in general, an N-simplex is the set of all probability distributions on N plus 1 elements. And when you take this geometric perspective further, it turns out that a 2-simplex looks like a triangle, a 3-simplex looks like a tetrahedron, and a 4-simplex is hard to draw.
1:05:27.5 SC: And it looks like a triangle because you might think you need to give me two numbers and it might be a square, but if the two numbers are too big, then they’re going to add to more than 1.
1:05:34.4 TB: Right. Exactly.
1:05:36.3 SC: So half the square is cut off.
1:05:37.7 TB: Exactly, so half the square is cut off. Exactly right, yep. So when we think of simplices, you can think of this definition I gave, or you can have the pictures in mind: a 0-simplex is a point, a 1-simplex is a line, a 2-simplex is a triangle, and then higher and higher you go. This is actually where the topology comes in.
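Written out, with the standard indexing on N plus 1 points, the definition is

$$\Delta^n = \Big\{ (p_0, p_1, \dots, p_n) \in \mathbb{R}^{n+1} : p_i \ge 0,\ \textstyle\sum_{i=0}^{n} p_i = 1 \Big\},$$

so $\Delta^0$ is a point, $\Delta^1$ is a line segment, $\Delta^2$ is a triangle, and so on.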
1:05:57.8 SC: Exactly, right.
1:06:01.1 TB: Because I have a shape, a triangle. I can think of that as a subset of the x-y plane, R2. R2 is a topological space, and I have a subspace of it: there's my topology. So actually, if entropy has something to do with probability distributions, and probability distributions really form simplices, then the connection to topology is maybe not too surprising. It's been right there all along.
[chuckle]
1:06:28.7 SC: Only a mathematician would say that, but okay. [chuckle] But okay, so let me get this right. When we are visualizing this line or the triangle or whatever, that is really the set of all probability distributions.
1:06:45.7 TB: That’s right.
1:06:47.7 SC: Each point inside is a probability distribution.
1:06:49.6 TB: Exactly, exactly.
1:06:51.9 SC: So entropy is a function on the simplex.
1:06:54.6 TB: Yes, it's a function on the simplex. And in fact, fix a simplex, fix a number N; maybe you have a triangle or a tetrahedron or a line, pick one of them. Don't think of all of them, just pick one of those shapes. Then you can think of entropy as a function from that one shape into the real numbers. So you fix the length of your list of probabilities, all of those of length 13 or something, and that's a 12-simplex... there's that indexing again. Anyway, so you think of these lists, and then entropy is a function on that simplex, valued in the reals. So really, if you wanted to have a conversation about entropy and you wanted to throw in all probability distributions of any length, you actually have a collection of functions, each of which is called entropy.
1:07:53.7 TB: I have a function out of the 0-simplex, that's the entropy of the boring one-point probability distribution; I also have a function out of the 1-simplex, that's entropy; a function out of the 2-simplex, that's entropy; and so forth. So Shannon entropy really says: hey, first pick a simplex, first pick the number N, then ask for the entropy of a point in that simplex.
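In symbols, the picture being described here is that Shannon entropy is a whole family of functions, one for each simplex:

$$H_n : \Delta^n \to \mathbb{R}_{\ge 0}, \qquad H_n(p_0, \dots, p_n) = -\sum_{i=0}^{n} p_i \log p_i.$$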
1:08:17.1 SC: And you mentioned, sure, that the 2-simplex is a little piece of the plane, and therefore there's some topology going on there, but there's more topology than that. Like I said, no one ever told me that a simplex is a probability distribution, or a set of probability distributions; the reason they came up is because we were learning about topology, homology in particular. Is it feasible to explain how this concept of a simplex helped us understand the topological structure of complicated spaces?
1:08:50.0 TB: Ah, I see, yes. Are you thinking of looking at a general topological space... Let's just forget about entropy for a moment. A general topological space, trying to understand something that's complicated, messy, who knows what it is; and then mathematicians have this technique of imagining it's built up from little pieces called simplices. Before I answer, is that kind of the...
1:09:20.7 SC: That is exactly it. And sort of figuring out how many holes there are in how many donuts, etcetera.
1:09:26.2 TB: Yes, yes, yes, yes. Okay, right. So this gets into the ideas of homology and cohomology, so this is exactly right. Okay, so topology: I think for listeners, we've used the word several times now, so maybe folks know what it is, but I like to think of it as this squishy version of geometry. To a topologist, a circle and a triangle are the same thing, because if the triangle is made out of a wet noodle and you squish in the corners, you can get a circle. So in topology this kind of squishiness is allowed. But if that's allowed, then what are we doing? This goes back to the idea of when two things are the same, which we talked about with category theory a while ago: when are two topological spaces the same or not?
1:10:13.0 TB: And one thing that you can use to distinguish spaces is counting how many fundamental holes they have. A circle and a triangle are kind of the same, and you can see that in the fact that they each have a hole: if I hold a little key ring, I can stick my finger through it, that's the hole. But not all topological spaces can we see with our eyes. There are other spaces that aren't key rings; they're very abstract, buried deep in the pages of some textbook.
[laughter]
1:10:45.1 TB: And how can I understand that object?
1:10:48.3 SC: Yeah.
1:10:49.3 TB: I can only visualize it in my mind based on some equations or something, and I can't really hold it in my hands, but how can I still investigate it? And this is where simplices are very helpful. You can imagine probing your space with simplices. So for example, think of the surface of something like a donut, a torus. Maybe I can think of triangulating that surface. Thinking of the entire exterior of a donut is very difficult, but if I imagine a triangular grid on it, I can maybe understand something about the little pieces. So these are...
1:11:30.2 SC: Yeah, so... Sorry. So in other words, if you have some wiggly thing, a topological space of whatever number of dimensions, in principle a wiggly thing requires an infinite amount of information to specify.
1:11:45.4 TB: Yes.
1:11:45.9 SC: But if we just chunk it up, if we tessellate it, we can keep the topology the same while reducing that infinite amount of information to a finite amount: sort of which triangle is connected to which, or which tetrahedron is connected to which.
1:11:57.2 TB: Exactly, exactly. It's a finite number of things, and it's also a combinatorial thing: oh, a 2-simplex has so many edges, and it's connected to this other thing with so many edges, and maybe there are some intersections or something. So you've taken something that's complicated and wiggly, and, what do I do with this? But if you can break it up into smaller pieces that are bite-size, easier to handle, combinatorial, you can start to count things, and that makes it more manageable. And then there's this leap: you have to bring in some tools to make sense of what you mean by a hole. I know what that means for a key ring, but what if I have some abstract space? What is a mathematician's definition of a hole in a topological space? And what if I am in 17 dimensions? How in the world do I make sense of that? That's where these simplicial tools are very useful for answering these kinds of questions.
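As a toy illustration of what this finite, combinatorial bookkeeping buys you, here is a hedged Python sketch (an illustration for this page, not from the conversation): the hollow surface of a tetrahedron is a triangulation of the sphere, and a simple alternating count of its simplices, the Euler characteristic, already recovers a topological invariant.

```python
from itertools import combinations

vertices = [0, 1, 2, 3]                  # 0-simplices
edges = list(combinations(vertices, 2))  # 1-simplices: 6 of them
faces = list(combinations(vertices, 3))  # 2-simplices: 4 of them (surface only, no solid interior)

# Euler characteristic: the alternating count V - E + F.
chi = len(vertices) - len(edges) + len(faces)
print(chi)  # 2, the Euler characteristic of a sphere
```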
1:12:51.9 SC: Yes, and somehow this is going to connect to derivations and operads, and then I can try to find a thread that gets us there, but I think that would be dumb when I could just ask you to do it for us. [laughter]
1:13:06.2 TB: Yeah, so let me give the lite version.
1:13:12.9 SC: Please.
1:13:13.1 TB: L-I-T-E, and then depending on…
1:13:15.1 SC: We can decide how deep, how heavy we want to go. Yeah.
1:13:15.8 TB: Which directions we want to go, we can get deeper. But yeah, there are a lot of words: simplices, derivations... I don't know, what's happening? Operads... Okay, so here's the easiest way that I can say this. Let's think back to Shannon entropy, the amount of surprise or information contained in a probability distribution. It turns out, and it's not hard at all to show, that entropy fits into a formula that looks eerily similar to something you might see in different places across the mathematical landscape. And that formula hints that entropy behaves like a derivation. What's a derivation? Think back to calculus and the Leibniz rule.
1:14:08.4 SC: The product rule, yeah.
1:14:08.9 TB: I think we all learned this a long time ago. If I have two functions, F and G, what's the derivative of F times G? And then we all learn the product rule: it's the derivative of F times G, plus F times the derivative of G. Okay, the Leibniz rule. Well, it turns out that entropy satisfies an equation that looks very much like this, but kind of not really, and you see this formula and you think to yourself, oh, something's going on.
[laughter]
1:14:38.1 TB: So long story short, my paper takes that intuitive "something's going on here" and makes it very precise, saying there is a very precise way in which entropy satisfies the Leibniz rule; and moreover, any time you have a function in a certain context (I have to be very precise) that satisfies the Leibniz rule, it's basically entropy, up to a constant multiple or something.
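To see the two formulas side by side (the entropy equation below is the standard chain rule from information theory, presumably the kind of identity being alluded to; the paper's precise operadic formulation is more general): the Leibniz rule says

$$d(fg) = (df)\,g + f\,(dg),$$

while Shannon entropy satisfies the following: if $p = (p_1, \dots, p_n)$ is a distribution and each outcome $i$ is refined by its own distribution $q^i$, then the composite distribution has entropy

$$H\big(p \circ (q^1, \dots, q^n)\big) = H(p) + \sum_{i=1}^{n} p_i\, H(q^i).$$

Squint at the right-hand side and you can see "derivative of the first times the second, plus the first times the derivative of the second" trying to happen.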
1:15:05.1 SC: Oh, okay. Good. Yeah, this is the kind of thing that gets mathematicians very excited where they show that something is something else.
1:15:10.7 TB: Yeah, exactly, exactly.
1:15:12.8 SC: And that was a very nice lite derivation, or explanation. One of the elements that comes in here that I think is graspable is the idea that the boundary of a boundary is zero. This is a very famous, deep topological fact: if you take a disc, its boundary is a circle, but the circle doesn't have any boundary, it's boundary-less. And the way that a fancy differential topologist would express that is as D squared equals zero, where this D is the boundary operator. So am I reading too much into the letter D, [chuckle] or is this relating D to a derivative, and therefore now to entropy?
1:16:00.1 TB: So at the moment, we both might be reading too much into it. In the paper, unfortunately, I don't get to say that if you apply the D you see there twice, you get zero. I think, though, that that would be very exciting. And I think a paper like this suggests it should be investigated more, for this very reason: there is a precise sense in which derivatives in the sense of calculus, derivations, and these boundary operators, things that satisfy D squared equals zero, go hand in hand. If folks want to look up something, they can look up de Rham cohomology, since we were talking about cohomology earlier, and see that these ideas mix together very intimately and beautifully.
1:16:53.1 TB: Now, I didn't get that far; the paper doesn't get to that point yet, but I think it suggests that there's something deeper to look for in that direction. There have been other results in the past few years by other mathematicians, involving other characterizations of entropy, that also swirl around this D squared equals zero idea. I think that's very peculiar. I mean, I'm talking about the last four or five years or so, so the fact that entropy keeps coming up in these topological and algebraic areas independently might suggest that there is indeed something going on there.
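For anyone who wants to see "the boundary of a boundary is zero" computed in the smallest interesting case: take a filled triangle with vertices labeled 0, 1, 2. Its boundary is the alternating sum of its edges, and applying the boundary operator a second time gives

$$\partial[012] = [12] - [02] + [01], \qquad \partial\partial[012] = ([2] - [1]) - ([2] - [0]) + ([1] - [0]) = 0.$$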
1:17:38.4 SC: And is it possible that there's some sort of duality relationship? Because in the world of homology and cohomology, and our audience knows what neither of these words means, the idea is that these are two different ways of topologically characterizing spaces. One of them sort of looks at the structure of spaces, and the other looks at the functions you can put on spaces, and it turns out that these very different-sounding things are in fact mirror images of each other in some particular way. And I'm wondering if maybe that is what is going on with entropy versus... I'm not sure what entropy would be dual to. But somehow these simplicial structures, I don't know.
1:18:21.6 TB: Yeah, so I would say that this is very active research and something I am thinking about daily these days.
[laughter]
1:18:31.0 TB: So maybe come back in a year, and I’ll have really great answers for that, but that’s exactly where I’m headed next.
1:18:37.3 SC: And it's only possible because of the psychological difference between physicists and mathematicians. Because the physicist will say, well, look, what do you mean, entropy? It's a number: you give me a distribution, I give you a number; there's really nothing more to say. But with the boundary operation, the boundary of an N-dimensional thing is an N-minus-one-dimensional thing, and the physicist just stops there; the mathematician says, let's consider the combination of all N-dimensional things and N-minus-one-dimensional things and N-minus-two-dimensional things as one giant thing, and the physicist says, you've lost me, you've gone crazy. But when you make that leap, you're now able to invent operators and relationships and things like that, and maybe discover things about entropy 150 years later that the physicists never found out.
1:19:32.5 TB: Yes, yes, that's exactly right. And this reminds me a little bit of the flavor of the paper that I wrote. I mentioned that there is a formula, an equation, that entropy satisfies, which got me into the project: oh, that looks kind of like a derivation, can we make that precise? What's interesting is that the paper develops a more general framework of which that equation is a special case.
1:20:00.3 SC: Good.
1:20:01.4 TB: So it kind of says, yeah, that equation is just a shadow of something much richer going on, something hovering overhead, for which that one equation that we all know, that we see in our favorite information theory textbook, is just an example. It's pulling from tools that maybe you didn't know were there; just look up, oh, they're floating on the ceiling or something. We just didn't know. So I think, yeah, it might feel complicated to say, wait a second, we started with probability distributions on N things, and now you're saying, consider N minus 1 and N minus 2, and then maps between them, and D squared equals zero. What?
1:20:41.6 TB: But actually, when you do all this, it might seem like heavy lifting at first, but then you're quite rewarded, because it gives you a different vantage point from which a lot of things fall out nicely at the end.
1:20:56.9 SC: And I can't think of a better place to end our conversation than exactly that, hoping that everything falls out nicely at the end. So, Tai-Danae Bradley, thanks so much for being on the Mindscape Podcast.
1:21:06.2 TB: Thank you, Sean. It was great to be here.
[music][/accordion-item][/accordion]
Intriguing interview, although I'm pretty sure many of the concepts discussed will seem somewhat strange and hard to follow, especially for those of us who are neither mathematicians nor physicists. I found that the short video posted below, “What is information theory?”, simplifies things somewhat.
https://www.khanacademy.org/computing/computer-science/informationtheory/info-theory/v/intro-information-theory
“Entropy may be a differential operator with a cohomology chain, and likely with a dual homology chain, but we have no idea what that is yet” is not something I expected to hear today. If that's not worth a review, I don't know what would be; heading to Apple Podcasts now to drop some stars (and followed by arXiv).
In physics and mechanics, entropy is a thermodynamic quantity representing the unavailability of a system's thermal energy for conversion into mechanical work, often interpreted as the degree of disorder or randomness in the system. In information theory, entropy is a logarithmic measure of the rate of transfer of information in a particular message or language. The interview with Tai-Danae Bradley mostly focused on the information-theoretic aspects of entropy. The video posted below, “A better description of entropy”, focuses more on the physical aspects: what entropy has to do with available sources of energy, and the eventual ‘heat death of the Universe’ itself.
https://www.youtube.com/watch?v=w2iTCm0xpDc
I have been thinking, too, about words as tools to cut up the universe.
And so when listening to people, I hear where they see from, like a longitude and latitude.
Thank you
Thank you for giving Tai-Danae Bradley a platform. Only by highlighting Bradley’s work will we inspire others like her to explore their prowess.