181 | Peter Dodds on Quantifying the Shape of Stories

A good story takes you on an emotional journey, with ups and downs along the way. Thanks to science, we can quantify that. Peter Dodds works on understanding the structure of stories and other strings of words (including Twitter) by analyzing the valence of individual words, then studying how they are strung together in different kinds of stories. Understanding these structures offers powerful insight into how people communicate and how to reach them. As Peter says, "Never bring statistics to a story fight."

Support Mindscape on Patreon.

Peter Dodds received his Ph.D. in mathematics from the Massachusetts Institute of Technology. He is currently a professor of computer science at the University of Vermont and Director of the Vermont Complex Systems Center. He has won multiple teaching awards, and was elected a Fellow of the Network Science Society.

0:00:00.0 Sean Carroll: Hello everyone. Welcome to the Mindscape Podcast. I'm your host Sean Carroll. For today, we're gonna go into big data and storytelling. So I remember once hearing a conversation, or maybe reading about it online, where someone was complaining, the usual complaint, about Hollywood movies. How they're sort of low brow and predictable but frustratingly popular compared to more intellectual fare. And someone else gave the counter-argument, saying, "That's because Hollywood does what Aristotle told storytellers to do 2,500 years ago, to have a structure, a three-act structure, denouement, conflict, the whole bit." And there's something to that. On the one hand, there's plenty of different kinds of stories. They don't all follow the Hollywood three-act structure. On the other hand, it can... Meanwhile, I should say, it can be a little bit predictable. But it nevertheless works. There's something about that kind of story that hooks us, that carries us along. And there are other stories that carry us along in different ways, and other people are fans of those.

0:01:02.8 SC: So given this feeling that stories, which are these quintessentially human artifacts. Given this idea that they have structure somewhere, wouldn't it be great to do science to that idea. To try to tease out using data, using some kind of collection of information, whether or not real world stories, the stories that we like to listen to and the stories that we tell both spontaneously and from great planning, have different kinds of structure of this form. So that's what today's guest does. Peter Dodds is a statistician at the University of Vermont who studies big data kinds of problems in many different contexts, from Earth Sciences to language to ecology. But he's one of the heads of what is called the Computational Story Lab. And what they do is they consider individual words and they rank them. They rank these different words, or they have people rank them, in all sorts of different ways, different valences, for are the words happy or sad, are they strong or weak, etcetera. And then they ask how important are these different rankings. How much correlation is there between different kinds of axes upon which different words have these values? And they try to seek out using math, what are the most important aspects that words can have playing a role in a story?

0:02:23.2 SC: It turns out there's a two-dimensional framework, which is very nice. Words go from a spectrum of weak to powerful and also from safe to dangerous. Those are the two aspects. Those are two axes that matter the most for the impact that words have in stories. And then you can plot real stories, whether it's novels or screenplays. For that matter, you can plot things like the emotional state of the world by looking at Twitter or looking at other social media. I'm not gonna give away all of the answers here, but Peter does a good job of explaining how stories do have structure. It's not just our imagination. We're not just imprinting structure on it from inside ourselves. There's a real sense in which successful stories have a certain kind of flow. And it's fascinating to look at why people respond to the stories in different ways, which you can look at on Twitter. What kinds of events are causing people to be happy or sad, to take refuge in words that are powerful or dangerous or weak or safe or whatever. So this is very early days, I think, for this kind of work. It's very difficult. Language, humanity, meaning, it's all there. But we're beginning to have these big datasets that let us ask these questions in really new ways. So it's going to be exciting to see what comes out of this kind of work. So let's go.

[music]

0:04:00.8 SC: Peter Dodds, welcome to the Mindscape Podcast.

0:04:01.5 Peter Dodds: Thanks. It's great to be here.

0:04:03.4 SC: I think a good starting point for this... 'cause as we just said seconds ago, before we started recording, there's a lot to cover. But I love your invocation of the famous Kurt Vonnegut lecture about the shapes of stories. And in some sense, you're taking that idea, the shapes of stories, and quantifying it, like being a good scientist, using the big data techniques to nail down some numbers. Is that more or less an accurate... It's a partial thing that you do, of course, but is that an accurate way of saying one of the things you're aiming to do?

0:04:35.6 PD: Yeah. I have this kind of layout for basic science. And there are sort of two pieces, fundamental pieces, which are describe and explain, just to sort of make it really simple. And I do that because I think it helps students understand which part they're acting on. But coming into that is taste and what do you choose to work on, what's meaningful. And that's hard, right?

0:05:00.2 SC: Oh yeah.

0:05:00.5 PD: But it really matters tremendously. And sometimes you're sort of a bit nervous for maybe years about whether things will matter. It's the game. But stories, to me, have become more and more sort of foremost in my mind of just this incredibly important aspect of being a human and how cultures work and so on. And I know many different fields, and just people in general, easily have come to that. Religions, politics, it's all there. I came to it from social contagion, trying to think about how things spread. And that was all very sort of simplistic models. Do you sort of wear UGG Boots or not wear UGG Boots? Or, do you wear a funny hat or not a funny hat? Or, perhaps take on a political belief or not. But it was all sort of physics-ish sort of models, simple things. And they tell important stories about systems. And sort of out of that, eventually over many years, started to sort of think more and more about the deeper things that people might run around with, which are stories. And they can range from very simple proverb type stories. The US has rags to riches. The American dream is a really fundamental kind of story. So how do you then start to measure those things? I do come from a physics background.

0:06:19.2 SC: Good.

[laughter]

0:06:22.4 PD: Well, [chuckle] good and bad.

0:06:24.8 SC: Here, it's good, yeah.

[chuckle]

0:06:26.5 PD: So, standard kind of stuff. If you look back through physics, through thousands of years, we've had some pretty crazy ideas about how things worked. And that's how science has to progress. But measurement just drove everything eventually, if you think about... One of my examples that I often put out is temperature. Measuring temperature, which we take for granted now. But that took, well, in the last 500 years, hundreds of years, to get to a point where people are like, "Oh, actually, you can do that." We were pretty happy with measuring distance; measuring time, really hard.

0:07:02.0 SC: Yeah, very hard. Yeah.

0:07:02.5 PD: Really hard to measure time well. Yeah, amazing. It was sundials for a long, long time. And time is a big piece in some of the work we've done recently too, how you experience it. That's a bit long, but I guess with the big data kind of revolution, and we call it big data because it's about people, we've had big data in many fields, there's this blue-collar kind of honest hard work that we have to go back to, which is just, let's really, really look at this stuff and measure it and quantify it. And maybe we had a pretty good time into the '80s and '90s of making simple models and telling all these beautiful stories about the world, but they were gloriously free of data. Which, if you have a beautiful idea that way, don't go and look at reality. Because you might be sad.

0:07:56.3 SC: It's in the way.

0:07:57.8 PD: Yeah, and of course string theory, we've got some beautiful examples in physics still. Although that's beautifully done because you can't really [chuckle] sort it out. So I feel it's almost just being responsible; we're just trying to measure things well. We've got these hard problems, let's see what we can do, and things have changed tremendously the last 10 to 20 years. Alright, so Vonnegut. I think I came across this YouTube video of Vonnegut talking about it, it's probably how I came across it first, and I showed it to my students. I'm like "We should be able to do this." This fits in with work that we've been doing for many years before, which was measuring emotional states of populations and...

0:08:41.0 SC: Some people in the audience might not actually be familiar with the video, so maybe you can remind us what Vonnegut is actually saying there.

0:08:46.6 PD: Yeah, so it's sort of a five-minute version, 'cause I think he told the story in many places. It's really quite charming. And so he sort of lines up a graph, and it's sort of ill fortune, good fortune on the vertical axis, good fortune at the top, and then time. So time is going to the right there. And then marks out a simple graph, and it just sort of starts high and then goes down and comes back up again like a little wave. And then says, this is what he called a "man in a hole" story. So this is many sitcoms, many stories kind of just work like this. They start off, things go wrong, they get back to where they were. And his little sort of line there was, people love that story. They love it. There's nothing about plot in here, and I wanna be really clear about it. This is just the overall emotional arc. It gets a bit conflated with plot, and that's a much deeper, harder thing that we're trying to work on as well.

0:09:48.7 PD: So emotional arc. So you think, "Alright, well, maybe we can do this." And the work that we had sitting around, that we built for a long time, was this idea of what we called a hedonometer. So measuring happiness, but equally sadness, I should point out. And that came out of older work from maybe 60, 70 years ago now, I think, of trying to measure the fundamental dimensions of meaning. And this to me is really, really... Actually, this is the most exciting thing I've ever worked on, the more recent stuff about that, and we'll get to it. Yeah, just thrillingly incredible. But the idea is, okay, if I can kind of expound on this.

0:10:35.1 SC: Please.

0:10:36.1 PD: Let's give people... So trees, cars, your life, we have all these aspects of meaning associated with them, how you feel about something, and feeling and meaning are allied in interesting ways. How do you sort of boil that down? Maybe you could look at a dictionary or a thesaurus, and you've got this rich space of meaning, and the more recent work that we have in deep learning and so on is, here are 300 dimensions of meaning. And it's like, "Whoa, what could go wrong with it?" [laughter] So what's at the absolute other end of that? Which is, what's the absolutely most essential aspect of meaning? And what was sort of dug out over decades, and through, of course, initially small-scale studies with people, obviously students in psychology [0:11:26.8] ____. But here was the idea: okay, we'll give you a bunch of objects or concepts or whatever, and you have to just assess them on semantic differentials, and we'll give you a bunch of these. And so they are things like hard to soft, good to bad, big to small, all these kinds of very natural things that we're fairly comfortable with being antonyms. They represent opposite ends of some spectrum.

0:11:55.3 PD: And so this was done, as I said, in the '40s and '50s. And the first big work was actually for pings from submarines, which is quite charming. It's really interesting work, and it's sonar handlers, people listening to the sounds: how did they feel about them, did they connote danger, energy? What did they mean to them? So that kind of spread out from there into thinking about the meaning of anything. And what was sort of boiled down over many years was this idea of valence being dominant. And it's a nicely inscrutable word, but it is general, and I think that's not un-useful. It means good to bad, basically. So happy to sad, so collapsing a lot of things. And so you can imagine from an evolutionary point of view, like sort of a survival point of view, you're an organism, you have a sense of what's good and helpful and positive and negative, and you're attracted to one end and you're repelled from the other. So it had this very sort of fundamental aspect to it. There are a couple of other dimensions that came out, and the tricky thing is you've started with hard and soft.

0:13:05.8 PD: Light and heavy. You've started with all these very sensible ones and you have to figure out then, because what's really going on, we're solving an SVD type problem, a linear algebra problem. What's the common...

0:13:15.3 SC: You have to explain what SVD means.

0:13:18.5 PD: Yeah, so a Singular Value Decomposition. What you're trying to figure out is, if you've got all of these axes, these semantic differentials, if we sort of take the right point of view, it may be that there's some way of adding them up and subtracting some from the others to get a really fundamental kind of dimension. You might see some shape in front of you. So words have points in this space, you can imagine words or things, but let's talk about words. So I'm gonna present you with a word, football or chicken, and you have to rate it on all of these different semantic differentials. So then these words have a point in this space of semantic differentials, and then the idea is we'll rotate that space around and play with it a little bit, and maybe we see, oh, it's kind of really dominant in these ways, and say that this valence dimension, it's a sum of all of these things in some complicated way. But maybe the good, bad semantic differential probably lines up with it and so it may have nothing to do with...
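The rotation Peter describes can be sketched as an SVD of a words-by-differentials ratings matrix: center the columns, decompose, and read off how much variation each combined dimension captures. The words, scales, and numbers below are invented purely to show the mechanics, not real study data.

```python
import numpy as np

# Hypothetical toy ratings: rows are words, columns are semantic
# differentials (say good-bad, strong-weak, active-passive, hard-soft).
# Real studies use thousands of words and raters; these are made up.
ratings = np.array([
    [ 0.9,  0.6,  0.4,  0.1],   # "sunshine"
    [-0.8, -0.2, -0.5,  0.3],   # "grief"
    [ 0.7,  0.9,  0.8, -0.6],   # "victory"
    [-0.6, -0.7, -0.3,  0.5],   # "defeat"
])

# Center each column, then take the singular value decomposition.
X = ratings - ratings.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Squared singular values give the fraction of variation captured by
# each combined dimension; the rows of Vt say how the original scales
# mix to form it (e.g. a "valence-like" blend dominated by good-bad).
explained = S**2 / np.sum(S**2)
print(explained)   # fraction of variance per dimension, descending
print(Vt[0])       # weights defining the dominant dimension
```

If one or two entries of `explained` dominate, that is the transcript's point: many redundant semantic differentials collapse onto a few fundamental axes.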

0:14:23.8 SC: You're taking all these words and you have many different possible axes along which your students, I guess, or subjects are rating them. But some just correlate exactly with others, and so that's kind of redundant information, and you're looking for what are the axes, if you like, that matter the most. Is that a fair way of saying it?

0:14:44.3 PD: That's right. It isn't any one of those semantic differentials that you started with, necessarily, in some way. You have to go through that and figure out, okay, these ones are kinda... It's not like mass and length and those sorts of things; now we're dealing with categorical things. You have to sit down as a human and really kind of think through this. So that's right, and the work that we did initially to get this hedonometer stuff to work was essentially done many years ago, trying to figure out: okay, we've got all this data coming through, blogs, it was a little bit before Twitter and Facebook really took off, but we looked at some other things like State of the Union speeches, there's hundreds of years there, music lyrics, for which we had 60, 70 years. So we're trying to get hold of different kinds of text. So text is data that represents some aspect of human behaviour. None of these things are complete, of course, we wouldn't wanna say that. But we thought, well, we've got this stream of, say, words coming through in real time, can we figure out, is this population that's expressing it happy or sad, are they fearful or less fearful? And we were partly inspired by some of the things that were coming out of economists around that time. Greenspan, in 2007 and '08, said he would throw out all of these mathematical models if he could figure out why people were becoming more [chuckle] euphoric or fearful.

0:16:16.0 SC: Yeah.

0:16:17.3 PD: People could probably find that interview, it's on The Daily Show with Jon Stewart, from a long time ago, and it's quite remarkable. It's before the housing crash too. So I would carry that around as a good example; that seems a really basic thing to know. And of course, we wanna put it up against something like GDP. Sure, the stock market's up, but are people happier or sadder? And it goes back to measurement. If you want to improve things, I think we're in this kind of really difficult time where we can measure some big complicated things quite well, especially money, or at least we think we can, but we're leaving out these other mushier, harder pieces to measure. And as a result, of course, you try to maximise or optimise something, and you're not measuring everything. I think people understand that, but you sort of also forget it. You look at the things that have charts. Look, here's the stock market, you can look at it.

0:17:09.6 SC: Yeah, you look under the lamp post.

0:17:12.3 PD: Yeah, so that was part of our challenge. I felt it was a fundamental thing about people that we're trying to measure as populations. We're not trying to track individuals; it's nothing where we would say, "Oh, you said this sentence, you're happy or sad." It has to be from many, many, many, many words. So it's more like a physics where you're averaging over lots of pieces. So it kind of has an inbuilt privacy thing, if you like. We eventually created something online, which is at hedonometer.org, and it takes Twitter data. And that gives us sort of... the main one, I think, is Twitter. And you can see over many years now, 13, 14 years, this sort of long arc of what's... Twitter's a complicated thing, people have changed, who's actually active on it? I think we have 10 languages, Russian, Korean, and so on, it's a whole thing. But it's exactly this kind of index, if you like. What's the Dow Jones index of happiness? And it has some big patterns; it's been going down actually for five or six years, but more recently it's been kind of going up this...

0:18:16.8 SC: Sorry, the happiness has been going down?

0:18:19.3 PD: Yeah, since about 2015, yeah.

0:18:21.2 SC: Huh, weird.

0:18:22.3 PD: Yeah, going down, but the last year it's been sort of slowly going up. 2020 was the first time we saw anything that I would call collective trauma. And of course, there's your own personal view of things, and that's what we're trying to take out, if you like; we're trying to get the sense of a population. And yeah, your listeners will have all of their own specific kind of feelings of how things have... Maybe 2014 was the worst year personally, but we're trying to get out the whole picture. And by collective trauma, what I mean is the advent of the world kind of understanding there was a pandemic. We sort of knew in January 2020 that there were dangerous things afoot, but it wasn't really till, I think it's March 12th, when the NBA suspended its season and Tom Hanks said he had COVID, all these things happened in about 10 minutes. And President Trump, at the time, gave a speech sort of saying for the first time that things weren't great. And the stock market, of course, the stock market started to tank straight away.

0:19:32.1 PD: So that was a big drop, and it also did... What we'd seen in the past is there were these big drops for deaths of celebrities, terrorist attacks, school shootings, these things that occupy us, but then they're really quickly being wiped out by stories. People still talked about those things, but there's just this flood of stories all the time of everything that's happening in the world. So there would be drops, but they kind of come straight back up in maybe a couple of days. But it took on the order of months, really, for Twitter to sort of rebound back up to its kind of normal level at the time, which was pretty low. And then George Floyd's murder was a huge drop, but it kept dropping as the protests built over the next few days, because of people understanding what had happened and being out expressing their feelings online, or what we've measured as feelings. That's the lowest drop we've ever seen. And again, it took a long time to come out of. January 6th was another big drop actually; that's probably the third lowest over the whole time.

0:20:41.2 PD: Many things have kind of come out of all that, we can measure happiness of texts in lots of ways, and then we finally got back to Vonnegut. What we did was we went to books and we said, "Alright, let's see what it could... " This is this idea of Vonnegut's. He says, "This is so simple, even computers could do it." This is maybe 1990, '95 when he was saying these things. And we thought, "Alright, we can probably do it." [chuckle] And in fact, it turns out that, I think in maybe, when was it? '60s or '70s. I think it was the University of Chicago, he wanted to do this as a Master's thesis. He had presented it and they said no, and he was still mad about that for decades and decades and decades. You can find him talking about how upsetting that was to him. So it's sort of an homage to him in some ways. But we got a bunch of books, maybe I think 20,000. You have to sort out what's fiction, and it's a bit of a mess. But basically created this same hedonometer idea, but in this case you're now sliding through the book. So you're gonna say, "Okay, the first thousand words have this score, and then we'll slide this little window."

0:21:52.9 SC: Okay.

0:21:53.3 PD: And we're not reading it like a person; it's sometimes called a bag-of-words method. You're just gonna, whoop, put them all together and slide, and get a score for them. We don't have scores for all words, and some words, we say the score is unimportant. Like the word "the" is a neutral word. We ask people what they think of that word. It was done with psychology students, of course, early on; eventually it's online, you do it with Mechanical Turk, which is an Amazon service where you ask people what they think about things, or other tools like that. But so the scale of these studies is now really quite large. Anyway, so we have scores for words.
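The sliding-window, bag-of-words scoring described here can be sketched in a few lines. The tiny happiness lexicon and text below are invented for illustration; the real hedonometer uses roughly 10,000 human-rated words and windows of thousands of words.

```python
# A minimal sketch of the sliding-window "hedonometer" idea: average
# per-word happiness ratings over each window, skipping unscored words.
# The lexicon values here are made up (scale loosely 1 = sad, 9 = happy).
happiness = {"love": 8.4, "happy": 8.2, "war": 1.8,
             "death": 1.5, "hole": 3.8, "home": 7.5}

def emotional_arc(words, window=3):
    arc = []
    for i in range(len(words) - window + 1):
        scores = [happiness[w] for w in words[i:i + window] if w in happiness]
        # Windows with no scored words get None rather than a fake score.
        arc.append(sum(scores) / len(scores) if scores else None)
    return arc

text = "home love happy war death hole home happy".split()
print(emotional_arc(text))
```

On this toy text the arc dips and then recovers, the "man in a hole" shape discussed earlier; shuffling the words or widening the window changes how much of that structure survives.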

0:22:41.7 SC: Yeah, so you sort of separately score the individual words and now you're taking novels or what have you, works of fiction, and scoring, as you say, sections of those as you go through the text, and so you can see the happiness or the sadness go up and down as you read through the text?

0:23:01.0 PD: Right. And you play around with the window size and you think about this. We did it for movie scripts as well. Scripts are useful, they have descriptions of what's going on, so they're actually somewhat rich. You can't get the final one, which I realized as we were doing this, because I was looking at Alien and I was looking through the script, and Ripley is a man in one of them, it might be the fourth or last script version of that. Anyway, so you've got some version of it and you do what you can. So if you look at something like Predator, it starts okay, and then it just goes to... it's terrible. It's just negative and drops. There's no sort of ups and downs, which we're more familiar with in stories like Harry Potter, is it the Deathly Hallows, the last book? Really huge ups and downs as it goes through. We're trying to figure out, are there sort of characteristic scales of fiction.

0:24:04.5 PD: But what came out of that, and we detect them in various ways, is there are sort of six fundamental shapes, if you like. And there was a rags to riches one. So very simple, basically sort of goes up throughout the book, may have some ups and downs, but that's sort of a... It was just kind of like decomposing a sound into a Fourier series of waves. It's a bit like that. And I wanna add, there's something that's much more complicated though. But this is, of course, we're looking at emotional arc, so we do have signals. There's the tragedy, where things just keep going down. So Metamorphosis maybe, Kafka. It starts off badly. You're a cockroach, you start with that, and then it keeps going down. And then there's the man in the hole type one of Vonnegut's. There's the inverse of that, which we called Icarus. So it starts, things go really well, and then they go really bad. And then we had two others, which were Cinderella and Oedipus. So Cinderella starts low, goes high, you've come to the ball with this fairy godmother set-up, and then things go badly again. And then there's just a huge rise. And that's one of Vonnegut's favorite love stories that he talks about, this Cinderella pattern. So it's a simple going down, up, down, up. And the flip of that, we called Oedipus. Starts well, things go bad, then you kill your father and marry your mother. It ends poorly.

[laughter]

0:25:28.8 SC: Because sadly we don't have the visuals here for the audience, but this is...

0:25:33.8 PD: Yeah, I'm trying to... Yeah.

0:25:35.0 SC: I saw your plots though, the visuals are great, and plots in the sense of graphs, not plots in the sense of story structures. What fraction of stories fit into these? 'cause it's a very simple kind of ex post facto natural thing. There's sort of the stories that have no maxima or minima in their evolution, it's either rags to riches or tragedy, and then there's stories with one maximum or minimum, and there's stories with two maxima or minima, that basic arc. Are those six possibilities... What fraction of stories is covered by that?

0:26:14.5 PD: It's, again, one of these things where it's like 90, 95%.

0:26:17.8 SC: It's amazing, yeah.

0:26:19.0 PD: But of this particular pool of books and this set of works. So, I think the future of this, of course, is to curate things really well, like here are detective stories, here are stories from this particular culture, and so on. And we found this with the Hedonometer work in general. If you estimate the happiness of a set of words, you might say, "Okay, maybe I can get an error measure for that." This is a very typical thing to do with measurement. But it turns out it was completely in the lens. It's completely in the list of words for which you have scores. So if you change that list of words by scoring more or taking some out, that's where the error is. It's all in the instrument. So in this case, yeah, it's one of these things where we seem to have a big dataset. We have 20,000 books. That's a hard thing to read, right? So this gets beyond it. It's important. No one's gonna read 50 million tweets a day. And so, what we're trying to do is what I've sort of called telegnomics, which is remote sensing of knowledge: 'tele' for far, 'gnomics' for knowledge. Because, yeah, there's no way an individual can do that. And we wanna get some sense of the whole thing sort of streaming. If all these tweets streamed past you in three seconds, how would you feel? Pretty bad, probably, just in general. [laughter] But taking that part out, is it better or worse than yesterday?
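The six-shape taxonomy Sean outlines, arcs with zero, one, or two interior turning points, ending up or down, can be sketched as a toy classifier. This turn-counting rule is a hypothetical simplification for illustration, not the decomposition used in the actual study.

```python
def classify_arc(arc):
    """Label an emotional arc by its interior turning points (a sketch)."""
    turns = 0   # number of slope sign changes (interior extrema)
    last = 0    # sign of the most recent non-flat slope
    for a, b in zip(arc, arc[1:]):
        s = (b > a) - (b < a)          # +1 rising, -1 falling, 0 flat
        if s and last and s != last:
            turns += 1
        if s:
            last = s
    ends_up = last > 0 if last else arc[-1] > arc[0]
    # Labels follow the episode's six fundamental shapes.
    names = {(0, True): "rags to riches", (0, False): "tragedy",
             (1, True): "man in a hole", (1, False): "Icarus",
             (2, True): "Cinderella", (2, False): "Oedipus"}
    return names.get((turns, ends_up), "more complex")

print(classify_arc([1, 2, 3, 4]))     # man in a hole would be [4, 2, 1, 3, 5]
print(classify_arc([4, 2, 1, 3, 5]))
```

Real arcs are noisy, so in practice one would smooth the window scores first; otherwise tiny wobbles register as extra turning points.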

0:27:51.4 PD: So the man in the hole one, which is this favourite one of Vonnegut's. So I would say that the framing of that is not great, actually. Because he's sitting there, he has a drawing, we're struggling here with a podcast. But he has a drawing, so you can kind of see it in front of you and it all makes sense. But man in a hole doesn't tell you a sense of time. It doesn't give you an arrow. So Metamorphosis could be man in a deepening hole, as it turns out. But person in a hole, it doesn't tell you that they start okay, they get into the hole, and they get out. And I guess I think a lot about ads and slogans and so on. And it struck me before the 2016 election that Make America Great Again was the man in a hole arc. In four words, it indicates something about the past, the present, and the future, which is really powerful. And as I understand it, in 1980, I think, it was Reagan and Bush. It was used in ads then, like, "Let's make America great again." It was used in posters and so on. It wasn't quite the dominant slogan. But it's one of those ones that's really powerful. Bill Clinton used it. Lots of people have used it over the years in various ways because it is very powerful. And I think that as a rhetorical, as a story in four words, super powerful.

0:29:11.8 SC: And do you find that there are, you alluded to this a little bit, but relationships between these different kinds of story arcs, or valence arcs, whatever you wanna call them, and genre or literariness of the fiction? Are there certain kinds of... Do you get highbrow fiction using one kind of pattern and potboilers using another one?

0:29:32.2 PD: We are working on that more now. We have some work where we're looking at things like accounting textbooks and manuals for televisions, and just like what... 'cause you wanna know, are we getting something artificial? It certainly, if you randomly shuffle text, it doesn't produce these shapes as you might hope. So there's sort of a... We can at least get that sorted out. But again, that's a curation of data that I think we're still behind on. We're trying to build. Well, we do have this thing called Story Wrangler. It's at storywrangling.org. And it's for Twitter at the moment, but the idea is to kind of house all of these different bodies of work and have a time series for the usage of words within them. So that, hopefully, eventually will be something that could kind of go towards what you're saying.

0:30:28.8 PD: We do, of course, have Google Books, which has been around for about 10 years now. The problem with that is that it doesn't have enough metadata. You can't really sort out the sort of broadly fiction, broadly everything. And as it turns out, we did some work on it, and we figured out that actually the kind of collective English stuff is full of science. There's a lot of medical and science-type writing. And the 20th century is basically dominated by the sort of rise of science. And you can see it in little details, like figure with a capital F. It just goes up. And et al, and all the things to do with data, really. Actually, the exponential growth of science was sort of understood, I suppose, in the '60s by de Solla Price, who, presumably armed with a million graduate students, went through libraries figuring out how much material was in journals and how much stuff was being published. Anyway, that's imprinted in there in a way that we can't undo.

0:31:28.8 SC: Well, it makes sense. I think I noticed on your web page that the most commonly used word on Twitter is RT, the abbreviation for retweet.

0:31:37.3 PD: Yes.

[laughter]

0:31:39.3 SC: That doesn't really mean it's the most commonly used in English. But on that particular medium, that's what pops out.

0:31:44.8 PD: And you have to say, "What are you looking at?" Twitter is interesting because it does kind of encode so much. And the news is, for sure, there. Another way to look at all of this is to think about forests. So we have a forest, and you would like to know all the species in the forest, which is actually, of course, very hard to measure, and have counts for them. How many are there of all these different species? So this comes out of linguistics with the types and tokens distinction. What's your lexicon? That would be for language. Here's your list of words. And then here's your list of all the animals and organisms. And then you have, next to it, the counts. But then you wanna do that over time. So maybe for forests, it's at the scale of a year. There are studies that do this for small parts of forests. We're sort of looking at forests of words and stories and trying to see how they change over time. Of course, they can change dramatically.
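The types/tokens census described here is easy to sketch: "types" are the distinct words (the lexicon), "tokens" are every occurrence. The `census` helper name and sample text are hypothetical.

```python
from collections import Counter

def census(words):
    """Count distinct word types and total tokens in one time slice."""
    counts = Counter(words)
    return {"types": len(counts),            # size of the lexicon
            "tokens": sum(counts.values()),  # total word occurrences
            "counts": counts}                # per-word tallies

day1 = "the cat sat on the mat".split()
report = census(day1)
print(report["types"], report["tokens"])   # 5 distinct words, 6 in total
```

Running a census like this per day (or per year, for the forest analogy) gives the time series of word usage that tools like Story Wrangler expose at scale.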

0:32:43.0 SC: I don't wanna lose track of this other thing that we mentioned, and then we sort of buried it in the happiness versus sadness discussion. But there is this multi-dimensional way of thinking about the words. And you've done your factor analysis to try to figure out what dimensions matter the most. And why don't you tell us what those dimensions are that matter the most.

0:33:08.0 PD: Yeah, I actually just wrote that up. I'm really excited about it. It's still in review, so we'll see what happens, but we're pretty confident about all of this. Alright, so we had valence, and at the time we saw in the literature that this was the dominant axis. And certainly when you looked at the data back then. And I think we were looking at data sets that had 1,000 words with scores associated with them. So that's not a big set of words. People's vocabularies are tens of thousands. Something like Twitter, with all its misspellings, is hundreds of thousands.

[laughter]

0:33:46.6 PD: So you want something on that order. And over time, what has happened, of course, is there have been bigger studies done, and done in slightly different ways. And so we've gotten, just as you might hope in science, more accurate, richer work. Back then, 12, 13 years ago, the main idea of what was going on was there was valence, which was this happy-sad, good-bad kind of axis, and there were a couple of others, though. One was about dominance: do you feel in control or not in control when you consider something. And there was another one, which is activity. It's got various names, but it's basically kind of activation: is this exciting or boring? So there have been these other sort of secondary dimensions that people had floating around, and then sort of debates about which ones matter. So we've got this work from... We didn't do this study, but a couple of years ago there was work by Mohammad in Canada. Again, online, many, many people doing these evaluations, and now we've got 20,000. Now, it's 20,000 words.

0:34:54.4 PD: So it's a huge jump from, say, 1,000, and work that we did would go to 10,000. And they're mostly good kinds of words; they don't have people's names or events, which can be a bit of an issue with some of these large sets of words. Alright, so the idea was that people were going to evaluate these on valence, on what was called arousal, which is the activation one, and on dominance. So you're given these three dimensions. And it is tricky. How do you present that to people? So you kind of have to give them clouds of words at each end. So they know that the positive end of valence is you feel good, you feel happy, you feel maybe comforted. It's a bit spread out. But that's fine.

0:35:39.7 PD: Everyone's given the same instructions. But looking harder at this stuff, and then again doing this kind of factor analysis, you can see you've got this kind of cloud in the three-dimensional space. Maybe it doesn't come back along the dimensions you've actually tried to impose, that you've tried to say to people, "We think these are the fundamental dimensions." That's good, but that might not be what they were actually thinking. Maybe they correlated some of those. And that actually turns out to be the case. And if you sort of rotate this football and kind of squeeze some of the axes and pull some of them apart, you get another shape. Well, we played around with it for a while. But it has these two main axes. And the one going across, if you look horizontally, is power-weak. So over at the power end are words like success and triumph, if you sort of go back to the ratings. So it's kind of winning. And then the weak end of that is void and nothing. So it's not failure, it's just emptiness. So that's going across the page. And then pointing up, and this is our choice, is danger. So this is like a compass for basic meaning. Danger is up and safety is down. And we call this, we make up words, but Ousiametry.

[laughter]

0:37:01.1 PD: Ousia, we're taking it to mean essence; it's a Greek word. It is where the word essence comes from. So O-U-S-I-A. It's fun to make up words, but it was also like, it's not semantics, it's not semiotics. We're not measuring meaning itself; we're measuring something about how it's depicted, what you get if you distill everything down. We don't want people to think that we're measuring the full meaning. So we've tried this out with many different corpora, so things like the Sherlock Holmes novels and short stories, Jane Austen's works. So they're sort of famous authors, and then a huge collection of fiction from Google, which is a sort of complicated thing, but it's 120 years. So that's everything sort of smooshed together equally.

0:37:50.9 PD: Wikipedia, a snapshot of Wikipedia, which is, you would think, just a different object. Talk radio, so that's transcriptions of talk radio. So now we're going from spoken word that's being turned into text, which is a different thing. It's spontaneous, it's different, it includes everything from NPR to sort of shock-jock stuff. So it's a big grab bag. The New York Times as well, 20 years of the New York Times. So this is this type-token distinction again. That first work, where we found this danger-safe axis and power-weak axis, was looking at types. Every word got one vote until we kind of figured it out. And that's good, but it's just a substrate, and then you have to go and see what people actually do in these different venues and beyond. How do they use these words? How often do they use them?
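The factor-analysis-and-rotation step Peter describes earlier can be sketched generically with a principal-component decomposition of word ratings. The ratings below are invented for illustration, and this is a minimal stand-in, not the actual ousiometric analysis:

```python
import numpy as np

# Toy word ratings on (valence, arousal, dominance), each in [-1, 1].
# These numbers are made up for illustration, not from any real lexicon.
ratings = np.array([
    [ 0.9,  0.6,  0.8],   # "triumph": good, exciting, in control
    [-0.3, -0.7, -0.6],   # "void": empty, flat, weak
    [-0.6,  0.9, -0.2],   # "danger": bad, highly arousing
    [ 0.5, -0.8,  0.1],   # "sofa": pleasant, calm
])

# Factor analysis in miniature: center the cloud of points, then take its
# principal axes via SVD. Inspecting and rotating these axes is how one
# might discover dimensions like power-weak and danger-safe empirically,
# rather than keeping the dimensions imposed on the raters.
centered = ratings - ratings.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
principal_axes = Vt          # rows are candidate "essential" dimensions
scores = centered @ Vt.T     # each word's coordinates on those axes
```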

0:38:42.0 SC: Now, I wanna do this with all of my podcast episodes. I have transcripts for all of them. I wanna know which ones are powerful or weak or dangerous or safe.

[laughter]

0:38:53.4 PD: Those ones I just listed, we didn't do them all at once. You'd kind of corral the data and then do the same analysis again. I remember every time thinking, will this be different? What's going on? They're a little bit different, of course, but all of them have what I'll call a safety bias: the predominance of words that people use are in the lower half of this disc, if you like. It's words that trend towards being safe. At the bottom are things like comfort. If you go to safe-weak, you get words like sofa and tortoise. And then safe and powerful words like wisdom and happiness. That quadrant, what I'll call the safe-powerful quadrant, sort of lines up with positivity and happiness.

0:39:49.1 PD: And there's this much older work that we built on when we looked at large texts, which came up with this idea of the Pollyanna principle: that in general interactions between people, and so in communication of all kinds, there are more positive aspects than negative ones. It's a bit surprising to people, because it's easy to bring to mind arguments or negativity, maybe online, or the news is terrible, these sorts of things. But think about it: society exists. It could not exist, but it does exist, and it holds together through lots of little sort of positive interactions. So this was work from maybe six or seven years ago. I didn't expect it. It was surprising; it sort of popped out that there are more positive words than negative words. And it's true across... We looked at 10 major languages, 24 corpora.

0:40:45.3 PD: Russian, German, Korean. So we looked at a lot of different pieces there, and it really kept coming out. And yeah, there was this sort of story there: that language is our great social technology. We get excited about Snapchat or something, but really language is this unbelievable thing that we have. Money is another one, I suppose, perhaps, 'cause we've somehow encoded belief into this abstract thing. It's pretty...

0:41:15.0 SC: Do I remember correctly, though, that in fictional stories in particular, there's more danger than you might expect, more than you have in ordinary language? 'Cause obviously a story wants to be exciting somehow.

0:41:27.8 PD: It's a good question whether it's more... So all of them have, on average, a positivity bias. Now, there are parts where they dip below into the negative side of things. But if you look at music lyrics, one of the first things we looked at, they kind of told us that we were getting somewhere. The rankings, so the bottom is heavy metal.

[laughter]

0:41:50.6 SC: The bottom of what? Of the graph?

0:41:57.1 PD: And you did ask about genres for fiction, but this is actually something where we did have genres. And this is on the happiness one. And at the top is gospel and soul. So it kind of makes sense... The ordering looked pretty good for this very rudimentary instrument we'd made. But even heavy metal was still above neutral on average. It's still...

0:42:19.7 SC: Yeah, good to know.

0:42:20.4 PD: Even though... So if you look at maybe Harry Potter or something, when things go bad, it does dip into this negative territory. Which is pretty hard, because you've got to use a lot of negative words, given that on average the bulk of words are over on the positive side of things. Or at least there's a skew towards positivity. So the generalization of that now is that in fact it's a safety bias. It's not really positivity. It's that we're using more safe words.

0:42:49.6 SC: Yeah, okay.

0:42:49.7 PD: And dangerous words are incredibly important. They describe all of these things that can go wrong. We just don't use them as much. And when we use them, of course, they are incredibly meaningful. So happiness, basically, yes, is safety plus power.

0:43:08.2 SC: And what other things... The other thing that I thought was really fascinating was that for different stories that you looked at, you can place characters in the narrative along this dangerous-versus-safe and powerful-versus-weak axis. And I guess Harry Potter had all sorts of characters. There were dangerous ones, weak ones, etcetera. Whereas in Game of Thrones, almost everyone's powerful, and a lot of them are very dangerous. [chuckle] So it was very much a clash of extremes in that way.

0:43:39.1 PD: So that work again, it's completely thrilling to me. This is just incredibly exciting 'cause it comes from a completely different dataset. So this is sort of an online thing. Again, not something we did. But it went back to giving people characters from stories. And there are a lot of TV shows and movies, but there's also Pride and Prejudice; some books are in there. And it presents about 150 of these semantic differentials, might be 100, might be 200. So some go way back in time, in a way. And it gives people, for characters: country-city, that kind of thing, rich-poor. There are things that may be a little more clearly assigned to people. So we were able to start again with a really rich set of semantic differentials. And I think there are about 800 characters that we looked at over 90 different, I'll call them storyverses. There's Buffy the Vampire Slayer, X-Files, you said Game of Thrones. Arrested Development is in there. So it's really a big spread. So I think there's something for everyone. You might not know 80% of them, but there will be some that you could look at.

0:44:49.9 PD: So this is a completely different dataset. And doing this analysis again, turning things around and kind of rotating spaces, and not really doing anything funny. We weren't saying, we're desperate to find this power-danger thing; it really popped out for free. So this is something that's just very supportive of what we've done in this other space. And there is a third dimension. I should mention that one, because in general it's about what we call structure. So structured to unstructured. A rock, for example, would be more structured. Cardinal, bureaucracy, boss: these are considered more structured. But clown, and carnival, and tickle, and confetti: these are words that are considered unstructured. And so for characters, it's playfulness. It's much more about playfulness. So someone like Robin Hood has a high playfulness measure on them. Or Mulder from The X-Files is playful; Scully is not playful. Not a lot of playfulness in Game of Thrones. [laughter] Pretty much all of them are in the dangerous, powerful quadrant, which is the dominant one. This is like dangerous winning, basically. Things can go wrong for you. Except for, I'm going to get his name wrong. Samwell.

0:46:13.2 SC: Tarly.

0:46:13.6 PD: If you know the... Tarly, yes. He's down with the kind of angelic characters. So Jane Bennet is there from Pride and... These are people who are more towards the safe end. They're still somewhat powerful, but they're more on the safe side. So these are just really, really good people. That's who you find at the bottom. If you go around safe into the kind of weak quadrant, then you get people who tend to, they're not bad people, but they tend to get run over.

0:46:47.0 SC: And they get...

0:46:47.6 PD: And out on the weak side, you get Michael Scott from The Office. Homer Simpson is out there. And then I wanted just to say that if you go further up, that's where Joffrey is from Game of Thrones. There aren't many from Game of Thrones in this dangerous, weak quadrant.

0:47:03.2 SC: Yeah, okay.

0:47:03.9 PD: And that's the chaos agents. They're the chaos agents.

0:47:07.8 SC: And again, I guess this might be future research, but you have this time series of how the valence of the story itself evolves page by page. And now you're saying there's a different set of analysis with sort of the distribution of characters, or distribution of whatever events, and so forth. If you just gave me that, if you didn't tell me the plot, or the characters, or the setting, or whatever, how much could I learn? How much could I infer about the story just by thinking about both how it evolved over time and what kinds of characters are involved? Do we know that yet?

0:47:48.7 PD: We are really trying to do that. And I think it's remarkable. So I sort of think of character as the shortcut to story. What do we do with stories? A lot of them are about prediction. They're about telling us how the world works; proverbs do this, stories that we listen to. These are ways that your life can go, or maybe other people's lives can go. We are trying to make sense of the world. We tend to have stories wrapped around individuals, which I think is interesting, because we wanna be in them. It's hard for us to tell stories about systems. And that's why, yeah, when it comes to complex systems and all these sorts of formalisms that scientists work on, it's really hard, 'cause people wanna anthropomorphize everything. They absolutely do. And I understand that drive, but it's hard. It's hard for us to tell those stories. But I think one of the things... Stories are incredibly important, that's what I'm trying to say there. But we also can shortcut them by just saying, "Oh, here's what these characters are like. Here are archetypes."

0:48:53.7 PD: And we sort of know what will happen if you say, "Here are these three people and here are their... " We can kind of try to predict what will happen. So I think they're like little kind of wind-up toys. In our brains, we will try to simulate or run the dynamical system of these characters interacting. It's very natural. We wanna do it. We wanna predict, to a fault, obviously. What we're trying to do now with this, and this is tough, is you wanna get this sort of danger-power profile around a character and how it might evolve through a story as well. There's the temporal network of which characters are interacting with each other. We should be able to get that out, along with the environments. And there, you could imagine doing this for Star Wars, or Lord of the Rings, or Pride and Prejudice, any of these pieces. Can we kind of trace that through? And it might be pretty rough. We divide books into thirds or something, but then we could do 100,000 stories and get out the big patterns. Seeing this kind of two-dimensional space has been very helpful in a lot of ways. I think it's really telling us something. Another space we've looked at, perhaps just to start with, is Twitter.

0:50:16.8 SC: Yeah.

0:50:16.8 PD: Because we're doing that a lot, but looking at at least what was expressed on Twitter around January 6th, the attack on the Capitol. What you see there, just taking all the tweets and then scoring them, is these measures of energy, high energy, that I'd sort of mentioned before, and happiness kind of goes down. But really, what you see on this kind of compass of essential meaning is that it points straight to danger. It actually goes straight to danger, straight upwards. Danger is kind of high energy plus badness, if you translate into these other frameworks. So in that respect, you'd see it register as a sad thing on our hedonometer, but that's just a projection onto that axis. It's a shadow of the real direction, which was pure danger.
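The "shadow" idea here is just vector projection: a one-dimensional happiness instrument only sees the component of a two-dimensional shift that lies along the valence axis. A minimal sketch, where the axis direction and the numbers are made up for illustration:

```python
import numpy as np

# Suppose word usage shifts by this vector in a (power, danger) plane:
shift = np.array([0.1, 0.9])          # points almost straight at danger

# Assume the valence (happy-sad) direction runs toward powerful-and-safe.
valence_axis = np.array([1.0, -1.0])
valence_axis /= np.linalg.norm(valence_axis)

# A 1D happiness score is the projection (the "shadow") of the 2D shift.
happiness_reading = shift @ valence_axis
# The reading comes out negative, so it registers as sadness, even though
# the true direction of the shift is danger, not unhappiness.
```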

0:51:18.8 SC: That's very interesting, especially 'cause one of the questions I was gonna ask was: looking at happiness versus sadness on Twitter is obviously a very interesting thing. But when I actually looked at the data, everyone's happy on holidays. That's a clear winner, Christmas; or at least you put out your happy tweets on Christmas. And then everyone's sad when there's a terrorist attack or a shooting, okay. But other events, like a presidential election, are more of a mixed bag. And I'm wondering if there are... The simplest possible thing I can think of is just a measure of the variance. Is it something where a whole bunch of people are happy and a whole bunch of people are sad at an election result? Is that something that you've quantified?

0:52:00.1 PD: Yeah, we have it. We just haven't put it on the site. You're exactly right. So to what degree are people in unison about something? And for the extreme days, in some ways they have to be, right, just for those scores to be so high and so low. But you're kind of right. There is a predictability to the big spikes in positivity, and they're just annual holidays. People are using the expressions of that time, even Happy Valentine's Day. Now, if you look at the words being used and compare them to some other dates, you can see that there's really some negativity in there as well. It's just swamped by the positive. So Valentine's Day will have lonely in there, and Christmas might have that as well. So we do wanna be a little careful. It's not like everyone is right there doing the same thing. And you can see it for days of the week: Saturday is generally the most positive day, Tuesday is generally the most negative day. But Saturday has movies, weddings, lots of positive things that might happen on Saturday, but it also has bored and hangover. It's not all great for everyone on Saturday.

0:53:15.6 SC: And there's a daily rhythm too, right?

0:53:18.5 PD: Yeah, there's a strong daily rhythm, which [chuckle] I think is actually in Science somewhere. I have this line, which is: it's the daily unravelling of the human mind. I know sleep remains a mystery, but I think we need to be rebooted, because I think we just become emotionally unstable by the end of the day.

[laughter]

0:53:38.6 PD: I'm being funny. But you see swearing goes up through the day, cursing goes up through the day, things like that. And then, like you sort of said, the variance goes up as well; the emotional variance goes up through the day. So people start off fairly tight, things are okay, and not very emotionally varied. And then the wheels kind of come off collectively.

0:54:00.9 SC: If one is being a little bit skeptical here, is it possible that... I might think a lot of people are happy at 7:00 PM 'cause they're enjoying dinner or a movie or whatever, but those people are not on Twitter. [chuckle] How much of a bias do we have by the fact that Twitter is our data stream here?

0:54:19.6 PD: Yeah. No, it's a weird selection. But I will say that more generally, if you zoom out, it does match up with Gallup polls.

0:54:26.7 SC: Okay.

0:54:27.6 PD: Which is kind of wild. We have some other instruments. There's one we call the lexicocalorimeter, which takes in phrases from Twitter, assigns them as to whether they're kind of foodstuffs or about exercise, and then assigns calories to them. So you sort of get calories in, calories out, at the state level for the US. It's a very rough, silly thing.

0:54:57.1 SC: True.

0:54:57.7 PD: But it matches, it lines up with obesity rates.

[laughter]

0:55:03.5 SC: So you can tell which states have higher obesity rates from Twitter, is what you're saying?

0:55:07.7 PD: Yeah.

[laughter]

0:55:08.5 PD: And you can look at what they're talking about. So Colorado does come out number one, at least in the time period we looked at. But it was overly fond of talking about bacon, which sort of stood out.

[laughter]

0:55:21.7 SC: I would have thought Coloradans are pretty healthy outdoorsy people. I don't know.

0:55:26.1 PD: Yeah. So there's lots of skiing and running and biking.

0:55:27.9 SC: Right, but a lot of bacon.

0:55:28.2 PD: Yeah, those words are there.

0:55:29.1 SC: A lot of bacon and doughnuts, yeah.

0:55:32.0 PD: Well, that whole thing is quite amazing to look at, because the ground state, if you like, for what's being expressed in terms of food and exercise is pizza and watching television, because we have a lot of activities, and some of them include lying down as an activity.

0:55:46.6 SC: Okay. [chuckle]

0:55:48.0 PD: Watching television is one. So the states differ from that, but their baseline is pretty uniform in terms of what is being expressed. And of course that's advertising, that's all sorts of things, it's a bit of a melange of inputs.

0:56:03.1 SC: And this brings back something that you mentioned right at the beginning, which I thought was actually... Maybe I had not thought this way before, but I really should have: we very often talk about the traveling of ideas, the sharing and contagion of ideas or notions or opinions through social networks and other networks. An idea might be universal healthcare or the right to bear arms, but stories and narratives can also travel through these information networks. And is that either something you've done, or is it a target, to sort of tease out which stories, which narratives are being shared and how useful they are? 'Cause I'm certainly willing to believe that a good, compelling narrative wins every day over a set of facts, no matter how true they are.

0:56:56.1 PD: Yeah, I have this thing where I say something like, "Never bring statistics to a story fight."

[laughter]

0:57:01.1 PD: It's not gonna work out for you. You should bring the numbers, but you've gotta bring stories as well.

0:57:06.1 SC: Yeah.

0:57:06.5 PD: It's just how we operate. And of course, people in politics, people in religion understand this. They've been figuring out how to tell stories about things for a long time. So it's absolutely a long-term ambition to do that. It's very hard. We have this sort of framing of story angle: how do you get out the stories that people are expressing around an event as it happens, and then maybe long-term? So take the Parkland shooting, a terrible event, just to pull one out of the many. How do you track the stories that emanate from that? And by that time in history, I was pretty sure that there would be a lot of conspiracy-theory-type things. And sure enough, I remember going on YouTube the next day and just searching for Parkland, and 18 of the top 20 hits, which was what was presented, were conspiracy theory things saying that it was all faked, it's a false flag, and so on.

0:58:08.0 PD: So how do you measure that in real time? This is an enormous goal. It's very hard. So maybe there's a blossoming of stories after some event, because there's just confusion. Which ones are then fighting against each other? Which ones start to win? I have notions of stories kind of having hierarchies to them. You wanna be able to tell your story simply, and that's why slogans have this great effect. And they might not be tethered to some bigger story. Certainly, religions work in that way; you wanna be able to say things quickly. It's this hierarchy of narratives that you wanna be able to deliver. You know what? It's an incredibly difficult problem, but I think that framing is... Well, I won't say the right one, but I think it's a very powerful one to be thinking about: what are the stories people are telling, and how much are they reducing stories to sort of characterization?

0:59:08.8 PD: So, for example, Pizzagate. That story is pretty out there: that there's a basement in this Comet Ping Pong place, and there are terrible things happening to children, and there's a cabal, and all this sort of stuff. It's a little hard to grasp, but I think the access point in there is really through character. And so Hillary Clinton, for example, being characterised as this evil person, to use folklore kinds of terms, as a witch. Then you say this story about her and you're like, sure, because she's the devil. Or if someone else is framed as a godlike character, they can do no wrong, and if you say some story about them that suggests they've done a bad thing, it's deflected, it's washed away. What are the defense mechanisms built into stories? It's a really big part. How do stories become hermetically sealed, or storyverses, if you like?

1:00:02.8 SC: Yeah, I know that politicians are very focused on the idea that they wanna paint their opponents and themselves in certain ways. There are certain kinds of criticisms you can make that just don't stick to certain people, because they don't fit the narrative in exactly that way. And finding exactly that kind of weak point, how to paint someone as a bad character in a way that is consistent with what people already think about them, is one of the secrets to political success.

1:00:31.9 PD: One thing I've thought about with this danger and power kinda framework is flipping between saying your opponent is dangerous and weak.

[laughter]

1:00:42.9 SC: Yes.

1:00:43.9 PD: And it doesn't seem to matter. We sort of know that in politics, you can say lots of things, and if it doesn't stick, you just keep moving on. But they're really, in our framework, orthogonal attacks, sort of literally orthogonal attacks. So you're trying to say this person is, in a sense, quite powerful and dangerous, and maybe the next day you wanna say they're fickle and weak, which is sort of a funny attack. So really, incredibly hard problems. And look, there's a huge danger to this as well, of course: being able to manipulate stories and measure them and do all these sorts of things and see what the weak points are in a system. Disinformation; people work on this all the time. Something I keep reflecting on is, for scientists and journalists... So my wife's a journalist, and I always sort of think of journalists as scientists with a deadline.

[chuckle]

1:01:46.1 PD: We're trying to figure things out and tell the truth about something, and kind of explain things broadly as a big piece. That's a huge battle for us, because the possible stories you can come up with that are not true but are favourable to a viewpoint or a culture or whatever are infinite. There's just an incredible number of things adjacent to the true story; you can really explore that space and find the stories that will spread faster, that will tack onto people's existing beliefs. So that just is gonna be a challenge, it's always been a challenge. In a time of so much information, so much availability, so much ability to curate and create misleading storyverses online that you can be taken into, we have to work really hard on this. This is hard.

1:02:41.4 SC: So I went to the web page for the hedonometer; I encourage everyone to check it out. And it's searchable, you can look for all sorts of wonderful things. And so just to normalise my own expectations, I searched for the frequency of the word quantum, 'cause it's something I'm interested in that doesn't have a high rate of appearance in news stories, just occasionally. And what you find is probably what I should have expected ahead of time, which is that there's a sort of baseline which is pretty normal, and there are spikes, and the spikes are extremely noticeable. But I don't know what the spikes are from. Clearly, there was a spike from a story about a quantum computer or something like that. Maybe my book came out, I don't know, but that'll be great. [chuckle] So how much of that sort of reverse engineering can you do when you see something weird happening in the data? Oftentimes, with the big stories you just know what it is, but are there objective procedures for figuring out why the words are shifting in these different ways on different days?

1:03:44.5 PD: Yeah, so a number of pieces here. This has been an eternally interesting, difficult problem: what happened? What happened? And I remember early on, maybe 15, even 20 years ago, looking through Google, trying to find out what happened on a particular day; it was kind of hard. And then Wikipedia emerges, which of course has entries for every date in history now, I suppose. Certainly for modern times. And there's a sort of weird list, it's just a weird list of things that happened in the world. There was a Star Trek convention, there was this war started. It's a real mixture, it's a cocktail. So with some of what we've got, the story wrangling for example, if you click on a point, it will take you to Twitter and search Twitter for that date, and it will sort of show you which tweets were being amplified on that day, potentially.

1:04:37.7 SC: Oh, okay.

1:04:39.7 PD: So tweets get deleted, which is a bit of a problem. So maybe it doesn't hold up, but that's something where we... It depends on the sort of restrictions you have. We have 10% of old tweets going back to 2008, but we can't share them and put them out wholesale. So we've tried to do something here by pushing it back into the actual structure of Twitter itself. Google Books, for example, you can't really... It's harder to do that, it's harder to go back and search; Google Trends too. You kind of wanna figure out, why is this thing being talked about? So at least with Twitter, we have that to some extent. We have another big body of work which really is connected to this, trying to figure out exactly this: what happened on a particular day or a particular week? And we did it around Trump. He was the president, it matters, it's a very good test case. And certainly from 2015 to 2020, and kind of still now, really, what I call story turbulence has been really high. The turnover of the stories has just been kind of incredible. And I remember sort of thinking in 2016 and '17, especially 2017: can you remember what happened in the last two weeks?

[laughter]

1:06:04.3 PD: It was a challenge. There could be massive events like Space Force or something. Or you could just...

1:06:11.3 SC: There's always something.

1:06:13.2 PD: Yeah, there's always something. Look, the world's a rich place, but it was an effort to sort of study that. We have this thing, which is kind of computational timeline reconstruction. And it works through Twitter, but it could work through anything. We could do it through, say, a state's archives going back in time, or any kind of news source, maybe the New York Times. What are the sort of narratively dominant terms, the words and pairs of words that pop up? And these kind of act as keywords into bigger stories. A term like Greenland doesn't tell you the whole story of Trump wanting to buy Greenland, but it would sort of point at it. What we were able to see there early on was that there was a lot of turbulence in that first year, 2017. There was just a lot of changeover. There were also natural disasters, Hurricane Maria and so on. So these things came on. But there were the North Korea provocations, and then Charlottesville happened the next week. It's hard to remember the orderings of these things.
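A toy version of that kind of timeline reconstruction: compare each day's word frequencies against a baseline and surface the words that jumped most. The published method uses rank-based divergences over enormous n-gram counts; `surging_words` here is a hypothetical helper that only shows the shape of the idea:

```python
from collections import Counter

def surging_words(baseline_text, day_text, top_n=3):
    """Rank words by how much their share of the day's text exceeds
    their share of a baseline text (a crude divergence measure)."""
    base = Counter(baseline_text.lower().split())
    day = Counter(day_text.lower().split())
    base_total = sum(base.values())
    day_total = sum(day.values())
    surge = {w: c / day_total - base.get(w, 0) / base_total
             for w, c in day.items()}
    return sorted(surge, key=surge.get, reverse=True)[:top_n]

top = surging_words("weather sports weather music",
                    "hurricane hurricane weather sports")
# "hurricane" surfaces first: it dominates the day but is absent
# from the baseline, so it acts as a keyword into the day's big story.
```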

1:07:25.0 PD: So we have a timeline that kind of comes out computationally. And what you see in 2020 is just this really sudden change into Coronavirus. We called COVID "Coronavirus" initially, and for many months it was just the dominant story every day. And we have a measure we call chronopathy, and you can see that time functionally slowed down, because there was not so much turnover in stories, it was always the same dominant story. George Floyd's murder explodes the narrative. And then that becomes stuck again too, because that becomes a durable story. And then of course, we get to the election and things.

1:08:03.7 SC: You can quantify the impression we all have that sort of time froze once the pandemic hit?

1:08:09.7 PD: Right, and people said this, though. People say this anecdotally all the time: yesterday felt like a week. I saw one tweet, it was like, "I'm gonna write an autobiography of the last 10 years of my life, it's called 2020."

[laughter]

1:08:25.1 PD: It's a very physics-ish sort of thing, time dilation and so on. This was a measure, at the population scale, of whether things really did seem to... Maybe it wouldn't have held up, maybe it's just sort of turnover and it doesn't really matter. But in fact, 14 days in April 2020 had the same sort of turnover you would have had in two days in 2017. It really had slowed.
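The "time slowing down" claim can be illustrated with a crude turnover rate: the fraction of each day's top stories that didn't appear the day before. This is only a toy proxy for the chronopathy idea, not the published measure, and the example story lists are invented.

```python
def story_turnover(daily_top_words):
    """Fraction of each day's top stories that are new relative to the
    previous day. Near 0: one story dominates day after day (time 'slows');
    near 1: rapid churn, as in a turbulent news year.
    """
    rates = []
    for prev, curr in zip(daily_top_words, daily_top_words[1:]):
        new = set(curr) - set(prev)
        rates.append(len(new) / len(curr))
    return rates

# Invented example: a turbulent 2017-style week vs. a pandemic-style frozen week.
turbulent = [["maria"], ["northkorea"], ["charlottesville"], ["spaceforce"]]
frozen = [["coronavirus"], ["coronavirus"], ["coronavirus"], ["coronavirus"]]
print(story_turnover(turbulent))  # a new dominant story every day
print(story_turnover(frozen))     # the same dominant story every day
```

Averaging such a rate over a window gives one way to say "14 days in April 2020 had the turnover of two days in 2017": the daily rate in the frozen period is a fraction of the turbulent one.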

1:08:47.9 SC: It's a weird thing. I know I had David Eagleman on the podcast a while ago. But there's this weird mismatch where simultaneously one says, "Nothing is happening, and time seems to last forever." [chuckle] The rate at which time passes is sort of inverse to the rapidity with which things happen. An exciting movie seems to go by very quickly.

1:09:09.7 PD: Yeah, no, that can be... And it depends how much you're recording in your own mind.

1:09:14.3 SC: Yeah, very much so.

1:09:14.5 PD: And there are studies of how people... Yeah, when something, it's usually around something dire or terrible happening, an accident, you have this seeming slow-motion replay of it in your head. And it's because you were really kind of writing down the memory, that's what I understand. You are really recording it, and you have it in fine detail. But there's a lot that goes on in life, and we know we miss most of it. The sounds and the things, they sort of pour past us. It's too much to measure for one person. So our brains are pretty good, problematically good perhaps, at just ignoring things that don't fit our little narrative right now.

1:09:52.5 SC: There's certainly a lot going on here that you've covered. It's wonderful to have this conversation 'cause I get the impression that a lot of the excitement in what you're doing is still ahead. We've just started picking some of the low-hanging fruit. But I guess one final question, which you did allude to earlier. We can take these ideas and turn them around and put them to work, maybe either in artificial intelligence or in political campaigns, or in writing a screenplay. Can we figure out, can we distill what would be the perfect narrative or the perfect time structure of valence or something like that? Are people trying to operationalize these ideas in that sense?

1:10:35.9 PD: Yeah. There's a lot of work over the years, and you can maybe make a fair amount of money off saying you can predict which things will take off because of your analytic tools. So I think what we can do is say, look, here's the shape of your story, here are the kinds of tropes you've used and so on, and this is how it compares to others around it, and maybe give people sort of a diagnostic like that. Now, in terms of making something take off for sure, well, this is the problem: reality is socially constructed, and we have all the work on fame. And of course, many people have come to this in different ways. They show that if you basically run the world over and over for cultural, social things, it doesn't always... There's a lot of variability. Harry Potter doesn't win in every universe. It certainly didn't win for the first 12 or 13 editors who said no.

[laughter]

1:11:32.8 PD: How could they not know? They're professionals. How could they not know that this would be this giant thing? And that actually indicates how much fortune and luck there is in the fact that something becomes an enormous runaway success. The world is full of these, where the number one thing is so much bigger than the second one, and the third one, these heavy-tailed distributions. And that's indicative of... Our simple story for that is that the Mona Lisa, for example, is fantastic. It is intrinsically amazing. If you look at it, you will be transported, and it's because of this and this and this.

1:12:14.9 PD: But we just leave out the social construction aspect completely: it took 400 years to get to the idea that it was the greatest painting in the world. And there's a whole set of reasons, not intrinsic ones, the stories around it, for why it became increasingly famous. But it's a good example of something where you can't really kind of... I guess that's the point of view: you try to make things as good as you can, and you want to make them spreadable. That's important, that people wanna tell other people about it. I think that's the great thing. And of course, that works for disinformation as well. So what will spread in the social wild? This is the great problem of advertising. There's that probably made-up line: half the money spent on advertising is wasted, we just don't know which half.

[laughter]

1:13:02.4 PD: And it's sort of true. Very unexpected things happen and take off, and how did you not know that before? Well, there's a lot of social construction that goes on. So there wouldn't be anything that would guarantee the future of some social phenomenon, but I think it can serve as a diagnostic. I worry about the negative aspects, but I think we have all of science here. We have to know these things so we can start to build defense systems. And I think AI, for example, or what we'll call AI, certainly the ones that work with language, and all of these kinds of crazy things, they've gotten way ahead of us. We're trying to make decisions about juries or parole or something like that, or presenting things that turn out to be deeply racist or whatever. We've got ourselves way beyond describe and explain and into the sort of create category of science. I think it's turning around. People are looking at the copper and so on that they've built some of these systems out of. And I'm really relieved to see that happen, 'cause I think there was a wild time there and we got ourselves... Facebook has control over which things spread.

1:14:34.3 PD: You have dials that you turn to make certain things spread or not spread. You can change the social contagion there, and that's... Yes, there's money on one side, but there's also just society held together on another side. And I think that's important.

1:14:50.7 SC: I guess there's also a feedback question. There's this David Lodge novel I read from the '80s, and he mentions very, very early efforts in digital humanities, where you would digitize someone's book and figure out what words they used more often than typical English usage. And this author was shown that he used the word "moist" or whatever way more than average. And once he found out those words, he couldn't write anymore, 'cause he was too self-conscious about doing it. And I wonder, if we figure out too much about what the shapes of these stories are and everything, how that's gonna affect how we tell them ourselves.

1:15:33.6 PD: Yeah, there's some peril there, I suppose. Yeah, scientists... Classic science move, just looking too deep. It's like trying to understand comedy and destroying everything.

1:15:41.9 SC: Yeah. Explaining the joke. [laughter]

1:15:44.9 PD: Thanks, scientists. No, I think, I would hope it would just get... People are incredibly creative, they find new ways to tell stories. We're in a time where we have so many stories from the past that we kind of play with them and so on. I don't think that we'll stop all of that. It can produce some stuff that's not very good. That maybe is the problem when you try to build formulas too much and so on. So that could be slightly more dangerous.

1:16:18.4 SC: Fair enough. Well, alright, I will just repeat. Thanks, scientists. I like that as a motto. And Peter Dodds, thanks very much for being on the Mindscape Podcast.

1:16:25.6 PD: Sure, it's been a great pleasure. Thank you.

[music]
