What Good is a Theory?

Over at Edge, they’ve posted a provocative article by Chris Anderson, editor of Wired magazine: “The End of Theory — Will the Data Deluge Make the Scientific Method Obsolete?” We are certainly entering an age where experiments create giant datasets, often so unwieldy that we literally can’t keep them all — as David Harris notes, the LHC will be storing about 15 petabytes of data per year, which sounds like a lot, until you realize that it will be creating data at a rate of 10 petabytes per second. Clearly, new strategies are called for; in particle physics, the focus is on the “trigger” that makes quick decisions about which events to keep and which to toss away, while in astronomy or biology the focus is more on sifting through the data to find unanticipated connections. Unfortunately, Anderson takes things a bit too far, arguing that the old-fashioned scientific practice of inventing simple hypotheses and then testing them has become obsolete, and will be superseded by ever-more-sophisticated versions of data mining. I think he misses a very big point. (Gordon Watts says the same thing … as do many other people, now that I bother to look.)

Early in the 17th century, Johannes Kepler proposed his Three Laws of Planetary Motion: planets move in ellipses, they sweep out equal areas in equal times, and their periods are proportional to the three-halves power of the semi-major axis of the ellipse. This was a major advance in the astronomical state of the art, uncovering a set of simple relations in the voluminous data on planetary motions that had been collected by his mentor Tycho Brahe.
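
The regularity Kepler found is easy to check for yourself. Here is a minimal sketch, using rounded modern values for the planets (semi-major axis in AU, period in years) rather than Brahe’s actual observations, confirming that the period tracks the three-halves power of the semi-major axis:

```python
# A quick sketch: check Kepler's third law, T ~ a**1.5, against rounded
# modern values (semi-major axis in AU, orbital period in years).
# Illustrative textbook numbers only -- not Brahe's actual observations.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
}

for name, (a, T) in planets.items():
    print(f"{name:8s} a = {a:6.3f} AU   T = {T:7.3f} yr   a**1.5 = {a**1.5:7.3f} yr")
```

The observed periods match the three-halves power to within a fraction of a percent — exactly the kind of simple relation that sits unremarked in a big table of numbers until someone goes looking for it.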

Later in that same century, Sir Isaac Newton proposed his theory of mechanics, including both his Laws of Motion and the Law of Universal Gravitation (the force due to gravity falls as the inverse square of the distance). Within Newton’s system, one could derive Kepler’s laws – rather than simply positing them – and much more besides. This was generally considered to be a significant step forward. Not only did we have rules of much wider-ranging applicability than Kepler’s original relations, but we could sensibly claim to understand what was going on. Understanding is a good thing, and in some sense is the primary goal of science.
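
For the record, here is the standard textbook route from the inverse-square law to Kepler’s third law, sketched for the simplified case of a circular orbit of radius r around a mass M (the full elliptical case gives the same scaling, with r replaced by the semi-major axis):

```latex
% Sketch: circular orbit of radius r around mass M.
% Gravity supplies the centripetal force; the orbital speed is v = 2*pi*r/T.
\frac{GMm}{r^{2}} \;=\; \frac{m v^{2}}{r},
\qquad v = \frac{2\pi r}{T}
\quad\Longrightarrow\quad
T^{2} \;=\; \frac{4\pi^{2}}{GM}\, r^{3},
\qquad \text{i.e.}\; T \propto r^{3/2}.
```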

Chris Anderson seems to want to undo that. He starts with a truly important and exciting development – giant new petascale datasets that resist ordinary modes of analysis, but which we can use to uncover heretofore unexpected patterns lurking within torrents of information – and draws a dramatically unsupported conclusion – that the age of theory is over. He imagines a world in which scientists sift through giant piles of numbers, looking for cool things, and don’t bother trying to understand what it all means in terms of simple underlying principles. In his words:

There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show.

Well, we can do that. But, as Richard Nixon liked to say, it would be wrong. Sometimes it will be hard, or impossible, to discover simple models explaining huge collections of messy data taken from noisy, nonlinear phenomena. But that doesn’t mean we shouldn’t try. Hypotheses aren’t simply useful tools in some potentially outmoded vision of science; they are the whole point. Theory is understanding, and understanding our world is what science is all about.
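
To make that concrete, here is a minimal, purely illustrative sketch of what hypothesis-free correlation hunting turns up when there is, by construction, nothing to find — just independent noise:

```python
# A minimal sketch: scan many pairs of *pure noise* variables for correlations.
# With enough comparisons, some pair will look impressively correlated by
# chance alone -- which is why "correlation is enough" fails without a model.
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_samples = 200, 50
data = rng.standard_normal((n_vars, n_samples))   # no real structure at all

corr = np.corrcoef(data)                          # all pairwise correlations
np.fill_diagonal(corr, 0.0)                       # ignore trivial self-correlations
i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
print(f"strongest 'pattern': variables {i} and {j}, r = {corr[i, j]:+.2f}")
```

With 200 noise variables there are nearly 20,000 pairs to compare, and the strongest of them will typically show a correlation around 0.5 even though nothing real is there. It takes a hypothesis about which correlations ought to exist, and why, to tell a discovery from a fluke.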


29 thoughts on “What Good is a Theory?”

  1. The GR universe is everywhere…nothing is outside it…except maybe Mr. Anderson! In fairness to Mr. Anderson, I’m working from the contents of Sean’s post, but hypothesizing, developing procedures, observing, relating and drawing conclusions actually define existence in a universe whose structure depends on the existence of both consciousness and particulate information.

  2. Data is not information.
    Information is not knowledge.
    Knowledge is not understanding.
    (apologies to FZ)

    Sometimes smart people get so caught up in their own ideas they manage to convince themselves that patently ludicrous things are true. With teh intertubes, now we can all bear witness to their mental pratfalls. This is one of the more egregious cases so far this year and an early nominee for the 2008 Ray Kurzweil Prize For Conceptual Bloat.

    I think perhaps the nicest and least insulting analysis of this monumental conceptual blunder is that it’s simply an unfortunate overextension into the scientific realm of the postmodernist idea that meaning is merely a construct. That is a fine and valid idea in its own right, but, taken too far, it has led more than one bright thinker into making farcical assertions such as this one.

  3. Yeah, I agree, it’s a pretty confused article. And then there’s that weird passage about “theoretical speculation” in physics which seems to, um, completely undermine his basic argument:

    But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

    Wha? So if the problem with physics is that we don’t have enough data to falsify our theories, how is that an example of “massive” amounts of data overturning the old approach to science? Seems like he tossed that paragraph in to prove he’s heard of string theory.

  4. Anderson’s title and some of his statements say that we can abandon the scientific method. When you read through the article looking for a specific statement about what will replace the scientific method, the closest you can find is

    We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

    Finding patterns is not the scientific method. It is only step one of the scientific method. It is what we called “observation” when we learned about it in the fifth grade. Before observation becomes science, you have to say something smart about the observation, and test your smart statement. Then, you are getting close to science.

    Anderson describes what I assume is his new version of science by citing the work of J. Craig Venter, who appears to use “high-speed sequencers and supercomputers that statistically analyze the data they produce” to discover “thousands of previously unknown species of bacteria and other life-forms”.

    Venter can tell you almost nothing about the species he found. He doesn’t know what they look like, how they live, or much of anything else about their morphology. He doesn’t even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

    I don’t know the work of Venter, but before he can “discover a species”, someone needs to be able to verify that his hypothesis is working – preferably by direct observation of the species in question. In this example, the statistical analysis is not replacing theory, it is simply a different way to observe interesting things.

    After reading the article, I have to question Anderson’s grasp of the meaning of the word “theory” and his understanding of the scientific method. He isn’t proposing a revolution. From what I understand, he is simply pointing to the better tools that science is currently using.

  5. Abelardo Duarte

    I believe that scientists will still be building theories out of their interpretations of the data. My experience as a researcher is that data mining gives results as confusing as the data itself.

  6. Pingback: tongue but no door (dot) net » Blog Archive » Sean Carroll Piles On Wired, Totally Pwnz

  7. Data mining, as a previous poster said, merely represents another experimental tool. Statistical spikes are not explanations; they are observations. No one is suggesting that we do away with the scientific method, and if there are people who think that “we can get rid of science because now we’ll just ask Deep Thought what he makes of it”, they don’t understand what science is as well as the late, beloved Douglas Adams did when he punningly created that computer.
    It is comforting that Sean, an exceptionally brilliant cosmologist, correctly identifies the nature of the scientific method as it exists in the “petabyte” era. It is disconcerting that at times we all seem to forget this. Without resorting to “Global Warming is a Hoax” accusations, I will say that I am surprised to hear so many scientists commit the very sin of “correlation is adequate” when it comes to that subject. Crucial there, it seems, is an ability to separate the scientific method from politics.
    I apologize if this comes across as a personal attack: I single out Sean only because of his omnipresence, which is a testament to his capacity for work and his unbridled interest in searching for truth. Nonetheless, the relaxation of scientific rigor would in fact be an alarming trend if it were applied to socially relevant disciplines such as climate science and economic theory, just as much as it would be in the more austere fields of particle physics and cosmology.

  8. This is one of the stupider cover stories Anderson has done for Wired. It fits his formula: the thesis is designed to be “provocative”; being correct is optional.

  9. I agree that Anderson’s idea ultimately fails – and I like Tyler’s comment that this should win the Conceptual Bloat prize. In my response to Anderson’s article, I echoed what a number of your commenters wrote, that Anderson essentially elevates the role of data to a level that isn’t warranted. You don’t throw out the very methods that let you go beyond correlation simply because you have petabytes of data to sift through.

    Indeed – Anderson points out that data without a [conceptual] model is just noise, but then he attempts to claim that massive amounts of data somehow transcend their own noisiness. That’s a bit of a stretch.

  10. Cosma Shalizi and Fernando Pereira, both of whom teach courses and do research on this sort of analysis, are not exactly impressed with Anderson’s arguments.

    Quoth Pereira:

    Where does Anderson think those statistical algorithms come from? Without constraints in the underlying statistical models, those “patterns” would be mere coincidences. Those computational biology methods Anderson gushes over all depend on statistical models of the genome and of evolutionary relationships. … without well-chosen constraints — from scientific theories — all that number crunching will just memorize the experimental data.
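
Pereira’s point about number crunching that merely memorizes the data has a textbook illustration: an unconstrained fit can match every training point and still predict nothing new. Here is a minimal sketch, using synthetic data and ordinary polynomial fits rather than anyone’s actual pipeline:

```python
# A minimal sketch of "memorizing the experimental data": an unconstrained
# (high-degree) polynomial nails the noisy training points but fails on new
# ones, while a constrained (low-degree) model generalizes.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(15)
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 12):                            # constrained vs. "memorizing"
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The high-degree fit typically drives the training error down while blowing up the error on fresh points; the low-degree constraint — standing in for a theory — is what makes the fit generalize.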

  11. I sometimes wonder if data overload won’t be a benefit. Most of the data isn’t that crucial.

    Having said that, I remember back in the ’90s when everyone was worried about losing data on old formats (paper tape, old-style magnetic media, etc.). I believe they tried to migrate a lot of it onto backed-up hard drives, but a lot of astronomical data was lost.

    That was very sad, since we lost the ability to do studies over time by comparing datasets taken at different times.

    However, there is a lot of stuff that has been kept.

    The benefit of losing data, though, is that it forces people to adopt some skepticism and redo tests. That’s not necessarily a bad thing. Often in redoing a test you can find new errors or learn new skills. It gives grad students something to do, too.

    But let’s be honest: how much of the data from most studies from, say, the 1860s through the 1980s was kept? It’s not as if this is a new problem.

    The problem is that we now conduct tests that produce data at such high resolution – much of it redundant – that we don’t know what to do with it all. Surely a lot of it could be thrown out.

    What I can’t understand, though, is the claim that this would affect the scientific method. If anything it would accelerate it, since theories are easier to store than tons of raw data.

  12. Ugly Pretentious Mind

    The real danger of too much data is that people have more information about the limits of a person’s ability to think rationally.

  13. I get the distinct impression, which could certainly be wrong, that the biologists are getting a bit disillusioned with brute force methods. It’s not that they think that reading entire genomes is valueless, but they had expected some brighter light to dawn as a result of all the mountains of data.

  14. Is he saying the narrative should be removed? But that is exactly what the scientific method tries to do. By creating theory first and then collecting data you are removing the narrative that humans are so good at creating when they collect data first and then fit a theory. We can make up a story to explain any data set or correlation after the fact. I do find the idea of looking at data for patterns without preconceived ideas interesting, but I’m sure that along with some potentially meaningful correlations you will also find…wait, is that a face?

  15. If you are data-mining, you should remove the narrative (for the reasons stated above), so he has that exactly right.

  16. Data is bottom up. Theory is top down. They do have to meet somewhere. If the theory can expand to accommodate the data, it’s good. If it can’t, it needs replacing.

  17. You need theory to decide what data to collect, and why. A parallel exists in medicine, where the uncritical worship of voluminous data and cool data visualization makes a lot of money for manufacturers of CT, MRI, and PET machines. With CT, the tradeoffs involve radiation exposure risk as well as money.

  18. Lawrence B. Crowell

    This criticism can be leveled against other sciences as well. Molecular biology uncovers very complex enzymatic pathways, and webs of pathways, through sets of experiments that require massive data analysis. Mapping the genome of an organism is a daunting task.

    Anderson appears unaware that data analysis has emerged as a sub-science of its own, with lines of research in applied mathematics. A large set of data will betray its signals to digital filters and statistical methods; if a signal is there, it can in principle be found by some method. Hmmm… maybe letting the SETI people sift NRAO data and work up new methods has wider applications than an improbable search for little green men.

    Ever since Berlinski wrote his languid missives and counsels of despair during the whole ID flapdoodle, I have had little time for these defeatist types.

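
The remark above that a large data set will betray its signals to digital filters and statistical methods has a simple concrete version: a periodic signal far too weak to see in any individual sample jumps out of a Fourier transform. Here is a minimal sketch with synthetic data, not any particular experiment’s pipeline:

```python
# A minimal sketch: a weak periodic signal buried in noise is invisible
# sample-by-sample, but a discrete Fourier transform concentrates its power
# into a single frequency bin, where it stands out clearly.
import numpy as np

rng = np.random.default_rng(2)
n, k_signal = 4096, 123                    # samples; injected frequency (in bins)
t = np.arange(n)
data = 0.2 * np.sin(2 * np.pi * k_signal * t / n) + rng.standard_normal(n)

power = np.abs(np.fft.rfft(data)) ** 2
k_peak = int(np.argmax(power[1:])) + 1     # skip the DC bin
print(f"loudest frequency bin: {k_peak} (signal was injected at bin {k_signal})")
```

Here the injected sine carries fifty times less power than the noise, yet the transform pins down its frequency; real searches add careful statistics for the look-elsewhere effect, but the basic point stands.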

  19. I do have to express a certain sympathy for one of the points that lies somewhere behind this argument. Although theories are a vital component of science, the level at which those theories are expressed depends on the complexity of the data. For example, presumably human psychology could ultimately be explained in terms of quantum mechanics, but it doesn’t strike me as a particularly worthwhile or meaningful exercise to attempt, as it would teach you essentially nothing about the nature of human psychology. It seems at least a priori plausible that some big experiments have now reached a scale and complexity where fundamental theories of physics no longer provide the most insightful explanations.

  20. Mike M,

    That’s not the point. You don’t have to explain things in terms of the most fundamental laws we know to at least have some explanation. For example, eventually we may hope to understand the operation of the human brain in terms of the activity of its basic components: neurons. There are many neuroscientists today working on this very problem, and it’s a good goal to have.

    We can then understand the operation of individual neurons based on the chemistry of their components. We can understand the chemistry of their components in terms of non-relativistic quantum mechanics. We can understand non-relativistic quantum mechanics in terms of quantum field theory. And, some day, we may understand quantum field theory in terms of some more fundamental theory.

    Either way, you don’t have to describe everything by breaking it down to its most fundamental components. There are ways of simplifying the problem by breaking it up into pieces.

  21. But even if you optimistically believe that all the links in the chain can be made (for which there is no guarantee), it still wouldn’t make sense to discuss a complex system like human psychology in terms of quantum field theory, as it would yield zero insight. Similarly, I could well imagine that there will come a point where it no longer makes sense to discuss a huge data set from a complex physical system, like the GAIA observations of the Milky Way for example, in terms of fundamental physics.

  22. Mike,

    So what you’re saying is that theory is necessarily based on objectives, rather than objectivity?
    I suppose then that those who do base their studies on pure objectivity are taking a long walk off a short board.

    Not sure I am saying anything so radical, John. I would imagine that psychologists would argue that their theories of human behaviour are objective (though some would argue!). The fact that the theory is ultimately driven by quantum field theory, if you were able to tunnel down far enough, is true, but essentially uninteresting, as it provides no insight into the subject. Similarly, a football game is ultimately driven by fundamental physics, but quantum theory rarely features in the post-game analysis. Since these levels of description are surely somehow related to the complexity of the system under consideration, all I am speculating is that if you have a complex enough experiment, it may no longer be appropriate to analyze it in terms of physics, even if it was designed as a physics experiment.

  24. Mike,

    Since an objective perspective is an oxymoron, any inquiry is inherently subjective to begin with. Added to that, the process of inducing a general theory from a collection of data is inherently reductionistic. So it would seem problematic to assume that one might draw particularly objective conclusions from a particular study. This is mitigated by multiplying the fields of inquiry and finding which patterns are most prevalent.
    From my limited experience and knowledge, it would seem one of the most basic patterns is of opposing forces in motion relative to one another, be it electricity or football, politics or geophysics. So is nature some dualistic construct, or is there a fundamental physicality that we will eventually find if we just keep smacking particles into one another? Or are we missing some larger point?

  25. Equator of principle and particle (gluon of self-contradiction) is the Absolute Logic of the self-creation of the Spacetime-Continuum.

Comments are closed.
