30 June 2008

Wired drinks the Science 2.0 kool-aid


Just a few months after computer scientist Ben Shneiderman heralded a new kind of science based on, well, nothing particularly new, Wired magazine hops on the bandwagon with a cover article announcing the end of science. This is an historic event, indeed - but only in the Henry Ford sense. Which is to say, mostly bunk.

Basically, Wired's argument, as laid out in the introductory article by Chris Anderson, is similar to Shneiderman's - that a deluge of newly available data (the "petabyte age") will somehow make the longstanding scientific method of observation-hypothesis-experiment obsolete:
"This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. ... With enough data, the numbers speak for themselves."
Doesn't anyone read Popper any more? Just walking through some of the case studies listed by Wired makes it clear that, although the petabyte age will allow us to ask and answer new questions about life, the universe, and everything, we'll still have to use good old-fashioned hypothesis testing to do it.

Predicting agricultural yields. Agricultural consultants Lanworth predict crop yields better than the USDA by using new data on weather and soil conditions. They crunch a lot of numbers - but they're still testing hypotheses, by fitting predictive models to all that data and determining which ones explain more of the variation. Or so I surmise, since we're never actually told anything about their methods.

Micro-targeting political markets. Since 2004, the amount of data collected on who votes for whom has ballooned. And, from what I understand of the description here, political consultants are having a field day looking for trends in the data - which is to say, mining the data to develop and test hypotheses about voter behavior.

Guessing when air fares will rise. The website Farecast looks for trends in flight data and ticket prices to predict whether fares will change in the near future. This is - you guessed it - just another way to say that they're testing hypotheses.

It's true that most of the examples Wired cites don't require active formulation of hypotheses by the people overseeing analysis of big data sets; instead, they let computers find the best-supported hypothesis based on the available data. And that is new - kinda.

Biologists use a similar approach, called Markov chain Monte Carlo, or MCMC, for reconstructing evolutionary trees and solving other computationally challenging problems. In MCMC, you feed the computer a dataset, and tell it how to judge if one possible explanatory model (or hypothesis) is better than another. Then you sit back and let the computer explore a whole bunch of slightly different models, and see which ones score better. Far from making hypothesis testing obsolete, this is hypothesis testing on steroids. And it is, at least for the moment, the future of science.
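For the curious, here's roughly what that looks like in code. This is a minimal sketch of one common MCMC flavor (the Metropolis algorithm), applied to a toy problem rather than anything as hairy as an evolutionary tree: each "hypothesis" is a guess at the mean of some simulated measurements. The function names, the normal model with known spread, the step size, and the data are all assumptions made up for illustration, not anything Wired or a real phylogenetics package uses.

```python
import math
import random


def log_score(mu, data, sigma=1.0):
    """How well does the hypothesis "the true mean is mu" explain the data?

    Assumed scoring rule: log-likelihood under a normal model with known
    spread sigma and a flat prior (constant terms dropped).
    """
    return -sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)


def metropolis(data, n_steps=20000, step_size=0.5, start=0.0, seed=1):
    """Wander through slightly different hypotheses, favoring better-scoring ones."""
    rng = random.Random(seed)
    current = start
    current_score = log_score(current, data)
    samples = []
    for _ in range(n_steps):
        # Propose a hypothesis slightly different from the current one.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_score = log_score(proposal, data)
        # Always accept a better-scoring hypothesis; accept a worse one
        # with probability exp(score difference).
        # 1.0 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_score - current_score:
            current, current_score = proposal, proposal_score
        samples.append(current)
    return samples


# Simulated "observations" with a true mean of 3.0 (hypothetical data).
data_rng = random.Random(0)
data = [data_rng.gauss(3.0, 1.0) for _ in range(50)]

samples = metropolis(data)
kept = samples[5000:]  # discard the early steps, taken before the chain settles
estimate = sum(kept) / len(kept)
print("best-supported mean:", round(estimate, 2))
```

Notice that nobody tells the computer which value of the mean to test. It proposes and scores thousands of candidate hypotheses on its own, but every single step is still a comparison of one hypothesis against another using the data.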
