Davis replicated the CMBCM result with his own dataset, but then he started looking for other correlations in the data. It turns out that longer papers are also more likely to be cited, and when Davis statistically controlled for that effect, the CMBCM result not only disappeared, it reversed. That is, long Science papers with more references are slightly less likely to be cited than long Science papers with fewer references. By building a still more elaborate statistical model, one that incorporates the paper's length, subject area, and number of authors, Davis eradicated the effect of variation in the length of the Works Cited list entirely.
Controlling now for pages, authors, and section, reference list length is no longer statistically significant. In fact, it looks pretty much like a random variable (p = 0.77) and adds no information to the rest of the regression model.

Davis's analysis looks convincing to me. It's hard to say, however, whether it conclusively refutes the result reported in Nature News. That's partly because the CMBCM analysis draws on a much larger data set than Davis's; but more importantly, it was presented at a conference, not in a published article.
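The reversal Davis found is a classic confounding pattern: if paper length drives both reference count and citations, a naive regression makes references look beneficial. A minimal simulated sketch (invented numbers, not Davis's data; the coefficients here are assumptions chosen only to reproduce the pattern) shows how adding the confounder to the model can flip the sign:

```python
# Toy illustration of confounding, NOT Davis's actual analysis:
# paper length drives both reference count and citations, so the
# naive slope on references is positive until length is controlled for.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
length = rng.uniform(2, 30, n)                  # pages (the confounder)
refs = 2 * length + rng.normal(0, 2, n)         # longer papers cite more
# True effect of refs on citations is slightly NEGATIVE here by construction:
cites = 0.5 * length - 0.1 * refs + rng.normal(0, 1, n)

def ols(y, *cols):
    """Ordinary least squares with an intercept; returns coefficients."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

naive = ols(cites, refs)               # cites ~ refs
controlled = ols(cites, refs, length)  # cites ~ refs + length

print(f"naive slope on refs:      {naive[1]:+.3f}")       # positive
print(f"controlled slope on refs: {controlled[1]:+.3f}")  # negative
```

The naive model credits references with the citation boost that really belongs to length; once length enters the regression, the reference coefficient snaps back to its true (slightly negative) value, mirroring the sign flip Davis reports.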
Conference papers often present preliminary results, and in the absence of a published Methods section, the News article doesn't tell us whether the coauthors controlled for the confounding factors Davis identifies. (Although it seems logical to conclude from the News piece that they didn't.) If the CMBCM data set is going to make it through peer review at a journal, however, its authors will have to account for those confounders.