Re: Does the arXiv lead to higher citations and reduced publisher downloads?

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Wed, 22 Mar 2006 00:29:13 +0000 (GMT)

On Tue, 21 Mar 2006, Phil Davis wrote:

> The data that Kristin illustrates do not show causation, only
> association.

The data Phil illustrates likewise do not show causation, only
association.

> What I am arguing,
> however, is that the likely (and primary) cause of the citation
> advantage is not increased access, but some quality differential,
> leading to better articles being deposited in the arXiv.

In other words, you are making a causal inference despite only
having data on correlation (association). Fair enough. Others are making
other causal inferences, likewise based on data on correlation
(association).

> In our
> manuscript, we argue that if OA-as-cause is present, its scope is
> severely limited to highly-cited articles. How can we say this?

What your data show is that the OA Advantage (OAA, which everyone confirms)
is stronger at the high end, and this could either be because people tend
to self-archive high-end articles more (QB), or because the OA Advantage
is stronger on the high end (QA). Either way it's a quality effect. One
way it's a Quality Bias (QB), the other way it's a Quality Advantage
(QA). You think it's mostly QB, I think it's mostly QA. The data are
compatible with both. More fine-tuned causal tests are needed to decide.
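
As a toy illustration of why the data are compatible with both, here is a
minimal simulation sketch. It is not drawn from either data set: the
functions `simulate` and `mean_advantage`, the uniform "quality" variable,
and every numeric constant are invented assumptions. Under the QB model,
better articles are more likely to be self-archived and archiving adds
nothing; under the QA model, archiving is independent of quality and its
citation boost grows with quality.

```python
import random


def simulate(model, n=20000, seed=0):
    """Return (archived, citations) pairs under a hypothetical 'QB' or 'QA' model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        quality = rng.random()  # assumed latent quality, uniform on [0, 1)
        if model == "QB":
            # Quality Bias: self-archiving is more likely for better articles,
            # and archiving itself has no causal effect on citations.
            archived = rng.random() < quality
            citations = 10 * quality
        elif model == "QA":
            # Quality Advantage: self-archiving is independent of quality,
            # and the causal boost from archiving grows with quality.
            archived = rng.random() < 0.5
            boost = quality if archived else 0.0
            citations = 10 * quality * (1 + boost)
        else:
            raise ValueError("model must be 'QB' or 'QA'")
        rows.append((archived, citations))
    return rows


def mean_advantage(rows):
    """Mean citations of archived articles minus mean of non-archived ones."""
    oa = [c for archived, c in rows if archived]
    non_oa = [c for archived, c in rows if not archived]
    return sum(oa) / len(oa) - sum(non_oa) / len(non_oa)


if __name__ == "__main__":
    for model in ("QB", "QA"):
        print(model, round(mean_advantage(simulate(model)), 2))
```

With these particular invented constants, both models yield a positive
archived-versus-unarchived citation gap of similar size, so the gap alone
cannot say which model generated it.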

> If increased access was the cause of increased citations in our
> data, we should see a significant and positive correlation
> between fulltext article downloads from the arXiv and the number
> of citations an article receives.

The OAA pertains to whether or not an article is self-archived, not to
how often it is downloaded. But there is also a correlation between
download counts and (later) citation counts, as well as a correlation
between whether or not an article is self-archived and its download
counts. Three correlations. Still no causation. And compatible
with QB, QA or both.

Now it appears that the download/citation correlation for these data is
there toward the high-end, not the low. That too is correlation. It
means that the correlation between downloads and citations is not a
straight linear one; there may be a threshold effect or an acceleration
at the high end. Still just correlation, not causation. And compatible
with QB, QA or both.

> The rationale is that article
> repositories increase readership, some of which leads to
> increased citations (this is the argument that SPARC, Harnad,
> Suber and others use to justify the use of archives). Now please
> take a look at Figure 3 in our paper
> (http://arxiv.org/pdf/cs.DL/0603056). Notice that this positive
> association only applies to highly-cited articles (note: the
> inverse log of 2.5 is about 316 downloads).

To repeat: You have shown a high-end correlation. That is not a
demonstration that all or most of the cause is QB.

> In order to argue for causation one must be able to describe and
> measure the mechanism by which the cause takes place. Antelman
> (and others) demonstrate only the association between open access
> and citations, and infer that open access must be the cause. In
> our paper, we test the Open Access postulate, the Early View
> postulate, and a Quality Differential postulate. Of these three
> we feel that the Quality Differential is the strongest
> explanation for the data. We do not rule out Open Access
> completely, but the data do suggest that if access is responsible
> for increased citations, this effect may only take place for
> already highly-cited articles.

You test EA (Early Advantage), QB and QA. You (unlike others, in other
fields) find no EA in maths. You do not and cannot differentiate QB from
QA with your data.

Stevan Harnad
Received on Wed Mar 22 2006 - 01:17:14 GMT
