Re: Does the arXiv lead to higher citations and reduced publisher downloads?

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Wed, 15 Mar 2006 00:34:53 +0000 (GMT)

On Tue, 14 Mar 2006, Phil Davis wrote:

> Liblicense, While our study confirms the same citation advantage
> reported by others, it does not attribute Open Access as the
> cause of more citations, but to Self-Selection. Open Access
> therefore may be a result, not a cause, of authors promoting
> higher-quality work.
>
> Does the arXiv lead to higher citations and reduced publisher downloads for
> mathematics articles?
> Authors: Philip M. Davis, Michael J. Fromerth
> Date: March 14, 2006
> http://arxiv.org/abs/cs.DL/0603056

The full text of Phil Davis's paper is not yet accessible, so I can only
respond to the abstract.

There are many plausible components of the OA advantage, of which
self-selection (Quality Bias: QB) is certainly one -- but not the only
one, and unlikely to be the principal one, except under a few special
conditions. QB is obviously a temporary phenomenon, disappearing
completely at 100% OA. The same is true of the Competitive Advantage (CA)
of (comparable) OA papers over non-OA papers in the same journal issue,
as well as of the arXiv Advantage (AA: the advantage of appearing jointly
in a central, widely consulted repository).

Once 100% OA is reached, QB, CA and AA all vanish. (AA vanishes because
of OAI interoperability and central harvesting services.)

But there are three other components that remain even at 100% OA:

Early Access Advantage (EA): The permanent citation boost from earlier access
Quality Advantage (QA): The permanent advantage of quality once the
    playing field has been levelled and affordability/accessibility no
    longer biases what is and is not accessible
Usage Advantage (UA): Average downloads for OA articles are at least
    double those of non-OA articles

    OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA
    (the parenthesized components are the ones that vanish at 100% OA)
    http://eprints.ecs.soton.ac.uk/12085/
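Read literally, the formula above is an additive decomposition in which three terms are transient and three are permanent. A minimal Python sketch of that reading follows. The additive treatment, the `Component` class and the `oa_advantage` function are illustrative assumptions, not a model taken from the cited eprint, and the post assigns no numerical sizes to the components, so any values passed in are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    code: str        # abbreviation used in the post
    name: str
    permanent: bool  # True if the component survives at 100% OA


# The six components named in the post; parenthesized ones in the formula are transient.
COMPONENTS = [
    Component("EA", "Early Access Advantage", True),
    Component("AA", "arXiv Advantage", False),
    Component("QB", "Quality Bias (self-selection)", False),
    Component("QA", "Quality Advantage", True),
    Component("CA", "Competitive Advantage", False),
    Component("UA", "Usage Advantage", True),
]


def oa_advantage(values: dict[str, float], full_oa: bool = False) -> float:
    """Sum hypothetical component sizes; at 100% OA only permanent components count."""
    return sum(
        values.get(c.code, 0.0)
        for c in COMPONENTS
        if c.permanent or not full_oa
    )


# Placeholder sizes only: the post gives no numbers.
placeholder = {c.code: 1.0 for c in COMPONENTS}
print(oa_advantage(placeholder), oa_advantage(placeholder, full_oa=True))
```

With a placeholder value of 1.0 for every component, the total is 6.0 below 100% OA and 3.0 at 100% OA, since only EA, QA and UA remain.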

> An analysis of 2,765 articles published in four math journals
> from 1997-2005 indicated that articles deposited in the arXiv
> received 35% more citations on average than non-deposited
> articles (an advantage of about 1.1 citations per article), and
> this difference was most pronounced for highly-cited articles.
> The most plausible explanation was not the Open Access or Early
> View postulates, but Self-Selection, which has led to higher
> quality articles being deposited in the arXiv.
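The abstract's two figures, a 35% relative advantage and an absolute advantage of about 1.1 citations per article, together imply approximate group means. A small sketch of that arithmetic follows, using a hypothetical helper `implied_means`; because both inputs are rounded, the resulting means are only rough.

```python
def implied_means(relative_gain: float, absolute_gain: float) -> tuple[float, float]:
    """Back out two group means from a relative and an absolute difference.

    If group B exceeds group A by `relative_gain` (0.35 for 35%) and the absolute
    difference is `absolute_gain`, then mean_A = absolute_gain / relative_gain and
    mean_B = mean_A + absolute_gain. Pass negative values to describe a shortfall.
    """
    base = absolute_gain / relative_gain
    return base, base + absolute_gain


# Figures quoted from the abstract (both rounded there).
non_deposited, deposited = implied_means(0.35, 1.1)
print(f"non-deposited ~{non_deposited:.1f}, deposited ~{deposited:.1f} citations per article")
```

This prints roughly 3.1 citations per non-deposited article and 4.2 per deposited one, which shows that the 35% advantage rests on fairly small absolute citation counts.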

Without seeing the full text one cannot be sure how the self-selection
explanation was arrived at, but let us assume that it was by correlation
(looking at the authors' track records and their comparable non-OA
articles, to show that there is a strong correlation between prior
author/article citation rates and the probability of later self-archiving).

There is no doubt at all that this is a causal factor, and indeed it is
the example set by the high-quality authors that helps encourage other
authors to self-archive.
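The correlational check assumed above could be sketched as a point-biserial correlation (Pearson's r where one variable is 0/1) between a prior citation measure and whether the article was later self-archived. The function name and the toy numbers are hypothetical, and the abstract does not say which method the study actually used.

```python
from math import sqrt
from statistics import mean


def point_biserial(prior_citations: list[float], archived: list[bool]) -> float:
    """Pearson correlation between a numeric variable and a 0/1 indicator.

    Undefined (division by zero) if either variable is constant.
    """
    ys = [1.0 if a else 0.0 for a in archived]
    mx, my = mean(prior_citations), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(prior_citations, ys))
    vx = sum((x - mx) ** 2 for x in prior_citations)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical toy data: prior citation rates of six authors, and whether each later self-archived.
r = point_biserial([1, 2, 3, 10, 12, 15], [False, False, False, True, True, True])
print(round(r, 2))
```

A value near 1, as in this toy case, is the pattern the study reportedly found; on its own it says nothing about how large the other components are.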

But the only systematic way to show that QB is the *only* component of
the OA advantage, or the biggest one, is to test it at all levels of
self-archiving, from 1% to 99%. Obviously a citation advantage that
persists even as a larger and larger proportion of the research in the
field becomes OA is less and less likely to be due to the fact that the
best authors/articles are the ones being self-archived.

And it also has to be tested for articles at all citation levels (i.e.,
for comparable low, medium, and high-citation articles). The OA
advantage is bigger at the higher citation levels, to be sure, but if it
is present even at the lower ones, that already shows that QB is
unlikely to be the only factor.
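Both tests proposed above, by level of self-archiving and by citation level, amount to computing the OA advantage separately within strata. A minimal sketch follows, assuming hypothetical article records with "oa" and "citations" keys and a caller-supplied stratum function. The stratum could be the %OA band of the article's field, or a low/medium/high tier based on something other than the article's own citations (for example the author's prior citation record).

```python
from collections import defaultdict
from statistics import mean


def stratified_oa_advantage(articles, stratum_of):
    """Ratio of mean citations (OA / non-OA) within each stratum.

    `articles`: iterable of dicts with hypothetical keys "oa" (bool) and "citations" (int).
    `stratum_of`: function mapping an article to its stratum label.
    Strata lacking either OA or non-OA articles, or with uncited non-OA articles, are skipped.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for a in articles:
        groups[stratum_of(a)][a["oa"]].append(a["citations"])
    result = {}
    for stratum, g in groups.items():
        if g[True] and g[False] and mean(g[False]) > 0:
            result[stratum] = mean(g[True]) / mean(g[False])
    return result


# Hypothetical toy records, tiered by a made-up "tier" label.
toy = [
    {"tier": "low", "oa": True, "citations": 2},
    {"tier": "low", "oa": False, "citations": 1},
    {"tier": "high", "oa": True, "citations": 30},
    {"tier": "high", "oa": False, "citations": 20},
    {"tier": "mid", "oa": False, "citations": 5},
]
print(stratified_oa_advantage(toy, lambda a: a["tier"]))
```

A ratio above 1.0 in the low tier as well as the high tier is the pattern the post argues would make QB unlikely to be the only factor. The same function run with a %OA-band stratum gives the first test.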

Estimating the relative size of the causal contribution of each of the
six factors will require a more fine-grained analysis,
taking into account not only %OA, citation level, and article age, but
also article deposit date. Equating average citation levels for the
authors and for the specialty domain will be necessary in the
comparisons, and a lot of journals will need to be sampled, in diverse
fields, to make sure patterns are not specialty-specific.
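One common way to do the kind of equating described above is field normalization: divide each article's citations by the mean for articles in the same field and year, so that specialties with different citation habits become comparable. This is a standard bibliometric technique swapped in for illustration, not a procedure specified in the post. The record keys are hypothetical, and author-level equating could be added by extending the cell key.

```python
from collections import defaultdict
from statistics import mean


def normalized_citations(articles: list[dict]) -> list[float]:
    """Citations of each article divided by the mean of its (field, year) cell."""
    cells = defaultdict(list)
    for a in articles:
        cells[(a["field"], a["year"])].append(a["citations"])
    cell_mean = {key: mean(vals) for key, vals in cells.items()}
    scores = []
    for a in articles:
        m = cell_mean[(a["field"], a["year"])]
        scores.append(a["citations"] / m if m > 0 else 0.0)  # entirely uncited cell: no signal
    return scores


def normalized_oa_ratio(articles: list[dict]) -> float:
    """Mean normalized score of OA articles divided by that of non-OA articles.

    Both groups must be non-empty, and the non-OA mean must be non-zero.
    """
    scores = normalized_citations(articles)
    oa = [s for s, a in zip(scores, articles) if a["oa"]]
    non = [s for s, a in zip(scores, articles) if not a["oa"]]
    return mean(oa) / mean(non)
```

Without normalization, a heavily cited field would dominate the pooled comparison. After normalization, each field contributes on the same scale.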

> Yet in spite of
> their citation advantage, arXiv-deposited articles received 23%
> fewer downloads from the publisher's website (about 10 fewer
> downloads per article) in all but the most recent two years after
> publication. The data suggest that arXiv and the publisher's
> website may be fulfilling distinct functional needs of the
> reader.

That sounds like the arXiv Advantage (AA) expressed in the downloads
(UA).
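The abstract's download figures can be unpacked into approximate means. If deposited articles receive 23% fewer publisher downloads and that gap is about 10 downloads per article, the implied means follow directly. The symbols below are introduced for illustration, where $\bar d_{\text{non}}$ and $\bar d_{\text{arXiv}}$ are mean publisher-site downloads per article for non-deposited and arXiv-deposited articles. Both inputs are rounded, so the results are approximate.

```latex
\bar d_{\text{non}} - \bar d_{\text{arXiv}} = 0.23\,\bar d_{\text{non}} \approx 10
\quad\Longrightarrow\quad
\bar d_{\text{non}} \approx \frac{10}{0.23} \approx 43,
\qquad
\bar d_{\text{arXiv}} \approx 43 - 10 = 33
```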

Apart from total citation counts and downloads, other interesting
variables to look at (and compare for OA effects) include: citation
latency, citation longevity and other temporal measures; same for
downloads; also authority impact (similar to Google's PageRank:
citations by higher-cited citers count for more), inbreeding/outbreeding
coefficients, co-citations, and semantic correlations.
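Authority impact of the PageRank kind can be sketched as a power iteration over a citation graph. In each step, every paper passes its current score, split evenly, to the papers it cites. The function name, the graph format and the damping factor of 0.85 (the value conventionally used for PageRank) are assumptions for illustration. The post does not specify an algorithm.

```python
def authority_scores(
    cites: dict[str, list[str]], damping: float = 0.85, iters: int = 100
) -> dict[str, float]:
    """PageRank-style power iteration; `cites[p]` lists the papers that p cites.

    Scores sum to 1. Papers citing nothing ("dangling" nodes) spread their score
    uniformly, the usual PageRank convention.
    """
    papers = set(cites) | {q for refs in cites.values() for q in refs}
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        dangling = sum(score[p] for p in papers if not cites.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in papers}
        for p, refs in cites.items():
            if refs:
                share = damping * score[p] / len(refs)  # split p's score over its references
                for q in refs:
                    new[q] += share
        score = new
    return score


# Hypothetical toy graph: X and Y are each cited once, but X's citer H is itself cited three times.
graph = {"A": ["H"], "B": ["H"], "C": ["H"], "H": ["X"], "U": ["Y"]}
s = authority_scores(graph)
print(s["X"] > s["Y"])
```

Both X and Y have one citation, but X scores higher because its citer is more highly cited. This is the "citations by higher-cited citers count for more" idea.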

Stevan Harnad

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
Cross-Disciplinary Comparison of the Growth of Open Access and How it
Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4)
pp. 39-47.
http://eprints.ecs.soton.ac.uk/11688/
Received on Wed Mar 15 2006 - 00:35:27 GMT
