Critique of EPS/RIN/RCUK/DTI "Evidence-Based Analysis of Data Concerning Scholarly Journal Publishing" from Stevan Harnad on 2006-10-14 (American-Scientist-Open-Access-Forum)

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Sat, 14 Oct 2006 14:02:00 +0100

Hyperlinked version of this critique:
    http://openaccess.eprints.org/index.php?/archives/142-guid.html
--------------------------------------------------------------------
OVERVIEW: A Report on UK Scholarly Journals was commissioned by RIN,
RCUK and DTI, and conducted by EPS, but its questions, answers and
interpretations are clearly far more concerned with the interests of
the publishing lobby than with those of the research community.
    http://www.rin.ac.uk/data-scholarly-journals
    http://www.rin.ac.uk/
    http://www.rcuk.ac.uk/
    http://www.dti.gov.uk/
    http://www.epsltd.com
    http://openaccess.eprints.org/index.php?/archives/20-guid.html

The Report's two relevant overall findings are correct and stated very
fairly in their summary form:

> [1] "Overall, [self-archiving] of articles in open access
> repositories seems to be associated with both a larger number
> of citations, and earlier citations for the items deposited...
> http://opcit.eprints.org/oacitation-biblio.html The reasons for
> this [association] have not been clearly established - there
> are many factors that influence citation rates... Consistent
> longitudinal data over a period of years... would fill this gap."

> [2] "There is no evidence as yet to demonstrate any relationship
> (or lack of relationship) http://eprints.ecs.soton.ac.uk/10999/
> between subscription cancellations and repositories... Proving or
> disproving a [causal] link between availability in self-archived
> repositories and cancellations will be difficult without long
> and rigorous research."

The obvious empirical and practical conclusion to draw from the
findings -- that (1) all the self-archiving evidence to date
is positive for research and that (2) none of the self-archiving
evidence to date is negative for publishing -- would have been that
the research community should now apply and extend these findings -- by
applying and extending self-archiving (through self-archiving mandates
http://www.eprints.org/signup/fulllist.php) to all UK research output,
along with consistent, rigorous longitudinal studies over a period of
years, to test (1) whether the positive effect on citations continues to
be present (and why) and (2) whether the negative effect on subscriptions
continues to be absent.

But instead, the two overall findings are hedged with volumes of special
pleading, based mostly on wishful thinking, to the effect that (1') the
observed relationship between self-archiving and citations may not be
causal, and that (2') there may exist an as-yet-unobserved causal
relationship between self-archiving and cancellations after all.

Even that would be all right, if this Report's conclusions were coupled
with a clear endorsement of the proposed self-archiving mandates, so
that the competing hypotheses can be put to a rigorous long-term test.
But the only test the commissioners of this Report seem to be interested
in conducting is "Open Option" publishing, i.e., authors paying
publishers to make their article OA for them, instead of self-archiving
it for themselves. This would certainly be a nice way to hold author
self-archiving and institution/funder self-archiving mandates at bay for
a few years more, while at the same time protecting publishers from
undemonstrated risk of revenue loss. But it would also leave global
unmandated self-archiving to continue to languish at the current
spontaneous 15% rate that the self-archiving mandates had been meant to
drive up to 100%. And it would leave research unprotected from its
demonstrated risk of impact loss. The option of having to pay to provide
OA is certainly not likely to enhance the unmandated rate of uptake by
authors (though I'm sure publishers would have no quarrel with funder
mandates to provide OA coupled with the funds to pay publishers' asking
price for paid OA, as provided by the Wellcome Trust).
http://www.wellcome.ac.uk/doc_wtd018855.html#P66_6964

The long-term test will nevertheless be conducted, because four out of
eight UK Research Councils have already mandated self-archiving.
http://www.rcuk.ac.uk/access/index.asp

Their citation rates and their cancellation rates can then
be compared with those for the four that have not mandated
self-archiving (and whose authors hence do it spontaneously
by "self-selection"). Alas this will be mostly comparing
apples and oranges (e.g. http://www.mrc.ac.uk/open_access MRC vs
http://www.ahrc.ac.uk/about/policy/ahrc_guidance_on_access_to_research_outputs.asp
AHRC), and it will needlessly be depriving the oranges of several more
years of potential growth enhancement.

My guess is that all the other councils -- except possibly the paradoxical
EPSRC http://www.epsrc.ac.uk/AboutEPSRC/ROAccess.htm (which evidently
thinks, with the publishing lobby, that there's still some sort of
pertinent pretesting to be done for a few more years here) -- will come
to their senses long before that, unpersuaded by Reports like this one.
----------------------------------------------------------------------

    UK scholarly journals: 2006 baseline report
    An evidence-based analysis of data concerning scholarly journal
    publishing. http://www.rin.ac.uk/data-scholarly-journals

    Prepared on behalf of the http://www.rin.ac.uk/ Research Information
    Network, http://www.rcuk.ac.uk/ Research Councils UK and the
    UK http://www.dti.gov.uk/ Department of Trade and Industry.
    By http://www.epsltd.com Electronic Publishing Services Ltd

    In association with
    http://www.lboro.ac.uk/departments/ls/people/coppenheim.html
    Professor Charles Oppenheim and
    http://www.lboro.ac.uk/departments/dils/lisu/lisuhp.html LISU at
    Loughborough University Department of Information Science

This is a rather long and repetitious report, but it does contain a few
nuggets. It is obviously biassed, but biassed in a restrained way,
meaning it does not really try to conceal its biases, nor does it
overstate biassed conclusions. It also (reluctantly, but in most cases
candidly) acknowledges its own weaknesses.

(The Report was commissioned by RIN, RCUK and DTI, but it is
glaringly obvious that the questions, answers and interpretations
have been slanted toward the interests of the publishing lobby
http://openaccess.eprints.org/index.php?/archives/20-guid.html rather
than those of the research community -- possibly because the research
community has no lobby in this matter, apart from the OA movement
itself! Nevertheless, there has been considerable circumspection,
at least in the summary and conclusion passages, with weak points and
gaps usually pointed out explicitly rather than denied or concealed,
and with the overall preoccupation with publishing interests rather than
research interests very open too.)

Some quote/comments:

> "Whilst some evidence does suggest that [self-archiving in]
> repositories [is] an important new factor in the journal
> cancellation decision process, and one which is growing in
> significance, there is no research reporting actual or even intended
> journal subscription cancellation as a consequence of the growth
> of OA self-archived repositories."

So far, this sounds fair and reasonable. (In fact, this is the gist of
the Report! The rest is mostly special pleading.)

> "Subscriptions are reported to have been declining over a period of
> 10+ years, but for a number of reasons. Proving or disproving a link
> between availability in self-archived repositories and cancellations
> will be difficult without long and rigorous research. In this
> connection, the outcome of research recently announced by the
> Research Councils UK (RCUK) with the co-operation of Macmillan,
> Blackwell and Elsevier, will be eagerly awaited, even though a
> report is not due until late 2008."

With evidence of self-archiving's benefits to research mounting,
http://opcit.eprints.org/oacitation-biblio.html
and zero evidence yet of any negative effect at all on publisher revenue,
http://eprints.ecs.soton.ac.uk/11006/
publishers nevertheless seem quite willing to wait (and keep research
waiting too), trying to fend off self-archiving and its potential benefits
to research for a long time to come yet, in order to keep trying to
find some evidence of negative causal effects on publisher revenue (or,
failing that, to deny positive causal effects on research impact).

Note that whereas a link between OA self-archiving and subscription
decline has not yet been "proved or disproved" (not for want of
looking!) -- and it is for that reason that we are hearing these calls
for "long and rigorous research" -- the vast preponderance of the
evidence we *do* have has already "proved" a "link" between OA
self-archiving and citation counts (a link that is almost certainly
causal, despite the wishful thinking of some who have a vested interest
in its all turning out to be merely a-causal self-selection and
superstition on the part of authors).

The question that the research community accordingly needs to ask itself
is whether self-archiving's evidence-based benefits to research should
be held in abeyance still longer, and meanwhile interpreted by default
as a-causal, in order to buy still more time to try to "prove/disprove"
hypothetical subscription declines for which there is no evidence
whatsoever to date, even in fields where self-archiving has been near
100% for years.

(Researchers should also go on to ask themselves whether the research
benefits should be held in abeyance even if they *are* causally
linked to a subscription decline: Is research impact to be sacrificed in
the service of publisher revenue? Are we conducting and funding research
in order to generate -- or to safeguard -- publisher revenue?)

> "There is no evidence as yet to demonstrate any relationship
> (or lack of relationship) between subscription cancellations
> and repositories. Work in this field would need sufficient,
> representative and balanced samples, and the collaboration of
> all stakeholders, including especially research institutions
> and publishers. Any such study will need to be maintained over
> a fairly extended period, with regular reports, since it seems
> likely that the position could change with time if the contents of
> self-archiving repositories become progressively more comprehensive."

This would be fine, if proposed as an extended research project to be
conducted *after* self-archiving mandates are in place, to analyze their
long-term effects on subscriptions.

But this would be an exceedingly self-serving suggestion on the part of
the publishing community (and a methodologically empty one) if meant as
a "pilot" study that must somehow be conducted *before* adopting
self-archiving mandates. (And it would be exceedingly self-defeating of
the research community to even consider accepting such a pre-emptive
suggestion as a precondition, before adopting self-archiving mandates.)

> "There is some consistency in results that show more citations for
> articles self-archived in repositories as distinct from the same
> or similar articles available [only via journal] subscription
> (although there have also been a few contradictory results).
> Overall, deposit of articles in open access repositories seems to
> be associated with both a larger number of citations, and earlier
> citations for the items deposited."

This is a fair summary -- except that immediately after stating it, this
"association" is about to be deconstructed (much as the "association"
between cigarette-smoking and lung cancer was deconstructed for years
and years by the tobacco industry, claiming that only correlation had
been demonstrated, and not causation). Read on:

> "The reasons for this [association] have not been clearly
> established - there are many factors that influence citation rates,
> including the reputation of the author, the subject-matter of the
> article, the self-citation rate, and, of course, how important
> or influential the repository is in its own right. The little
> existing evidence suggests that a possible [sic] reason for
> increased citation counts is not that the materials were free, or
> that they appeared more rapidly, but that authors put their best
> work into OA format. This research was limited to one discipline,
> however [astronomy], and more extensive evidence is required to
> validate this finding."

The finding of this (important) study by Kurtz et al. in astronomy,
http://cfa-www.harvard.edu/~kurtz/IPM-abstract.html
however, is not what the vast majority of the evidence (no longer
little!) shows:
http://opcit.eprints.org/oacitation-biblio.html

Moreover, as noted, this a-causal interpretation -- only one of the
possible interpretations of the astronomy evidence -- also happens to be
the interpretation that the publishing community prefers for *all* the
self-archiving evidence, in all fields. The alternative interpretation is
that the relationship is causal: that the OA advantage is not merely an
arbitrary whim on the part of the better authors to make their work OA,
to no causal effect at all (why on earth would they be doing it at all
then?): They do it because making their work OA increases
its accessibility, uptake, downloads, usage, applications, citations,
impact -- exactly as the correlational evidence shows, without exception,
in field after field.

(NB: The only methodologically unexceptionable way to demonstrate
causation here, by the way, is to select a large enough random sample of
articles, divide them in half randomly, mandate half of them to be
self-archived and half not, and then compare their respective citation
counts after a few years. No one is likely to do quite *that* study
-- any more than it was likely that a large random sample of people
would be divided in half randomly, with half mandated to smoke and half
not! But we are in the process of doing an approximation to that causal
study,
http://www.crsc.uqam.ca/lab/chawki/ch.htm
by comparing the citation counts of articles in the IRs of the
(few) institutions that have already mandated self-archiving
http://www.eprints.org/signup/fulllist.php
with the average for other articles in the same journals/years in
which those articles appeared, but that have not been self-archived;
we will also compare the size of the OA advantage for mandated and
http://archives.eprints.org/ comparable non-mandated self-archiving. [We
do not believe for a moment that these data are necessary to demonstrate
causation, as causation is a virtual certainty anyway, but we are ready
to play the game, in order to try to cut short the absurd delay in doing
the obvious: mandating self-archiving universally.])
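
(A minimal sketch, in Python, of the random-assignment design described
above. The function names, the fixed seed and the article identifiers
are all hypothetical, chosen only for illustration; this is not the
procedure of any actual study.)

```python
import random


def randomly_assign(article_ids, seed=0):
    """Split articles at random into a mandated half and a control half."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(article_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean(values):
    return sum(values) / len(values)


def citation_difference(citations, mandated, control):
    """Mean citations of the mandated half minus mean of the control half."""
    return (mean([citations[a] for a in mandated])
            - mean([citations[a] for a in control]))


# Hypothetical article identifiers, for illustration only.
articles = [f"article-{i}" for i in range(10)]
mandated, control = randomly_assign(articles)
```

Because the split is random, any systematic difference later returned
by citation_difference could not be attributed to authors selecting
their better work for self-archiving.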

> "Although quite a lot of evidence has been collected regarding the
> quantitative effect of OA on citation counts (whether in the form of
> OA journals or as self-archived articles), much of it is scattered,
> uses inconsistent methods and covers different subject areas."

Yet, despite this scatter, methodological inconsistency and diversity,
virtually all of it keeps showing exactly the same consistent pattern:
A citation (and download) advantage for the OA articles. (No amount of
special pleading can make that stubborn pattern go away!)

> "Consistent longitudinal data over a period of years to measure
> IF trends in a representative range of journals would fill this gap"

There is no gap! There is a growing body of studies, across all fields
and all journals, that keeps showing exactly the same thing: the OA
advantage (in article citations and article downloads: this is not about
journal impact factors, especially because comparing different journals
is comparing apples and oranges).

(There seems to be a confusion here between the existence of the
correlation itself, between self-archiving and citation counts --
this is found consistently, over and over -- and the question of the
causal relation, which will not be answered by longitudinal data (we
have longitudinal data already!) but by comparing mandated and
unmandated self-archiving: if they both show the OA advantage, then the
effect is causal and self-selection bias is a minor component.)

> "e.g., studying a range of journals that were toll-access and went
> OA (or vice versa). In the short-term, more data in different
> disciplines measuring the impact on citation counts of articles
> in hybrid journals or articles that are available in both forms
> versus articles that are only available in one of the forms will
> improve the evidence base."

No, the question about the reality and causality of the OA advantage
will not be settled by OA journal vs. non-OA journal comparisons;
that can always be dismissed as comparing apples with oranges, and,
failing that, can always be attributed to self-selection bias (i.e.,
choosing to publish one's better work in an OA journal)!

And if we wait for the uptake of hybrid Open Choice
http://www.springer.com/dal/home/open+choice?SGWID=1-40359-0-0-0
-- i.e., paying the journal to self-archive the published PDF
for you -- these "longitudinal" studies are likely to take till
doomsday (and any positive outcome can still be dismissed as
self-selection bias in any case!).

What is needed is precisely the data already being gathered, on huge
samples, across all disciplines, comparing citation counts for
self-archived versus non-self-archived articles within the same journal
and year. The result has been a consistent, high OA Advantage (which has
elicited a lot of special pleading about causality).
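
The within-journal/year comparison can be sketched as follows. This is
only an illustration under stated assumptions: the record layout
(journal, year, is_oa, citations), the function name and the numbers are
made up, and the ratio of means is just one simple way to summarise the
comparison, not necessarily the measure used in the published studies.

```python
from collections import defaultdict


def oa_advantage_by_journal_year(articles):
    """Return {(journal, year): mean OA citations / mean non-OA citations}.

    Groups lacking either OA or non-OA articles, or whose non-OA mean is
    zero, are skipped because the ratio is undefined for them.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for journal, year, is_oa, citations in articles:
        groups[(journal, year)][is_oa].append(citations)

    ratios = {}
    for key, by_access in groups.items():
        oa, non_oa = by_access[True], by_access[False]
        if not oa or not non_oa:
            continue
        non_oa_mean = sum(non_oa) / len(non_oa)
        if non_oa_mean == 0:
            continue
        ratios[key] = (sum(oa) / len(oa)) / non_oa_mean
    return ratios


# Made-up records (journal, year, is_oa, citations), purely illustrative.
sample = [
    ("Journal A", 2003, True, 12), ("Journal A", 2003, True, 8),
    ("Journal A", 2003, False, 5), ("Journal A", 2003, False, 5),
    ("Journal B", 2003, True, 3),  # no non-OA counterpart: skipped
]
ratios = oa_advantage_by_journal_year(sample)
```

Keeping journal and year fixed within each group is what prevents the
comparison from becoming one of apples and oranges.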

So we will look at the mandated subset of the self-archived papers, to
try to show that the OA advantage is not (only, or mostly) a
self-selection effect (*Quality Bias* [QB]).

(There is undoubtedly a non-zero self-selection [QB] component
in the OA advantage, but there are many other components as well,
including a *Quality Advantage* [QA], an *Early Access Advantage*
[EA], a *Competitive Advantage* [CA, which will, like QB, vanish once
all articles are OA], and a *Usage (Download) Advantage* [UA]. At 100%
OA, there will no longer be any QB or CA (or *Arxiv Advantage [AA]*),
but EA, QA and UA will still be going strong. EA and UA components have
already been confirmed by the Kurtz study in astronomy.
http://cfa-www.harvard.edu/~kurtz/IPM-abstract.html
QA is implied by the repeated finding of a positive correlation between
citation count and the proportion of those articles with that citation
count that are OA.
http://eprints.ecs.soton.ac.uk/11688/
The mandate study will try to show that this correlation is causal,
i.e., QA, not QB.)

    Harnad, S. (2005) OA Impact Advantage = EA + (AA) + (QB) + QA +
    (CA) + UA. http://eprints.ecs.soton.ac.uk/12085/

> "The whole area of the relationship between citation counts
> and scholarly communication channels is confused because of
> problems associated with quality bias [QB] (e.g., if scholars
> tend to self-archive only their best work, as suggested by Kurtz
> et al. [in astronomy]; alternatively, it may be that only the
> best journals are OA). In other words, differences in citation
> counts and IFs may simply reflect the quality of the materials
> under study rather than having anything to do with the channel
> by which the material is made available."

First, the issue is article citation counts, not journal Impact Factors
(IFs).

Second, this is all special pleading. The biggest OA effects are based
on comparing articles within the same journal/year. The size of the
effect is indeed correlated with the quality of the article, because no
amount of accessibility will generate citations for bad articles,
whereas good articles benefit the most from a level playing field, with
all affordability/accessibility barriers removed: that is the Quality
Advantage [QA]. The idea that the Quality Advantage is merely a Quality
(Self-Selection) Bias [QB], i.e., that the advantage is merely
correlational, not causal, is of course a logical possibility, but it is
also highly improbable (and would imply that accessibility/affordability
barriers count for nothing in usage and citations, and that the better
work is being made OA by its authors for purely superstitious reasons,
because doing so has no effect at all!).

> "Overall, we concur with Craig's introduction that "the problems
> with measuring and quantifying an Open Access advantage are
> significant. Articles cannot be OA and non-OA at the same time."

They need not be. It is sufficient if we take a large enough
sample of articles that are OA and non-OA from the same journals and
years. Randomly imposing the self-archiving would be the only way to
equate them completely (and our ongoing study on mandated self-archiving
will approximate this).

(The analysis by Craig, commissioned by Blackwell Publishing, has not,
so far as I know, been published.)

> "Further, the variation of citation counts between articles can
> be extremely high, so making controlled comparisons of OA vs.
> non-OA articles nigh on impossible" [Craig, Blackwell Publishing]

(The way Analysis of Variance works is to compare variation between and
within putatively different populations, to determine the probability
that they are in reality the same population. The published comparisons
show that the OA/non-OA differences are highly significant, despite the
high variance.)
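
(To make the between/within logic concrete, here is a minimal sketch of
the one-way F ratio in Python. The citation counts are invented and the
function name is hypothetical; converting F into a probability requires
the F distribution, which this sketch leaves out.)

```python
def f_statistic(groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Invented citation counts for OA and non-OA articles, illustration only.
oa_counts = [4, 6, 8]
non_oa_counts = [1, 2, 3]
f_value = f_statistic([oa_counts, non_oa_counts])
```

The larger the within-group variance, the larger the between-group
difference (or the sample) must be for F to be large; high variance
makes the test more demanding, not impossible.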

It would of course be absurd to try to compare citation counts for OA
and non-OA articles having the same citation counts. But we can compare
OA and non-OA *article* counts among articles having the same
citation counts, in the same journals -- and what we find is a strong
positive correlation between the citation count and the proportion of
articles that are OA (just as Lawrence reported in 2001,
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html
but not only in computer science, but across all 12
disciplines studied so far, and with much bigger sample sizes):

> Source 4.8: Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year
> Cross-Disciplinary Comparison of the Growth of Open Access and
> How it Increases Research Citation Impact. IEEE Data Engineering
> Bulletin 28(4) pp. 39-47. http://eprints.ecs.soton.ac.uk/11688/
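
The kind of tabulation described above can be sketched as follows. The
(citations, is_oa) records are made up, the function names are
hypothetical, and the plain Pearson coefficient is used here only as one
simple way to express the correlation; it is not necessarily the
statistic reported in the study cited.

```python
from collections import defaultdict
from math import sqrt


def oa_proportion_by_citation_count(articles):
    """Map each citation count c to the fraction of articles with c
    citations that are OA."""
    totals, oa_totals = defaultdict(int), defaultdict(int)
    for citations, is_oa in articles:
        totals[citations] += 1
        oa_totals[citations] += int(is_oa)
    return {c: oa_totals[c] / totals[c] for c in sorted(totals)}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up (citations, is_oa) records from one journal, illustration only.
sample = [
    (0, False), (0, False), (0, False), (0, True),
    (1, False), (1, True),
    (2, True), (2, True), (2, True), (2, False),
]
proportions = oa_proportion_by_citation_count(sample)
r = pearson(list(proportions), list(proportions.values()))
```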

Note that the appendix to the Report under discussion here
states, in connection with the above study, which it cites:
http://www.rin.ac.uk/files/UK%20Scholarly%20Journals%202006%20Baseline%20Report%20-%20Appendix.pdf

> "Harnad is THE advocate of OA and, thus, whilst expert in the field,
> is inevitably biased."

There is a bit of irony in the fact that in connection with another of
the studies it cites:

> Source 4.9: Harnad, S, Brody, T, Oppenheim, C et al, Comparing the
> impact of open access versus non open access articles in the same
> journals, D-Lib Magazine, 10,(6), 2004,
> http://www.dlib.org/dlib/june04/harnad/06harnad.html

the appendix of the Report goes on to say:

> "Harnad is THE exponent of OA, but, thus, potentially less
> objective."

Ironic (or, shall we say, conflicted, since this Report aspires to be a
neutral one as between the interests of the research community and the
publisher community), because the sole named collaborator on the Report
is also a co-author of the above-cited study!

Let us agree that we all have views on the underlying issues, but that
reliable data speak for themselves, qua data, and our data (and those of
others) keep showing the same consistent OA Advantage. The disagreement
is only on the interpretation: whether or not the consistent
correlations are causal. And here, allegiances are tugging on both
sides: Those favouring causality tend to come from the research
community, those favouring a-causality tend to come from the publishing
community. (Let us hope that the data from mandated self-archiving will
soon settle the matter objectively.)

> "[since] any Open Access advantage appears to be partly
> [sic] dependent on self-selection, the more articles that are
> {self-}archived... you'd expect to see any Open Access advantage
> reduce." [Craig, Blackwell Publishing]

Note that Craig carefully says "partly" -- and that we agree that
self-selection is one of the many potential contributors to the OA
advantage.

We also agree, of course, that once 100% OA is reached, the OA citation
advantage -- in the form of an advantage of OA over concurrent non-OA
articles -- will be reduced: indeed it will vanish! With all articles
OA, there can no longer be either a Competitive Advantage [CA] or a
Self-Selection Advantage (Quality Bias, QB) of OA over (non-existent)
non-OA.

But the Quality Advantage [QA] will remain. (Higher quality articles
will be used and cited more than they would have been if they had not
been OA: this is not a competitive advantage but an absolute one.) And
the Early Advantage [EA] as well as the Usage (Download) Advantage [UA]
will remain too (as already shown by Kurtz's findings in Astronomy).

> "Authors self-archiving in the expectant belief that each and
> every paper they archive will receive an Open Access advantage of
> several hundred percent are going to be sorely disappointed."
> [Craig, Blackwell Publishing]

This too is correct, but who on earth thought that OA would guarantee
that all work would be used, whether or not it was any good? OA levels the
playing field so merit can rise to the top, unconstrained by accessibility
or affordability handicaps. But bad remains bad, and let's hope that
researchers will continue to avoid trying to build on weak or invalid
findings, whether or not they are OA.

The OA advantage is an *average* effect, not an automatic bonus for
each and every OA article; moreover, the OA advantage is highly
correlated with quality: The higher the quality, the higher the
advantage. It is this effect that is open to the a-causal interpretation
that the Quality Advantage [QA] is merely a Quality Bias [QB]
(Self-Selection). But, equally (and, in my view, far more plausibly) it
is open to the causal interpretation that OA causes wider usage and
citation precisely because it removes all accessibility/affordability
constraints that are currently limiting uptake and usage. That does not
mean *everything* will be used more, regardless of quality
("usefulness"): But it will allow users (who are quite capable of
exercising self-selection too!) to access and use the better work,
selectively.

In addition, since the distribution of citations is not gaussian --
a small percentage of articles receives most of the citations and
more than half of articles receive no citations at all -- it is almost
axiomatic that the OA advantage will be strongest in the high-quality
range http://www.crsc.uqam.ca/lab/chawki/classement_citations.htm

> "Finally, it is worth noting that all researchers in the field
> are agreed that if the vast majority of scholarly publications
> become available in OA form, no citation advantage to OA will be
> measurable."

It is a tautology that with 100% OA, the OA/non-OA ratio is undefined!
But EA will still be directly measurable, and it will be possible to infer
UA and QA indirectly (UA by comparing downloads for articles of the same
age, before and after OA for the same articles, and QA by doing the same
with citations; the Kurtz study used such methods in Astronomy). But by
that time (100% OA), not many people will still have any interest in
the a-causal hypothesis.

> "Thus, what OA advantage there is will prove to be temporary if
> OA does become the standard mode of publication."

This, however, is simply incorrect. At 100% OA, the Competitive Advantage
(CA) will be gone; the Self-Selection Advantage (Quality Bias, QB)
will be gone; the method of comparing citation counts for OA and non-OA
articles within the same journal and year will be gone. So much is true
by definition.

But (as Kurtz has shown in Astronomy), the Early Advantage and the Usage
Advantage will still be there. And the Quality Advantage will still be
there too; and that was what this was all about: Not just a horse-race
for who can make his articles OA first, so as to reap the competitive
advantage before 100% OA is reached (though that's not a bad idea!); not
a guarantee that, no matter how bad your work, you can increase your
citations by making it OA; but a guarantor that with access-barriers
removed, quality will have the best chance to have its full potential
impact, to the benefit of research productivity and progress itself, as
well as the authors, institutions and funders of the high quality work.

(There is a bit of a [lurid] analogy here with saying that if only we
can get everyone to smoke, it will be clear that smoking has no
differential effects on human health! Perhaps the converse is a better
way to look at it: if only we could get everyone to stop smoking,
smoking will no longer have a differential effect on human health!)

(PS: OA is not a "mode of publication": OA *publication* is a mode
of publication. OA itself is a mode of access-provision, which can be
done in two ways, via OA publication or via OA self-archiving of non-OA
publications.)

> "Self archived articles.

> "It is this area that has been most studied, with numerous key
> publications. Most of these are focussed on the citation advantage
> of self-archived articles rather than of OA journals. Craig, in an
> as yet unpublished review, provides an excellent overview of the
> evidence collected to date. Lawrence (Source 4.13) is significant
> because it was the first major paper that identified a citation
> advantage for OA self-archived articles, and it has been widely
> cited ever since. However, it was based on a too small-scale a
> study to support general conclusions. Harnad et al. (Source 4.9)
> provides a useful summary of the state of play of OA advantage
> studies, while Hajjem et al. (Source 4.8 ) is fairly typical of
> the many articles produced by Harnad claiming that self-archiving
> leads to higher citation counts."

Let us be clear: The many OA vs. non-OA studies, ours and everyone else's,
across more than a dozen different disciplines, many of them based
on large-scale samples, all show *the very same consistent pattern of
positive correlation between OA and citation counts*. Those are *data*,
and they are not under dispute. The only "claim" under dispute is that
that consistent correlation is causal...

> "Antelman (Source 4.1) is arguably the most carefully constructed
> study of the question. Articles in four disciplines were evaluated,
> and in each case it was found that open access articles had greater
> citation counts than non-open access articles."
> http://eprints.rclis.org/archive/00002309/

One wonders why this particular small-scale study (of about 2000 articles
in 4 fields) was singled out, but in any event, it shows *exactly* the
same pattern as all the other studies (some of them based on hundreds
of thousands of articles instead of just a few thousand, in three times
as many fields).

> "Eysenbach challenges the notion that OA "green" articles (i.e.,
> those in repositories) are more effective than OA "gold" (i.e.,
> those published in OA journals, such as those produced by Public
> Library of Science) in obtaining high citation counts. It is this
> part of his paper that produced a furious response from Harnad,
> much of it focused on particular details."
http://biology.plosjournals.org/perlserv?request=get-document&doi=10.1371/journal.pbio.0040157

The issue was not about OA green (self-archived) articles producing
higher citation counts than OA gold (OA-journal) articles! No one had claimed
one form of OA was more effective than the other in generating the OA
Advantage before the Eysenbach study: It was Eysenbach who claimed to
have shown gold was more effective than green -- indeed that green was
only marginally effective at all!

And I think anyone reading the exchanges will see that all the fury is
on the Eysenbach side. All I do is point out (rather patiently) where
Eysenbach is overstating or misstating his case:

    Harnad, S. (2006) PLoS, Pipe-Dreams and Peccadillos *PLoS Biology*
    eletters (May 16, 2006)
http://biology.plosjournals.org/perlserv/?request=read-response&doi=10.1371/journal.pbio.0040176#top
    http://openaccess.eprints.org/index.php?/archives/87-guid.html
    http://openaccess.eprints.org/index.php?/archives/88-guid.html
    http://openaccess.eprints.org/index.php?/archives/89-guid.html
    http://openaccess.eprints.org/index.php?/archives/90-guid.html

Eysenbach's study does find the OA advantage, as many others before it
did. It certainly doesn't show that the gold OA advantage is bigger than
the green OA advantage, in general. It simply shows that for the
1500-article sample in the one journal tested, Proceedings of the
National Academy of Sciences (PNAS http://www.pnas.org/), a very high
impact journal, both paid OA (gold) and green OA (free) increased citation
counts over non-OA, but gold increased them more than green. That result
is undisputed. Its extrapolation to other journals is what is in dispute:

The likely explanation of the PNAS result is very simple: PNAS is not a
randomly chosen, representative journal: it is a very high-impact, very
high visibility, interdisciplinary journal, one of very few like it
(along with *Nature* and *Science*). Articles that pay for OA
are immediately accessible at PNAS's own high-visibility website -- a
website that probably has higher visibility than any single
institution's IR today.
http://archives.eprints.org/
So PNAS articles made freely accessible at PNAS's website get a bigger
OA advantage than PNAS articles made freely accessible by being
self-archived in the author's own IR.

The reason it definitely does not follow from this that the gold OA
advantage is bigger than the green OA advantage is very simple: Most
journals are not PNAS, and do
not have the visibility or average impact of PNAS articles! Hence
Eysenbach's valid finding for one very high-impact journal simply does
not generalize to all, most, or even many journals. Hence it is not a
gold/green effect at all, but merely a very high-end special case.

Apart from the spurious gold/green advantage, Eysenbach did confirm, yet
again, (1) the OA advantage itself, and confirmed it (2) within a very
short time range. These are both very welcome results (but they did not
warrant being touted, as they were, by both the author and the
accompanying PLoS editorial,
http://www.jmir.org/2006/2/e8/
http://biology.plosjournals.org/perlserv/?request=get-document&doi=10%2E1371%2Fjournal%2Epbio%2E0040176
as either the first "solid evidence" of the OA advantage -- they
certainly were not that -- or a demonstration that gold OA generates more
citations than green OA: the very same method has to be tried on middle
and low-ranking journals too, before drawing that conclusion!). (Nor are
the PLoS/PNAS results any more exempt from the methodological possibility
of self-selection bias [QB] than any of the many prior demonstrations
of the OA advantage, as authors self-choose to pay PNAS for gold OA as
surely as they self-choose to self-archive for green OA!)

The fury on Eysenbach's part came from my pointing out that his and
PLoS's claim to primacy for demonstrating the OA advantage (and their
claim of having demonstrated a general gold-over-green advantage) was
unfounded (and might have been due to both PLoS's and Eysenbach's zeal
to promote publication in gold journals: Eysenbach is the editor of one
too, but not a high-end one like PNAS or PLoS): Eysenbach's was just the
latest in a long (and welcome) series of confirmations of the OA
advantage (beginning with Lawrence 2001), the prior ones having been
based on far larger samples of articles, journals and fields (and there
was no demonstration at all of a general gold over green advantage: just
the one non-representative, hence non-generalisable special case of
PNAS).

> "Both authors believe that OA produces a citation advantage,
> but Eysenbach has presented evidence that casts doubt on Harnad's
> notion that the "green" route is the preferred route to getting
> that increased impact."

Green may not be the preferred route to OA for editors of gold journals,
but it is certainly the preferred route for the vast majority of authors,
who either have no suitable gold journal to publish in, or lack the funds
(or the desire) to pay the journal to do what they can do for free for
themselves. The only case in which paid gold OA may bring even more
citations than free green OA (even though both increase citations)
is in the very highest quality journals, such as PNAS, today -- but
that high-end reasoning certainly does not generalise to most journals,
by definition. (And it will vanish completely when OA self-archiving is
mandated, and the harvested IR contents become the locus classicus to
access the literature for those whose institutions are not subscribed
to the journal in which a particular article appeared -- whether or not
it is a high-end journal.)

(There is also a conflation of the (less interesting) question of (1)
whether green or gold generates *a greater OA citation advantage*
[answer: for high-end journals like PNAS, gold does, but in general
there is no difference] with the (far more important) question of (2)
whether green or gold can generate *more OA* [answer: green can
generate far more OA, far more quickly and easily, not just because it
does not cost the author/institution anything, but because it can be
mandated without needing either to find the extra funds to pay for it or
to constrain the author's choice of which journal to publish in].)

> "However, despite the intuitive attractiveness of the hypothesis
> that OA will lead to increased citations because of easier
> availability, the one systematic study of the reasons for the
> increased citations - by Kurtz (Source 4.12) - showed that in the
> field of astronomy at least, the primary reason was not that the
> materials were free, or that they appeared more rapidly, but that
> authors put their best work into OA format, and this was the reason
> for increased citation counts."

Astronomy is an interesting but anomalous field: It differs from most
other fields in that:

(1) Astronomy consists of a small, closed circle of journals.

(2) Virtually all research-active astronomers (so I am told by the
author) have institutional access to all those journals.

(3) For a number of years now, that full institutional access has been
online access.

(4) So astronomy is effectively a 100% OA field.

(5) Hence the only room left for a directly measurable OA advantage in
astronomy is (5a) to self-archive the paper earlier (at the preprint
stage) [EA] or (5b) to self-archive it in Arxiv (which has evolved into a
common central port of call, so it generates more downloads and citations
-- mostly at the preprint stage, in astronomy).

(6) What Kurtz found was that under these conditions, higher quality
(higher citation-count) papers were more likely to be self-archived.

(7) This might be a quality self-selection effect (QB) (or it might
not), but it is clearly occurring under very special conditions, in a
100% OA field.

(8) Kurtz did make another, surprising finding, which has bearing on the
question of how much of a citation advantage remains once a field has
reached 100% OA.

(9) By counting citations for comparable articles before and after the
transition to 100% OA, Kurtz found that the citations per article had
actually gone *down* (slightly) rather than up, with 100% OA.

(10) But a little reflection suggests a likely explanation: This slight
drop is probably a shift in balance with a level playing field:

(11) With 100% OA (i.e., equal access to everything), authors don't cite
more articles, they cite more *selectively*, able now to focus on
the best, most relevant work, and not just on the work their
institutions can afford to access.

(12) Higher quality articles get more citations, but lower quality
articles, of which there are far more (some perhaps previously cited by
default, because of accessibility constraints), are cited less.

(13) On balance, total citations are slightly down, on this level
playing field, in this special, small, closed-circle field (astronomy),
once it reaches 100% OA.

(14) It remains to be seen whether total and average citations go up or
down when other fields reach 100% OA.

(15) What Kurtz does report even in astronomy is that although total
citations are slightly down, downloads are doubled.

(16) Downloads are correlated with later citations, but perhaps at 100%
OA this is either no longer true, or true only for higher quality
articles.
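
The arithmetic behind points (10)-(13) can be illustrated with a toy
calculation. The sketch below is only an illustration under stated
assumptions: the two quality tiers, the article counts and the
per-article citation figures are all hypothetical and are not taken from
Kurtz's data. It merely shows that a few higher quality articles can gain
citations while the far more numerous lower quality articles each lose a
little, leaving the total slightly down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A hypothetical quality tier of articles (illustrative numbers only)."""
    name: str
    articles: int       # number of articles in the tier
    cites_before: int   # citations per article before 100% OA
    cites_after: int    # citations per article after 100% OA


def total_citations(tiers, after):
    """Sum citations over all tiers, before or after the move to 100% OA."""
    return sum(
        t.articles * (t.cites_after if after else t.cites_before)
        for t in tiers
    )


# Assumed figures, NOT from Kurtz: few strong articles, many weaker ones.
TIERS = [
    Tier("higher quality", articles=10, cites_before=20, cites_after=26),
    Tier("lower quality", articles=90, cites_before=5, cites_after=4),
]

before = total_citations(TIERS, after=False)
after = total_citations(TIERS, after=True)
change_pct = 100.0 * (after - before) / before

print(f"total citations before: {before}")
print(f"total citations after:  {after}")
print(f"change: {change_pct:.1f}%")
```

With these assumed figures the higher quality tier gains 60 citations,
the lower quality tier loses 90, and the total falls by a few percent.
Different assumptions would of course give a different balance, which is
why point (14) remains an empirical question.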

> "Similarly, more carefully conceived work on the impact of
> both OA journals and self-archiving on the quality of research
> communications, especially on the peer review system, will be
> required."

OA journals are peer-reviewed journals: What sort of impact are they
feared to have on peer review?

And why on earth would the self-archiving of peer-reviewed, published
postprints have any impact on the peer review system? The peers review
for free. (Could this be just a veiled repetition of the question about
the impact of self-archiving on journal revenues, yet again?)

> "Recently, the results of a study undertaken by Ware for ALPSP,
> which were published in March 2006 (Source 1.16, in Area 1), have
> provided at least some initial data on the question of the possible
> linkage between the availability of self-archived articles in an OA
> repository and journal subscription cancellations by libraries...:
> availability of articles in repositories was cited as either a "very
> important" or an "important" possible factor in journal cancellation
> by 54 per cent of respondents, even though ranking fourth after (i)
> decline of faculty need, (ii) reduced usage, and (iii) price. When
> respondents were invited to think forward five years, availability
> in a repository was still fourth-ranking factor, but the relevant
> percentage had risen to 81. Whilst this is not evidence of actual
> or even intended cancellation as a consequence of the growth of
> OA self-archiving repositories, it strongly suggests that such
> repositories are an important new factor in the decision process,
> and growing in significance."

Summary: No evidence of cancellations, but speculations by librarians to
the effect that their currently fourth-ranking factor in cancellations
might possibly become more important in the next five years...

Sounds like sound grounds for fighting self-archiving mandates and
trying to deny research the benefit of maximized impact for yet another
five years -- if one's primary concern is the possible impact of
mandated self-archiving on publishers' revenue streams. But if one's
primary concern is with the probable impact of mandated self-archiving
on research impact, this sort of far-fetched reasoning has surely earned
the right to be ignored by the research community as the self-serving
interference in research policy that it surely is.

Hyperlinked version of the above text:
    http://openaccess.eprints.org/index.php?/archives/142-guid.html

Stevan Harnad
American Scientist Open Access Forum
http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
Received on Sat Oct 14 2006 - 14:27:14 BST
