It is frankly amazing that Stevan can make such a qualified response
about sample size, statistical power, and what we should measure without
having read the study. Should I be surprised that his response is
merely an attempt to take the stage for self-promotion?
BTW, we do measure the effect of self-archiving (whoever put the article
in a repository) on article downloads and citations.
--Phil Davis
Stevan Harnad wrote:
> On Tue, 1 Apr 2008, Philip Davis wrote:
>
> > We've been conducting a randomized controlled trial of open access
> > publishing with 7 publishers in the multidisciplinary sciences, biology,
> > medicine, social sciences, and humanities since January 2007.
> >
> > The type of methodology we're using (randomized controlled trial) is key
> > here since previous observational studies simply assume that
> > author-sponsored OA articles are qualitatively similar to
> > subscription-based articles.
>
> Most prior studies simply compared articles within the same journals and
> years that were and were not made OA by being self-archived by their
> authors.
>
> The ideal study would be one that randomly *imposed* self-archiving on
> articles from within the same journals and years, and compared it with
> unimposed self-archiving for the same journals and years. This
> forthcoming study seems to do only half of this.
>
> A potential problem with assessing the effects of self-archiving
> on citations is, of course, the "self": Authors self-select to
> self-archive (some authors -- c. 15% -- do it, most don't), and authors
> can also self-select which of their papers they self-archive. Hence this
> leaves open the possibility that self-archived papers (and authors)
> are self-selected to be the better ones. And then the question is:
> What proportion of the enhanced citations of self-archived papers occurs
> because of OA and what percentage is because of self-selection?
>
> A study that imposes the OA self-archiving randomly could help answer
> this question.
>
> But a potential problem of this forthcoming study is time-scale and
> sample-size.
>
> The published findings on the higher citations for OA self-archived
> articles (e.g. Hajjem et al 2005) are based on hundreds of thousands of
> articles, in thousands of journals, across a number of fields, across
> a number of years. The effects are always the weakest in the first year
> or two after publication (depending on field), before the citations have
> had a chance to grow.
>
> During that early period, it is downloads rather than citations
> that reflect the OA advantage -- and downloads have been shown to be
> correlated with, and predictive of, later citations (Brody et al 2006):
>
> > Preliminary results from 11 journals published by the American
> > Physiological Society indicate an increase in article downloads,
> > although many of these downloads are attributable to indexing robots.
> > The articles are currently between 11 and 14 months old and we see no
> > citation advantage. In fact, the randomly selected OA articles received
> > slightly fewer citations, although this result is non-significant.
> >
> > Our paper is currently in review and should be made public shortly.
>
> This profile (i.e., no difference) is perfectly compatible with the
> conclusion that the sample was too small and the time-span was too
> short to have picked up any effects at all. It is comparing apples and
> oranges unless there is a control group, in the same journal sample and
> year-span, consisting of self-selected, self-archived articles that *do*
> show the citation increase whose causes are here being tested.
>
> If an equal-sized sample of self-selected, self-archived articles from
> the same 11 journals, over the same 11-14 months, *did* show the citation
> increase, whereas the control sample with the self-archiving imposed did
> not, then we could make the inference that it is the self-selection that
> causes the citation increase.
>
> But with a small sample and a small time-span, and no difference, the
> most likely outcome is that neither group would yet show any citation
> advantage.
>
> (Some comparisons might possibly be made with the Eysenbach (2006)
> study, which was also based on a small sample -- a single very
> high-profile journal (PNAS) and about 1500 articles -- and a small
> time span. The OA/non-OA citation difference was found surprisingly
> early. There were two kinds of "self-archiving": most were done by
> PNAS on the (paying) authors' behalf, on the PNAS website; the other
> kind was done by (nonpaying) authors, on their own websites (or IRs). The
> lion's share of the early OA citation advantage was for the articles
> made OA on the PNAS site. But of course both kinds of OA self-archiving
> here were self-selected, rather than imposed. And the fact that the OA
> advantage was much bigger for the articles "self-archived" on the PNAS
> site suggests that the big early effect may have had something to do
> with being freely accessible at the much-consulted websites of one of
> the highest-citation journals of all.)
>
> > We conclude that the 'citation advantage' so widely promoted in the
> > literature is an artifact of other explanatory variables.
>
> These are rather big conclusions to draw from what seems to be a rather
> small study (that does not seem to control for the most important
> explanatory variable of all, which is unimposed self-selection, in the
> same sample and time-interval)!
>
> We are currently conducting a somewhat bigger study, comparing the size
> of the citation difference between self-archived and non-self-archived
> articles within the same journals and years for the four earliest of the
> institutions that mandate self-archiving. A mandate is not a guarantor
> that all articles will be self-archived; and mandates have not been
> around for that long either; but the prediction would be that if the
> self-archiving citation increase were all or mostly due to
> self-selection, then mandates should either reduce substantially, or
> eliminate the OA/non-OA difference, compared to the unmandated
> OA/non-OA difference.
>
> Our study compares the size of the self-archived/non-self-archived
> difference separately for mandated and unmandated self-archiving.
>
> Stay tuned.
>
> Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as
> Predictors of Later Citation Impact. Journal of the American Association
> for Information Science and Technology (JASIST) 57(8) pp. 1060-1072.
> http://eprints.ecs.soton.ac.uk/10713/
>
> Eysenbach, G. (2006) Citation Advantage of Open Access Articles. PLoS
> Biology 4(5): e157 DOI: 10.1371/journal.pbio.0040157
> http://dx.doi.org/10.1371/journal.pbio.0040157
>
> Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary
> Comparison of the Growth of Open Access and How it Increases Research
> Citation Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.
> http://eprints.ecs.soton.ac.uk/11688/
>
> Stevan Harnad
>
> > Philip Davis
> > PhD student
> > Cornell University, Dept. of Communication
Received on Wed Apr 02 2008 - 01:17:17 BST