On Wed, 23 May 2001, Jim Till wrote:
> I didn't intend to [imply] that an eprint archive is analogous
> to a journal. There are a number of models for the role of eprint
> archives. One is to regard such archives as analogous to libraries;
> another is to regard them as analogous to databases.
Libraries are themselves analogous to databases. And I don't think the
question is settled by hypothetical models.
Journals exist today, and they perform a function: They implement peer
review. This yields a sign-posted, peer-reviewed literature (a quality
pyramid).
To compare or contrast eprint archives with journals is to assume that
the current contents of the eprint archives (pre-refereeing preprints
and post-refereeing postprints) would be of the quality they are even
if journals were not implementing refereeing. That is a 100% untested
assumption (and, in my opinion, one that is likely to prove very wrong,
if anyone ever takes the risky step of terminating peer review and
waiting to see what becomes of the quality of the ensuing literature).
Harnad, S. (1998/2000) The invisible hand of peer review. Nature
[online] (5 Nov. 1998)
http://helix.nature.com/webmatters/invisible/invisible.html
Longer version in Exploit Interactive 5 (2000):
http://www.exploit-lib.org/issue5/peer-review/
http://www.ecs.soton.ac.uk/~harnad/nature2.html
But until the experiment in question is actually performed, one can
say, without any hypothesizing, that to compare the quality of the
papers archived in eprint archives with the quality of the same
papers appearing in the journals that are currently controlling their
quality is a rather circular exercise.
> For example, in a model proposed by Paul Ginsparg, "The three layers are
> the data, information, and knowledge networks--where information is taken
> to mean data plus metadata (i.e. descriptive data), and knowledge
> signifies information plus synthesis (i.e. additional synthesizing
> information)", see: http://www.biomedcentral.com/info/ginsparg-ed.asp
>
> In this model, the arXiv eprint archive is located at the "data" level.
For most of the research world the token has not yet dropped: that it
is possible to immediately free the research literature online by
self-archiving it in interoperable eprint archives. For many
physicists, the token dropped a decade ago. But in fast-forwarding
their own discipline to what will prove to be the optimal and
inevitable outcome for all disciplines, physicists did something practical:
They self-archived their preprints and postprints. They definitely did
not perform the experiment mentioned above: They continue to submit all
their papers to journals for refereeing, EXACTLY as they always did. So
the theory-neutral description of what they have done is: they freed
their research literature online by self-archiving it. The rest is
untested hypothesis and untestable interpretation.
Look at what the physicists have DONE, and emulate it; don't pay too
much attention to their THEORIES about what they have done.
> Rob Kling and Geoffrey McKim have suggested that: "Different scientific
> fields have developed and use distinctly different communicative forums,
> both in the paper and electronic arenas, and these forums play different
> communicative roles within the field", see:
> http://arxiv.org/abs/cs/9909008
It's early days. The only two "forums" worth talking about at the
moment are the traditional one (research papers appearing only in
on-paper and on-line journals, surrounded by financial firewalls)
versus the new one (papers, both pre- and post-refereeing, also
appearing in free on-line eprint archives).
The degree of usefulness of early dissemination of the pre-refereeing
draft may well prove to vary from discipline to discipline (though most
disciplines are still in the position of the man asked whether he can
play saxophone: "I don't know. I've never tried!"). But we can say with
considerable confidence that the usefulness of freeing the
post-refereeing draft is discipline-universal: NO discipline benefits
from needless access- and impact-blockage for its give-away findings:
http://www.ecs.soton.ac.uk/~harnad/Tp/science.htm
> That different models may be preferred by those in different fields
> probably stems in large part from differences in historical experience
> (see, for example, my article on "Predecessors of preprint servers" in
> Learned Publishing 2001; 14(1): 7-13; a version in HTML is available via:
> http://arXiv.org/html/physics/0102004).
It's too early to draw any deep historical conclusions! It can truly be
said that most researchers still don't know what they are doing, or
why, in any of the respects relevant here.
> About the biomedical field: the editor of Perspectives in Electronic
> Publishing (Steve Hitchcock), has commented that: "Biomedical researchers
> have been among the most eager to exploit the features of electronic
> publishing allied to freely available data services, yet at the same time
> acting to protect the formal structure and discipline imposed by
> journals", (see:
> http://aims.ecs.soton.ac.uk/pep.nsf/0dbef9e185359a288025673f006fadfd/fa5e35e7fed5053480256716003abf31?OpenDocument)
That's certainly true. But it merely illustrates my point (about the
blind leading the blind in all these "disciplinary divergences")...
Behaviorally, the physicists have simply been more sensible and quicker
off the mark. Let us not elevate the wheel-spinning of the other
disciplines to the level of a reasoned alternative!
> [sh]> In the new era of distributed, interoperable eprint archives, it
> [sh]> shows only what happens to appear in one arbitrary fragment of the
> [sh]> global virtual library into which the eprint archives are all
> [sh]> harvested.
>
> Agreed. But, the individual eprint archives must be designed to permit
> harvesting of their contents in this way. In my previous message, I
> referred to Greg Kuperberg's suggestion that a main criterion in
> evaluating an eprint archive should be "its suitability as part of the
> envisioned universal archive". Whether or not one prefers to regard this
> "universal archive" as a "global virtual library", this criterion still
> seems to me to be an appropriate one.
I could not follow any of this. The research literature, indeed the
refereed journal hierarchy, always did grade all the way down to the
level of a vanity press, which the user could always elect not to
consult. With journal "brand names" performing their usual sign-posting
function, to guide navigation, it is a relatively trivial issue whether
or not to "admit" something into the "universal archive."
> [jt]> Another criterion (it seems to me) should be its suitability for
> [jt]> obtaining citation data. An example, based on the arXiv archive, is
> [jt]> provided by the Cite-Base search service
> [jt]> (http://cite-base.ecs.soton.ac.uk/cgi-bin/search)
>
> [sh]> Correct. But cite-base is not measuring "archive-impact" but paper-
> [sh]> or author-impact. And it is measured across multiple distributed
> [sh]> archives.
>
> Agreed. But, again, the eprint archive must be designed to permit such
> measurements across multiple distributed archives. This second criterion
> also seems to me still to be an appropriate one.
But who cares about the impact of an eprint archive (except perhaps
the assessors of that institution's outgoing research)?
http://www.ecs.soton.ac.uk/~harnad/Tp/thes1.html
What we will be searching the global research archive for is papers
(and/or authors), for the most part, just as we always did, and the new
impact measures will give us far richer ways to navigate and evaluate
it.
> [sh]> What's needed now is more archives, and the filling of them. The
> [sh]> quality measures will take care of themselves. The more papers are
> [sh]> up there, digitally archived, the more new measures of productivity
> [sh]> and impact they will inspire.
>
> Agreed. But, will these additional eprint archives always be designed
> such that the above two criteria are met?
What two criteria? Certainly the archives should be interoperable
(that's what www.openarchives.org is about, and what www.eprints.org
software is for), and certainly the citation-linking and impact-ranking
should be across all the distributed corpus, just as the harvesting is.
But apart from that, the only other criteria (apart from topic) are
"unrefereed/refereed" and, for the latter, the journal brand-name (just
as before).
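(Interoperability here means the Open Archives metadata-harvesting
protocol. As a minimal sketch, with a hypothetical, simplified
ListRecords response as sample input, harvesting a distributed
archive's records amounts to little more than this:

```python
# Minimal sketch: extracting (identifier, title) pairs from an
# OAI-style ListRecords response. The sample XML is a hypothetical,
# simplified example of what an interoperable eprint archive returns.
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:arXiv.org:physics/0102004</identifier></header>
      <metadata>
        <dc xmlns="http://purl.org/dc/elements/1.1/">
          <title>Predecessors of preprint servers</title>
        </dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_list_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.find(OAI + "header/" + OAI + "identifier").text
        title_el = rec.find(".//" + DC + "title")
        records.append((ident, title_el.text if title_el is not None else None))
    return records
```

Any service harvesting many such archives sees one merged corpus, which
is exactly why admission to any one archive matters so little.)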
> Are there additional criteria that should also be met - especially ones
> that will help to ensure that "the quality measures will take care of
> themselves"?
I'm not sure what you mean. All I meant was that the digital corpus
will spawn a lot of rich new scientometric measures to complement and
supplement the tired, old, classical citation-impact factor.
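(As a toy illustration, with entirely invented citation data, this is
the simplest kind of paper-level impact measure a citation-linking
service computes across the harvested corpus:

```python
# Toy sketch (data invented): ranking papers by incoming citations
# across a merged corpus -- the simplest paper-level impact measure a
# citation-linking service can compute over harvested eprint archives.
from collections import Counter

# citing paper -> list of cited paper identifiers (hypothetical data)
citations = {
    "paperA": ["paperB", "paperC"],
    "paperB": ["paperC"],
    "paperD": ["paperC", "paperB"],
}

def rank_by_citations(citation_graph):
    """Return papers sorted by incoming-citation count, highest first."""
    counts = Counter(cited
                     for cited_list in citation_graph.values()
                     for cited in cited_list)
    return counts.most_common()

# rank_by_citations(citations) -> [('paperC', 3), ('paperB', 2)]
```

Richer measures -- co-citation, download counts, time-to-first-citation
-- are all variants computed over the same merged graph.)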
--------------------------------------------------------------------
Stevan Harnad
Professor of Cognitive Science
Department of Electronics and Computer Science
University of Southampton
Highfield, Southampton SO17 1BJ, UNITED KINGDOM
harnad_at_cogsci.soton.ac.uk / harnad_at_princeton.edu
phone: +44 23-80 592-582    fax: +44 23-80 592-865
http://www.ecs.soton.ac.uk/~harnad/
http://www.princeton.edu/~harnad/
Received on Wed Jan 03 2001 - 19:17:43 GMT