Comments on Barry Mahon's ICSTI Forum article

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Thu, 9 Jan 2003 21:52:48 +0000

Using the figures provided by David Goodman in this Forum recently:
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2534.html

Here is what it would cost, and how much it would save, if universities
paid only the annual peer-review costs for their own outgoing
(self-archived) research, instead of the annual toll-access fees to buy
in other universities' outgoing research. (Figures from further
universities are invited!)

For 2001 (the latest year available):

              annual    current annual     $500 per article   annual
UNIVERSITY    article   incoming serials   peer review        percentage
              output    toll-budget        only               savings
----------    -------   ----------------   ----------------   ----------
Cornell       4848      $5.6 million       $2.43 million      57%
Dartmouth     1492      $3.2 million       $0.75 million      77%
Princeton     3132      $4.7 million       $1.57 million      66%
Yale          4463      $6.4 million       $2.23 million      64%

Comments by Stevan Harnad on:

              "What is all this stuff?"
              Barry Mahon, Executive Director, ICSTI,
              The International Council for Scientific
              and Technical Information
              ICSTI Forum No. 41 November 2002
              http://www.icsti.org/forum/41/index.html

> Barry Mahon:
> open access or Open Archive
>
> An Open Archive (OA) consists of a collection of (scientific for
> the most part, at least in the original forms of OAs) publications
> which are available without any restriction on their availability,
> in the sense of a subscription or payment to view them, coupled with
> an inbuilt permission to quote from and even in some cases to modify
> the content.

An Open Archive is an OAI-compliant Archive. That means it uses the OAI
-- Open Archives Initiative http://www.openarchives.org -- metadata
harvesting protocol for tagging its content in such a way as to make
all OAI-compliant Archives *interoperable*: seamlessly navigable and
harvestable, as if they all formed one integrated global virtual-archive.

Metadata making it all interoperable include descriptors such as
authorname, journalname, articletitle, date, etc.

There is in general no intention on the part of OAI to allow
modification of the *content* of the full-text articles themselves, and
the metadata are meant to be standardised so as to be interoperable. But
*form* can certainly be manipulated by the various harvesters of the
contents of OAI-compliant Archives.
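
To make the interoperability point concrete, here is a minimal sketch of
what a harvester does: it sends the same ListRecords request to any
OAI-compliant archive and reads the Dublin Core descriptors out of the
reply. The verb, the oai_dc metadata prefix and the XML namespaces are
those of OAI-PMH 2.0; the archive addresses, the function names and the
sample record are hypothetical, and the reply is embedded in the code so
that the sketch runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical reply from an archive (illustrative data, not a real record).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:42</identifier>
        <datestamp>2003-01-09</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:creator>Author, A.</dc:creator>
          <dc:title>A hypothetical self-archived paper</dc:title>
          <dc:date>2003</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the harvesting request; it has the same form for every archive."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract a few Dublin Core descriptors from a ListRecords reply."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "creator": rec.findtext(".//dc:creator", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "date": rec.findtext(".//dc:date", namespaces=NS),
        })
    return records


# Two hypothetical archives receive exactly the same kind of request.
for base in ("http://archive-a.example.org/oai", "http://archive-b.example.org/oai"):
    print(list_records_url(base))

print(parse_records(SAMPLE_RESPONSE))
```

Because every compliant archive answers the same request in the same
metadata format, a harvester can merge the records from many archives
into one searchable collection without knowing anything else about them.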

> It seems to have started in an attempt to speed up the publishing
> process, at least from the point of view of getting the results
> into the discussion arena as quickly as possible. The electronic
> only aspect of most OAs is incidental in the sense that speed of
> dissemination was the primary function, ipso facto, electronic
> dissemination was the preferred path. Consequently they started in
> fields of research which were already "connected" electronically. It
> is worth noting in passing that high energy physics was one of the
> first fields of scientific endeavour to embrace OA, as it had been
> in the origins of the Net, driven as it was then by the need to
> get access to advanced computing facilities, no matter where they
> were located.

It was not *publication* that eprint self-archiving by physicists
speeded up but *access*. All self-archived papers by physicists
continued -- and continue to the present day -- to be submitted to the
peer-reviewed journals, and to go through the often-slow process of peer
review, revision, re-refereeing, acceptance, publication, and eventual
appearance as print-on-paper. This slow process was *supplemented* by
a new form of access made possible by the Internet: The pre-peer-review
preprint was publicly self-archived online, often so were the successive
revisions during the refereeing, and eventually also the peer-reviewed
postprint (plus any subsequent post-publication revisions and corrections
of that).

The reason this practice began with physicists was that they already
had a "culture" of disseminating pre-publication "preprints" among
themselves even in paper days -- again, to hasten *access* to research
results, both before and after peer-review. It was only natural to extend
this existing practice to the new medium that was so much better suited
for it.

But there was never anything about the practice that made it uniquely
or especially suitable for physics. It was simply a way of accelerating
and maximizing research access and hence research impact (and thereby
research productivity and progress). So now that self-archiving can
be done simply, cheaply, and effortlessly in their own institutional
archives by researchers in all disciplines, it is time to get the message
of its benefits across, so researchers in all disciplines can adopt
the practice:

    Lawrence, S. (2001a) Online or Invisible? Nature 411 (6837): 521.
    http://www.neci.nec.com/~lawrence/papers/online-nature01/

    Lawrence, S. (2001b) Free online availability substantially increases a
    paper's impact. Nature Web Debates.
    http://www.nature.com/nature/debates/e-access/Articles/lawrence.html

> OA even has some confusion in the name - it is expanded to OAI
> - meaning Open Archive Initiative but OAI is also used for open
> access initiative, by those who are advocating that researchers
> should publish their papers on e-print servers - often local to
> their research team and available freely. So what is the difference?

No, OAI is not being used to refer to "Open Access Initiative." "BOAI"
-- Budapest Open Access Initiative http://www.soros.org/openaccess/ --
is the term that was consciously chosen to designate the Open Access
movement, while preserving the technical link with OAI, the metadata
"glue" that makes interoperability possible.

(If there was any confusion at all, it was perhaps in that the OAI
actually began as a kind of open-access movement -- the UPS [Universal
Preprint Service] -- for which interoperability was merely a means
rather than an end. But then when the metadata-tagging standards were
first created at a Santa Fe meeting, and called the "Santa Fe
Convention," the name "Open Archives Initiative" was adopted, and the
OAI evolved into an interoperability standard for all digital content,
and not just the open-access sector. It was then that the BOAI evolved,
to occupy specifically the open-access niche. [This history is archived
at the OAI site http://www.openarchives.org/meetings/ ]).

> As far as I can understand it the difference is that OAI - in capitals
> - is the generic acronym for a set of computing based initiatives
> to make it possible to access content published on the web, not
> just scientific papers but all types of publications. One of the
> important elements of OAI development is the creation of standards
> for collecting metadata from these publications so that they can be
> identified. The other - open access initiative - in non capitals,
> is concerned with allowing scientists to publish in such a way that
> their publications become available to all and sundry with no or little
> "barrier" in the sense of costs.

It is the BOAI (Budapest Open Access Initiative) -- in capitals! -- that
constitutes the "open access initiative," and BOAI actually consists of
two complementary strategies:

BOAI-1: the self-archiving of all refereed research papers, before and
after peer-review and journal publication
http://www.nature.com/nature/debates/e-access/Articles/harnad.html

BOAI-2: the creation of new open-access journals (and the conversion of
existing toll-access journals to open-access)
http://www.nature.com/nature/debates/e-access/Articles/Eisen.htm

The two BOAI strategies -- and "open access" itself -- are described in
http://www.soros.org/openaccess/read.shtml

> The two are important to ICSTI Members.

> Open Archive

> OAI is important because it aims to be the sine qua non of web
> publishing, in the sense of substantial content published on the web.

OAI has nothing directly to do with publishing, which is something
done by journal publishers. It is a protocol for the online archiving
of the metadata describing the contents of the publication, and making
all OAI-compliant archives interoperable.

> Just as an aside, the phrase web publishing is also very confused,
> it appears to mean any content which appears on a web site. In our
> scientific world publishing has tended to have a specific meaning,
> the activity of collecting material and making it available,
> so in that sense web publishing is a natural extension, however,
> publishing as an activity tends to have had associated with it the
> concept of risk, the taking of risks as a publisher was understood
> to mean that they undertook to publish in the hope of a return on
> investment.

For the special subset of the written corpus with which the BOAI is
exclusively concerned -- peer-reviewed research journal publications --
"publication" means only one thing: certified acceptance by a
peer-reviewed journal.

    Garfield: "Acknowledged Self-Archiving is Not Prior Publication"
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2239.html

All the rest is not a matter of *publishing* but of *archiving* (and
hence access).

    "Distinguish self-publishing (vanity press) from self-archiving (of
    published, refereed research)"
    http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm#1.4

(Notice that no mention whatsoever is made of "risk" or "publisher's risk"
in any of these concepts, for the simple reason that they are completely
irrelevant to the researcher's definition of "publication." Publishers'
essential expenses, however, are not irrelevant; but in the open-access
era these amount exclusively to the peer-review implementation costs,
which are at most $500 per paper. How these essential costs are to be
recovered (if and when the self-archived open-access version of all
research papers ever reduces the market for the toll-access version so
it is no longer enough to cover the essential costs of peer review)
is a hypothetical question, for which there are hypothetical answers --
http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/399we152.htm
-- but it is certainly not part of the definition of "publishing" either.)

http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0303.html
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1437.html

> Publishing on a web site is for most cases a non-risk
> activity, the web model has very little money attached to it,
> yet, except for those web activities such as online auctions or
> pornography where return on investment is important. The bursting of
> the .com bubble indicates that very little of the supposed return on
> web investment was profitable. So, OAI should be seen as a part of
> publishing where making a return is not (yet?) an objective, hence
> the interest of the computing community in it as an area of activity.

I am afraid that the foregoing passage is such a jumble of confusion
that nothing at all follows from it. Self-archiving refereed
publications is not publication, and neither pornography nor .com
bubbles are relevant. OAI has nothing to do with publishing, and the
computing community has no more (or less) to do with self-archiving than
the physics community or the classics community.

> The OAI has been up to now an extension of research activity, spurred
> by a perceived necessity to "get organised" to rationalise the access
> to electronically available material.

OAI has been actively involved in interoperability of digital archives.
BOAI has been actively involved in hastening the advent of open access
to refereed research output in all disciplines.

> The information professionals amongst us can be forgiven for a
> wry smile here. We have seen all this before, over centuries of
> development of library functions. From the early attempts to "stop the
> world" by trying to organise material under various classification
> schemes, Dewey, LC, UDC, etc., and then the AACR -- rules for
> cataloguing, to MARC, the use of computers to organise catalogues,
> through OPACS, user accessible online access to catalogues, etc.,
> etc., we have seen many actions. This is not to write these off, far
> from it, they still exist, and will continue to exist. What seems
> to have changed is the speed of development coupled with a belief
> that the recent developments in computing and networking make the
> solutions easier.

And the medium: digital and online. So I suggest that those information
professionals who have not yet understood its significance and function
give OAI a closer look. The concept of "interoperability" did not even
exist before the Internet (which is currently reputed to be about 20
years old).

> The advent of the computer, as I said in a previous Forum article
> (Information Retrieval, a story of research and other strange
> activities), created a belief amongst the computing profession that
> the problems of organising information and its access are over. Let
> us look at this in the light of the actions of the OAI.

No. The problem of organising information is not over; it has merely been
transformed into the problem of organising digital information online,
i.e. interoperability. And please don't confuse organization with access
(i.e., toll-access vs. open-access).

> This initiative sees one of the keys to rational availability as
> common metadata. Again, the information professionals will smile,
> metadata is the new way of describing catalogue data, the essential
> elements necessary to identify uniquely a piece of writing, or audio
> or film or whatever. So we are back to AACR, etc. This time it is
> called Dublin Core...
> what is it? I may be wrong but at its essence it is
> an agreed minimum data set to enable an item to be identified.
> ICSTI Member OCLC... had a lot to do with its creation. So we
> have come a full circle so to speak, OCLC is the leading supplier
> of machine readable catalogue records so they have an interest in
> developing a core standard for computer readable descriptions.

And so do the rest of the creators and users and organizers of the growing
online digital corpus worldwide. OAI's concern is with making all of it
interoperable. BOAI's concern is with making the refereed-research subset
of it openly accessible. I do not see what point Barry is making here,
with this "we've seen all this before"...

> What doesn't seem clear to me is whether OAI and DC are working
> in harness. I assume that there are common memberships in various
> working activities, in fact we know there are, we heard about it
> at our Seminar in February on Digital Preservation, but are they
> working from the same specifications? As far as I can see OAI are
> developing computer tools to automate the creation of metadata from
> original material intended for web publishing while DC are working to
> make their "standard" - in inverted commas because it is not an ISO
> standard, the norm for metadata basics.

I leave it to Herb van de Sompel or Carl Lagoze of OAI to reply about
this.

> I may be wrong, I hope I am,
> but there doesn't seem to me to be much more than a common thread
> there, not to speak of a co-ordinated set of actions. The result
> should be coherent but there are other initiatives, for example, the
> LOM -- the Learning Object Metadata activity of the IEEE. LOM and DC
> have agreed to work together but on the horizon is the Semantic Web
> a concept of the inventor of the original web Tim Berners Lee which
> foresees a seamless interconnection between all information and
> activity on the Net which requires, you guessed it, an agreed set
> of descriptors, machine readable, so that computers will know the
> "meaning" of what they are doing. Enter another interest group,
> the W3C the World Wide Web Consortium, which is developing the
> standards for the ontologies necessary for the semantic web, the
> tools to enable descriptors to be uniquely referenced. Didn't I
> mention that concept before? ah yes, cataloguing......

Yes, interoperability means interoperability, and these various
initiatives will need to ensure that -- but what, again, is Barry's
point?

> It seems to me and again I hope I am wrong, that we have a number
> of activities taking place which impinge on the future of STI
> dissemination over which we the professionals of STI have little
> control. The OAI is understandably an important activity, concerned
> as it is with rationalising the production and access to electronic
> material, but while everyone appears to agree with the principles is
> the practice going to have the desired effect? We may be in danger
> of getting caught up in the speed of development, the advent of the
> Semantic Web concept may over-run the OAI.

I can't follow any of this. What is the point Barry is making? That
there may be competition about which interoperability protocol will
prevail? (Maybe one will win, maybe they will all be subsumed by
meta-interoperability among protocols.) That STI publishers have no say?
But surely they have at least as much say as any of the other players.

> open access

> Turning to the non capital letter open access initiative, it is much
> closer to us than OAI, because it affects the model of STI publishing
> which we have had for some time and which has served us well. There
> are two main issues with e-print and other forms of self publishing
> by scientists.

The first is that this is a non-issue, as open access and BOAI are not
about self-publishing, and never have been. See the URLs above.

> The first is the quality issue, non-refereeing, not
> having peer review, as a principle of open access. The advocates
> of self publishing argue that the availability of the material
> quicker and more widely means that, effectively, peer review is
> more rapid and may be more honest. The argument rages, there does
> not seem to be a solution in sight.

The open access movement (BOAI), to repeat, is about open access to the
peer-reviewed (i.e., published) research journal literature, not about
self-publishing.

(There do exist people who (1) oppose refereeing or (2) want it replaced
by post-hoc self-selected "peer" commentary and who (3) advocate
self-publishing. But that is not the open access movement or the BOAI,
any more than those who advocate abandoning grades, clothes, or private
property are representing the open access movement!)

http://www.princeton.edu/~harnad/nature2.html
http://www.eprints.org/self-faq/#7.Peer
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1169.html
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0479.html
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2340.html

> The debate is clouded by the
> increasing use of impact factors, based primarily on citation data,
> as measures of quality of research and even as measure of the quality
> of publications.

Which debate, and about what? The clouds come from conflating things like
(1) self-archiving and self-publishing or (2) freeing the peer-reviewed
literature from access-tolls and freeing it from peer-review.

Open access can strengthen and enrich scientometric analysis of research
impact (including citation impact), but the rationale for open access
and interoperability is not dependent on any particular scientometric
stance. Peer review is the primary measurer of research quality, as
reflected in the journal's established quality standards; citation and
usage in further research is a secondary measure, and so are subsequent
review and commentary articles. All of these benefit from open access
and interoperability. Research assessment in turn benefits from the
accessibility of a rich digital corpus for ever newer and more powerful
scientometric analyses. Where are the clouds?

> It is further clouded by the availability, quickly
> and electronically, of peer reviewed publications, the replication of
> the paper published process on computers.

Clouded? That sounds like a ray of sunshine to me!

> However, this latter has a
> significant side effect, it disrupts the traditional economic model
> of STI publishing. Therein lies the second main issue associated
> with the open access initiative.

First, where are the signs of this disruption? Second, should researchers
forgo the obvious benefits of enhanced access and impact just in case
it might at some time disrupt the traditional economic model of STI
publishing? Is the objective of research to protect publisher revenues
or to generate research access/impact? And if there is a conflict of
interest, in which direction should it be resolved, in an era when open
access to all refereed research output is definitely feasible?

http://www.eprints.org/self-faq/#17.Publishers
http://www.eprints.org/self-faq/#publishers-do

> Much of the rhetoric surrounding the open access initiative is
> predicated on the argument that STI publishers have been making a
> lot of money from the work of scientists. There is some truth there,
> not that the word "lot" can easily be understood, one man's profit
> is another man's exploitation if one puts a sharp political point on
> it, but it is true, particularly in academic institutions, that the
> institutions were buying back from publishers the work of their own
> staff.

BOAI-1 (self-archiving) has nothing to do with publishers' profits. It
is a way a researcher can maximize the access to and the impact of his
own refereed research output. BOAI-2 (open-access journals) is normal
competition: If open-access-journal competitors to toll-access journals
manage both to cut costs so as to make open-access possible and to
capture the authorship that finds that attractive, is there anything
wrong about that?

(But the refrain that libraries buy back the research their own
researchers have given away is and has always been nonsense -- or rather
a very ill-expressed, because ill-conceived, formulation of what is
a genuine anomaly: Institutions' libraries don't buy back their *own*
institutional give-away research! (They have that already, in-house, for
in-house use.) They buy *in* *other* institutions' give-away research. Put
that way, it becomes quite transparent that the reciprocal remedy is
for all institutions to self-archive their own refereed research output,
thereby making it openly accessible to one another ["Self-archive unto
others as ye would have them self-archive unto you."]. *Then* toll-based
buy-in will no longer be necessary.)

> This is a rather simplistic view, not every library bought
> their own work, much of the advantage of the publishing process
> was in ensuring that the dissemination was wide.

I couldn't follow this. The problem is and always was that access to the
refereed research literature -- always an author/institution give-away
-- is only possible through tolls. Yes, paper-publication was a great
improvement over the oral tradition, but its real costs precluded open
access. The new medium has now made that possible. So what are we waiting
for? (The rest is just Zeno's Paradox.)

http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm#8

> Long before the
> recent initiatives the practice of private distribution of not for
> attribution papers was common, yet scientists read the journals because
> there was other material there and the format was convenient. The
> advent of the Net made the 'pre-print' process even easier and hence,
> I suppose, the idea of doing the whole process via the Net.

That was the way the token dropped for the physicists, but now the cat's
out of the bag and it is clear that self-archiving not only gets
pre-refereeing research out earlier (caveat emptor!) but it also
maximizes the impact of peer-reviewed final drafts.

> The publishers argue that their role has not changed, they ensure
> quality, rationality in the sense of coherent vehicles for subjects
> and convenience.

Fine. Let them compete with the author-supplied self-archived versions
in their institutional Eprints Archives, and if the toll-access version
still offers added value, it will continue to be paid for by those who
can afford it.

> The other aspect of the economic issue is the role of the professional
> bodies. They were, in many cases, formed to be publishing vehicles,
> that continues to be their raison d'etre in many cases. Their argument
> is that the income from publishing goes back in to research so where
> is the harm?

Are you saying that learned societies that are also journal publishers
are using their revenues to fund research? That's the first I've heard
of that! But even if it is true, once the causal connection between access
and impact is demonstrated to researchers, I greatly doubt they will
knowingly elect to continue paying for someone else's research funding
with their own lost research impact!

> Good argument, but in many cases it is the same money going around
> and around. The researcher publishes through his or her professional
> body, the library of the researcher's institution, constantly under
> economic pressures, buys the journal in paper or electronic form
> or both, and makes it available to the researcher who looks to
> the administration of the institution to make more money available
> for research or facilities for research and argues for cuts in the
> library budget but at the same time looks to his or her professional
> body to supply research money..... and so it goes.

This needlessly complicates the causal sequence, which is this: The
research is funded by research funds and university salaries. The paper
is given (free) to the publisher, who has it peer-reviewed (the referees
referee for free) and then sells access for a fee, which is likewise
paid by the university. The average access-revenue per published article
(paid, collectively, by those institutions who can afford access to that
particular journal) is $2000. (Please tell me how much of that, if any,
is going back to fund research -- and whose research?) The maximum
estimated cost of peer review per paper is $500.

Here again are the figures from the top of this posting:

Using the figures provided by David Goodman in this Forum recently:
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2534.html

Here is what it would cost and how much it would save if universities paid
only the annual peer-review costs for their own outgoing (self-archived)
research, instead of the annual toll-access fees to buy in other
universities' outgoing research.

For 2001 (the latest year available):

              annual    current annual     $500 per article   annual
UNIVERSITY    article   incoming serials   peer review        percentage
              output    toll-budget        only               savings
----------    -------   ----------------   ----------------   ----------
Cornell       4848      $5.6 million       $2.43 million      57%
Dartmouth     1492      $3.2 million       $0.75 million      77%
Princeton     3132      $4.7 million       $1.57 million      66%
Yale          4463      $6.4 million       $2.23 million      64%

Throw in $10 per paper annual archiving costs and allow +/-50% for error
and it still looks as if open access would bring all its other benefits
(in terms of impact and access) at considerably less cost.
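
The arithmetic behind the table and the +/-50% allowance can be checked
with a short sketch. The article counts and toll budgets are the 2001
figures quoted above; $500 and $10 are the per-paper peer-review and
archiving estimates used in this posting. The function and variable names
are illustrative only, and recomputed percentages can differ from the
posted ones by a point because of rounding.

```python
PEER_REVIEW_COST = 500   # estimated maximum peer-review cost per paper (USD)
ARCHIVING_COST = 10      # annual archiving cost per paper (USD)

# university: (annual article output, annual serials toll-budget in USD)
UNIVERSITIES = {
    "Cornell": (4848, 5.6e6),
    "Dartmouth": (1492, 3.2e6),
    "Princeton": (3132, 4.7e6),
    "Yale": (4463, 6.4e6),
}


def outgoing_cost(articles, per_paper=PEER_REVIEW_COST + ARCHIVING_COST, error=0.0):
    """Annual cost of paying per outgoing paper, inflated by a fractional error."""
    return articles * per_paper * (1 + error)


def savings(articles, toll_budget, **kwargs):
    """Fraction of the toll budget saved by paying only for outgoing papers."""
    return 1 - outgoing_cost(articles, **kwargs) / toll_budget


for name, (articles, budget) in UNIVERSITIES.items():
    # Case 1: peer review only, as in the table.
    base = savings(articles, budget, per_paper=PEER_REVIEW_COST)
    # Case 2: peer review plus archiving, with costs underestimated by 50%.
    worst = savings(articles, budget, error=0.5)
    print(f"{name:10s} peer review only: {base:5.1%}   "
          f"with archiving and +50% error: {worst:5.1%}")
```

Under these assumptions, even the pessimistic case (archiving included,
costs 50% higher than estimated) leaves each of the four universities
paying about a third less than its current toll budget, or better.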

> All of this is complicated by the introduction of new distribution
> models, the site license model being the most common, where the
> institution buys the access to the material, more and more in the
> electronic form, for the whole campus, or increasingly through a
> consortium agreement for a whole series of institutions and in return
> has to provide readership information, another bone of contention.

Subscriptions, site-licenses and pay-per-view are all equivalent, and
merely variant forms of toll-access costs. It is of no interest which
way the toll budget is labelled.

> Oh dear, it seems like we are never to escape from the curse of
> progress -- we live in interesting times. As I said at the beginning
> I don't advocate any policy for ICSTI in these areas, I just feel
> we need to keep a very close watch on what is happening in an ever
> widening circle of activities.

The conclusions to be drawn from all this are staring us in the face.
I hope that Barry will see them more clearly now...

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online is available at
the American Scientist September Forum (98 & 99 & 00 & 01 & 02):

    http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
                            or
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Discussion can be posted to: american-scientist-open-access-forum_at_amsci.org

See also the Budapest Open Access Initiative:
    http://www.soros.org/openaccess

the Free Online Scholarship Movement:
    http://www.earlham.edu/~peters/fos/timeline.htm

the OAI site:
    http://www.openarchives.org

and the free OAI institutional archiving software site:
    http://www.eprints.org/
