Re: Free Access vs. Open Access from Steve Hitchcock on 2003-08-12 (American-Scientist-Open-Access-Forum)

From: Steve Hitchcock <sh94r_at_ecs.soton.ac.uk>
Date: Tue, 12 Aug 2003 18:48:06 +0100

This debate between Stevan Harnad and Matthew Cockerill about what
constitutes 'open access' appears to resolve to whether or not a full-text
document has a machine interface to the full text, for datamining purposes,
as well as a user interface. In the absence of evidence of gerrymandered
free access preventing e.g. download, save, grep, or print-off, Harnad is
happy to accept free as a sufficient criterion.

There is another view of 'free', not untypical of publishers, a version of
which was recently expressed by Michael J Held of Rockefeller University Press
http://www.jcb.org/cgi/content/full/jcb.200307018v1
According to Held, while free access to information is 'powerful and
alluring', the open access publishing model is 'unproven' and possibly
'unsustainable'. Here free refers to back content (PubMed Central and
Highwire models) and regional access (defined by the World Health
Organization as developing nations).

Separately, and not coincidentally, Peter Suber compares definitions of
open access from some of the major open access initiatives
https://mx2.arl.org/Lists/SPARC-OANews/Message/96.html
To me this is the most notable statement:
"The most important element by far is that open-access literature is
available online free of charge. This is the element that catalyzed the
open-access movement, and the element that defined "free online
scholarship". To this day, it's the only element mentioned when journalists
don't have space for a full story."

But how can this version be differentiated from Held's version of free?
With the addition of the words 'immediate (upon initial publication)' and
'universal' to that of 'free' in the conditions of open access. (Peter goes
on, legitimately, to note some additional access barriers that might make
him reluctant to adopt 'universal' here, but only one of these - Handicap
access barriers - is within the remit of open access content producers.)

Some wish to go further, but surely these are the only terms that are
necessary in a definition of open access. Alone, free is insufficient.

Steve Hitchcock
IAM Group, Department of Electronics and Computer Science
University of Southampton SO17 1BJ, UK
Email: sh94r_at_ecs.soton.ac.uk
Tel: +44 (0)23 8059 3256 Fax: +44 (0)23 8059 2865

At 00:10 12/08/03 +0100, Stevan Harnad wrote:
>On Mon, 11 Aug 2003, Matthew Cockerill wrote:
>
> >sh> "The use one makes of those full texts is to read them,
> >sh> print them off, quote/comment them, cite them, and use
> >sh> their *contents* in further research, building on them.
> >sh> What is "re-use"? And what is "redistribution" (when
> >sh> everyone on the planet with access to the web has access
> >sh> to the full-text of every such article)?"
> >
> > Having free access to articles on the publisher's website would certainly
> > offer progress compared to the current status quo. But it would not offer
> > anything like the benefits of true open access.
>
>Free access to the current 20,000 journals (2 million articles yearly)
>would be like the difference between night and day. Compared to that,
>the difference between "free" and "true open" access amounts to just a
>few degrees of luminosity.
>
>But let me agree at once that if free access were gerrymandered so
>all the user could do was to browse the text on-screen, without being
>able to download, save, grep, or print-off, then that would indeed
>arbitrarily limit free access's usefulness. How many (if any) of the
>several million free-access refereed-journal articles currently on the
>web, however -- whether BOAI-1, BOAI-2, or otherwise -- are gerrymandered
>in that way? If (as I suspect) the answer is "very few" or even "none
>that I know of," then this hypothetical constraint is not worth another
>moment's thought or energy diverted from the real task at hand, which
>is to turn night into day, as soon as possible.
>
> > Here are just some of the
> > reasons why re-use and re-distribution rights are vital to open access:
> >
 > > (1) Digital permanence - it is not enough for the publisher to be the only
 > > body which curates the full archive of published research content. To ensure
 > > long term digital permanence of the scientific record, it is vital that
 > > articles should be deposited with multiple archives, and redistributable
 > > from and between those archives.
>
>It seems to me that this is conflating (arbitrarily) two completely
>independent matters. One is toll-free online *access* to the articles
>in the 20K journals that are currently only accessible via tolls. The
>other is the *preservation* of that toll-based corpus.
>
>Well, preservation of that toll-based corpus was always a concern, in
>on-paper days as in on-line days, and the concern has nothing whatsoever
>to do with free (or open) access! We could have a failsafe preservation
>system without free access, or we could have a failsafe preservation
>system with free access; or we could have an uncertain preservation system
>without free access (as we do now) or an uncertain preservation system
>with free access (bringing the present system out into the light of
>day).
>
>The preservation burden has to be (and will be, and is being) faced in
>any case. Why on earth should that entirely orthogonal longterm
>task be coupled in *any way* to the immediate and urgent problem of free
>access today? And why should "open access" be linked with or defined in
>terms of the eventual solution to the preservation problem, one way or
>the other? (This is not an argument for indifference to preservation: it
>is an argument for decoupling two completely independent desiderata.)
>
> > (2) A flexible choice of tools for searching and browsing
> > The reason that Google exists is because the web is free for anyone to
> > download and index. As a result, there is competition among search engines,
> > and Google had the incentive to develop a better system for indexing web
> > pages, which has since driven other search engine companies to improve the
> > tools they offer.
> >
> > Compare this with the situation with scientific research. If the research
> > resides only on the publisher's site, you don't have a free choice of what
> > tools you use to search and browse it - you are stuck with what that
> > particular publisher provides you with.
>
>We are quite squarely in the domain of hypotheticals here. (Which
>publisher's free-access corpus, inaccessible to google, are we talking
>about?) But let us suppose that a publisher provides free access --
>not gerrymandered free access, but free access that allows downloading,
>saving, grepping and printing:
>
>First, I will bet that such a publisher will want to maximize the
>visibility and impact of his contents by allowing at least the indexing
>metadata to be harvested, both by google, and by the OAI search engines
>specializing in the refereed journal literature.
>
>But even if we get doubly hypothetical here, and suppose the publisher
>does *not* disclose the metadata to harvesters, there is
>still a super-simple solution: Every author has an online
>CV. Their CV will contain the metadata for every one of their
>journal publications. (Such CVs can and will be OAI-compliant:
>http://paracite.eprints.org/cgi-bin/rae_front.cgi ).
>Add the URL for the free-access full-text on the publisher's website to
>your CV entry and the circle is closed. (Better still, also self-archive
>the full text in your own institutional OAI-compliant repository!)
>End of story.
>
> > This ties in with developments in Grid computing (e.g.
> > http://www.escience-grid.org.uk/ ). With open access, published research
> > would be available "on tap" via the grid, and scientists would be able to
> > use their preferred choice of grid tools to access the data, rather than
> > being stuck with the tools provided by the publisher.
>
>As stated above, the CV/OAI gambit already trivially takes care of
>closing the circle.
>
>I agree, though, that for many research purposes, it is beneficial to
>have not just the metadata but the full-text inverted and indexed, as
>well as agent-harvestable. Again, if the publisher's free-access site
>doesn't do this, the author's institutional site certainly can and will.
>In fact, authors and their institutions are the ones with the most
>direct interest in making sure their own research output is maximally
>usable in this way.
>http://www.ecs.soton.ac.uk/~harnad/Temp/unto-others.html
>
>Let us not, however, conflate article-text archiving with
>data-archiving. Data-archiving is important too, but it is an extra:
>an independent new bonus of the online era, having nothing to do with
>the question of toll-free access to article-texts. In the paper era, raw
>data were not published, just summarized in what was published. Eventually
>data will no doubt be incorporated into online publications in some way,
>but until then there is certainly no need for authors to wait! They
>can publish their article, as before, and, in addition, self-archive
>the data on which their article is based in their own OAI-compliant
>institutional research repository (the same repository in which
>the full-text of their article can and should be self-archived too,
>whether it appears in an open-access journal, a toll-access journal, or a
>toll-access journal that offers toll-free access too). Again, the online
>CV can close the circle, if it is not already closed of its own accord.
>
>And this way, although it is functionally independent, data-archiving
>can help speed the progress toward toll-free full-text access too.
>
> > (3) Datamining
> >
 > > With a million or so biomedical research articles being published each year,
 > > the sheer volume of output is an obstacle to the comprehension and synthesis
 > > of the results reported in that research. If the XML of the articles can be
 > > brought together in one place then the tools of datamining can be applied to
 > > it to extract useful but non-obvious information.
>
>Agreed. See above. But before we get carried away with the potential
>perks, let's not forget the still absent basics: Let there be Light
>(toll-free full-text access), now! Leave the Solar-Energy and Club-Med
>projects for when we already have our daily fill of photons.
>
> > The simplest type of datamining is citation analysis
> >
> > Currently you need to pay ISI a lot of money to find out what cites what,
> > but with true open access, citation analysis becomes trivial.
>
>Perhaps not quite trivial. (There's still the problem of parsing,
>identifying and linking the citations for all those articles without the
>ultimate mark-up: But we're working on it: http://opcit.eprints.org/ ).
>
>But again, this is an independent perk, because you could have universal
>citation linking and analysis even *without* toll-free full-text access!
>For an article's reference list, like its indexing metadata (and its
>accompanying empirical data), can be self-archived by the author (guess
>where?). We are in fact promoting this solution for royalty-based books,
>whose authors, unlike journal article-authors, are unlikely to want to
>make their full-texts accessible toll-free. Their metadata and reference
>lists, however, are another matter, and can (and will) be tucked into
>the institutional OAI-compliant repository too, with a new indicator of
>global book citation impact as the harvestable reward.
>http://www.ariadne.ac.uk/issue35/harnad/
>
 > > So, for example, if you view a PubMed record:
 > >
 > > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=11667947&dopt=Abstract
 > > you already get links to all the full text articles in PubMed Central which
 > > cite that PubMed item
 > >
 > > http://www.pubmedcentral.gov/tocrender.fcgi?action=cited&tool=pubmed&pubmedid=11667947
>
>And if you look at citebase, you will see how this generalizes to the
>entire OAI-compliant literature:
>http://citebase.eprints.org/cgi-bin/search
>
 > > The more true open access research that is published and archived at PubMed
 > > Central, the more useful this becomes for biomedical researchers. [Sure,
 > > "screen-scraping" HTML from free articles displayed on publisher sites could
 > > give some citation information, but with nothing like the ease, accuracy and
 > > reliability that it can be obtained with the use of XML data, as at PubMed
 > > Central].
>
>Fine. But I'd rather have toll-free access to all 20K journals right
>now, rather than waiting for these XML perks -- wouldn't you?
>
>Again, toll-free access is one thing -- and extremely important,
>already reachable, and already overdue -- and potential perks such as
>citation-based navigation are another. Let there be light first; then we
>can worry about calibrating the photometers on our Yashicas.
>
> > Beyond citation analysis, there are many other forms of datamining that are
> > possible:
> > For more information see:
> > http://www.biomedcentral.com/info/about/datamining/
> >
> > e.g. Research articles can be mined for details of protein interactions
> > http://bioinfo.mshri.on.ca/prebind/
>
>See above. Right now, it is an indisputable fact that open-access
>publishing today (BOAI-2) is the solution only for that 5% of the literature
>(of 20K journals) that has a suitable open-access journal today. The
>immediate solution for all the rest is self-archiving (BOAI-1), rather
>than continuing to wait for more open-access journals to spawn and grow.
>
>(If, in the meanwhile, toll-access publishers also want to help hasten
>things along by providing free access, they are certainly welcome
>to do so! I still regret -- for the sake of open access --
>that the BOAI http://www.soros.org/openaccess/sign2.shtml?o was
>not ready to count it as publisher support of open access if a
>toll-access journal supported author self-archiving of their articles
>http://www.ecs.soton.ac.uk/~harnad/Temp/rcoptable.gif: *Of course* that
>is publisher support for open access! By the same token, I would certainly
>consider it as publisher support for open access if a toll-access journal
>made its full-text contents publicly accessible online toll-free. Even if
>it was gerrymandered full-text access -- as long as they also supported
>self-archiving!)
>
> > And as scientific content is increasingly marked up using richer forms of
> > semantically meaningful XML (e.g. CML for chemical structures, MathML for
> > equations), the value of datamining will continue to increase.
>
>All true. And it will all prevail eventually. But we need free access
>*now*. http://www.ecs.soton.ac.uk/~harnad/Temp/che.htm
>
 > > The BioLINK group are using BioMed Central's open access corpus as the raw
 > > material for a datamining competition, designed to stimulate progress in the
 > > development of tools for biological datamining.
 > > http://www.pdg.cnb.uam.es/BioLINK/BioCreative_task2.html
>
>That is commendable and welcome. But it must not be forgotten what
>percentage of the annual biological journal literature that sample
>actually represents. We must not be held back to that small percentage
>because we are informed that mere free access is not good enough -- not
>"true open access." Such rarefied fussiness does not serve the cause of
>either free or open access at this point.
>
> > (4) Derivative works and compilations
> > Say that a scientist performs a meta-analysis on a group of published
> > clinical trials, and wants to make available the conclusions of that
> > research. Or perhaps a datamining researcher has taken a corpus of 1000
 > > articles on breast cancer, and established some interesting conclusions.
>
>All very welcome and valuable (indeed, inevitable) developments in the
>online age. But I'd rather that progress toward free access for all 20K
>did not wait for these perks. Indeed, the sooner we have free access,
>the sooner the rest will come too.
>
 > > In a true open access environment, each is free to post the results of their
 > > research, *along with* the actual corpus of data which the research was
 > > based on (effectively, the raw data for that research).
 > > But in a non-open access environment, that raw data (i.e. the research
 > > articles) cannot be redistributed, which makes it far more difficult than it
 > > needs to be for other scientists to reproduce, critique and follow up the
 > > work.
>
>I am afraid I have to disagree. As already noted above, authors are as
>free to self-archive (in their institutional repositories) the empirical
>data underlying their toll-access publications as they are to do so with
>the data underlying their open-access publications. Data-archiving is
>another thing for which there is no point sitting around awaiting the
>era of universal open-access publishing. Data-archiving will encourage
>article self-archiving, and both will hasten the era of universal
>open-access.
>
> > Similarly, a scientist may wish to make a point by assembling a collection
> > of certain articles or article fragments (perhaps they wish to assemble a
> > comparison of the methods used for a certain technique).
> > In an open access world, as long as they cite the sources, they are
> > completely free to create and redistribute that compilation. Such a
 > > selective compilation may in itself be an extremely useful contribution to
> > science.
>
>I can't follow this at all. A compilation is a list of articles, whether
>online or on-paper, whether toll-access or open-access. If the
>full-texts of the articles are *free* access, all the compilation need list
>is their URLs. (Ditto for article "fragments": try section number,
>paragraph number, or even [yech!] PDF page number.)
>
> > (5) Print redistribution rights - the National Health Service, for example,
> > should be able to redistribute thousands of printed copies of an important
> > research article (which it may have funded) to its doctors if it wishes to
> > do so. It should not have to pay a hefty copyright fee for the privilege.
>
>I have no views on this, but it has nothing to do with open access,
>which even in the strict BOAI definition refers to online access, not
>to multiple printing and redistribution rights. Besides, this is all
>becoming moot in the online era: Why distribute print copies instead of
>URLs, if the texts are publicly accessible online toll-free?
>
>(I think it is a big mistake, and clouds the issue, to try to link online
>toll-free access arguments with paper-printing rights. Don't forget that
>those worthy paper-based arguments would have been just as worthy in the
>paper era. So surely they are *not* what has changed in the online era.)
>
> > Certainly, print redistribution will likely become less significant in the
> > future, but there is no logical reason that the scientific community should
> > not be free to exchange and distribute the research that it has created in
> > print form, as well as online.
>
>The case for multiple printing rights is *much* weaker than the case
>for toll-free online access. Please let us not needlessly weaken
>the case for free access by handicapping it with such needless extra
>burdens. Free access will erode the need to print, even as it erodes
>publisher opposition to printing. But now, all fussing about print
>"redistribution" rights does is provoke needless opposition, to no
>good purpose. Keep it light, till everyone sees the light.
>
>Stevan Harnad
>
>NOTE: A complete archive of the ongoing discussion of providing open
>access to the peer-reviewed research literature online is available at
>the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03):
>
> http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
> or
> http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/index.html
>
>Discussion can be posted to: american-scientist-open-access-forum_at_amsci.org
Received on Tue Aug 12 2003 - 18:48:06 BST