Re: Scientometric OAI Search Engines

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Fri, 16 Dec 2005 22:14:20 +0000

I am re-posting Rick Anderson's message in its entirety, for two reasons:

(1) This is a moderated list, and topic threads, some running back as
far as 7 years, are organized by content, not contributor caprice.

(2) To minimise postings that just rehearse old issues and old
misunderstandings that have already been aired many times on this list,
and tend to drive people off the list. (AmSci is a Forum for concrete
policy discussion now.)

My REPLY follows Rick's message below. -- SH

Subject: Reality, OAIster and Citebase (RE: Re: Scientometric OAI Search Engines)
Date: Fri, 16 Dec 2005 09:31:33 -0800
From: "Rick Anderson" <rickand_at_unr.edu>
To: <AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG>

In response to my observation that self-archived content isn't always
easily accessible (or accessible at all) through the usual search
engines, Stevan issued a perfectly reasonable challenge:

> For a preview of reality with 100% self-archiving, see OAIster or
> citebase:
>
> http://oaister.umdl.umich.edu/o/oaister/
> http://www.citebase.org/

So in fairness, I decided to replicate my original experiment (in which
I used Google to try to locate the self-archived copies of the six
articles published in the September, 2004 issue of Journal of Economic
Literature) using OAIster and citebase -- arguably better tools than
Google for finding self-archived OA content. I resolved beforehand to
report my results here whether or not they supported my original
argument.

To quickly recap my original experiment: when I used Google to search
for the self-archived versions of those six articles, I found four of
them. Of those four, I found two easily and two only after fairly
determined effort and multiple search strategies. The remaining two I
was unable to find at all.

This morning, I searched OAIster for those same six articles, and found
two of them. So then I tried citebase. There I found none of them.

Now, the glib response would be to say that if this is "a preview of
reality with 100% self-archiving," then you can have it. But that
wouldn't be entirely fair. These tools will surely improve over time --
especially citebase, which is still in development. However, I strongly
suspect that part of the problem we're seeing here is not a failure of
the search engines, but a failure of the authors, who probably simply
haven't gotten around to placing their articles in a searchable archive.
The best search engine in the world is never going to find an article
that isn't there. Not to get all semiotic again, but it matters very
much what we mean when we say "100% self-archiving." Do we mean a
situation in which 100% of publishers allow self-archiving, or one in
which every author really does faithfully archive her work? The former
situation is unlikely, but possible. The latter situation is possible,
but seems to me extremely unlikely, given what we know of human nature.

Rick Anderson
Dir. of Resource Acquisition
University of Nevada, Reno Libraries
(775) 784-6500 x273
rickand_at_unr.edu
---------------------------------------------------------------------

REPLY:

On Fri, 16 Dec 2005, Rick Anderson wrote:

> In response to my observation that self-archived content isn't always
> easily accessible (or accessible at all) through the usual search
> engines, Stevan issued a perfectly reasonable challenge:
>
> > For a preview of reality with 100% self-archiving, see OAIster or
> > citebase:
> >
> > http://oaister.umdl.umich.edu/o/oaister/
> > http://www.citebase.org/
>
> So in fairness, I decided to replicate my original experiment (in which
> I used Google to try to locate the self-archived copies of the six
> articles published in the September, 2004 issue of Journal of Economic
> Literature) using OAIster and citebase -- arguably better tools than
> Google for finding self-archived OA content. I resolved beforehand to
> report my results here whether or not they supported my original
> argument.

Rick's original posting on this had said:

    "I went to the website of the Journal of Economic Literature, a
    certified "green" publication. I found the table of contents for
    the September 2004 issue, and saw that it contained six articles. I
    did one or more Google searches for each article... I found four in
    publicly-available online archives... two easily, one with moderate
    difficulty and one only after employing multiple search strategies
    and poking around in a fairly determined manner. Two I was unable
    to find at all."
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4681.html

(1) How is looking for these articles in OAIster or citebase a preview
of the reality of 100% self-archiving? OAIster and citebase only cover
about 15%; it is their search power *within* that 15% that can be sampled
as a taste of what search would be like with 100% self-archiving.

(2) JEL is a green journal (i.e., it gives its authors the green light to
self-archive). But obviously the journal's green light isn't enough,
since 93% of journals are green but only 15% of articles are being
self-archived. (That's why a self-archiving mandate like the one proposed
by RCUK and Berlin 3 is needed.)

(3) Four out of the six articles in JEL September 2004 were on the web.
That's 67%, much better than the 15% average. But the 15% estimate
is based on what google *does* find today (and there the 2/6 Rick
found with google comes closer to the average).

(4) Self-archiving mandates require self-archiving in an OAI-compliant
Institutional (or Central) Repository (Archive) -- not just on an
arbitrary website. All OAI-compliant archives will be harvested by
OAIster (and probably also google scholar).

(5) For 100% OA self-archiving, read "100% OA self-archiving in OAI
IRs or CRs." (OAI-compliance was already part of the 2001 BOAI
definition of OA!)

    http://www.openarchives.org/
    http://www.soros.org/openaccess/read.shtml [see: I. Self-Archiving]
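
To make point (4) concrete: OAI compliance means the archive answers
OAI-PMH requests, so a harvester such as OAIster can collect its metadata
records with a plain HTTP query. What follows is only a minimal sketch of
how such a harvesting request is formed, not a description of how OAIster
itself is implemented. The repository address and the function name are
hypothetical; the verb "ListRecords" and the arguments "metadataPrefix",
"from" and "resumptionToken" come from the OAI-PMH specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper only)."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # with the verb alone to fetch the next chunk of a long result list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, used purely for illustration.
BASE_URL = "http://repository.example.org/oai"

first_request = list_records_url(BASE_URL)
incremental_request = list_records_url(BASE_URL, from_date="2005-12-01")
next_chunk_request = list_records_url(BASE_URL, resumption_token="abc123")
```

An article deposited on an arbitrary website answers no such request,
which is why only deposit in an OAI-compliant IR or CR guarantees that
harvesters will pick it up.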

So what *is* Rick's point?

> To quickly recap my original experiment: when I used Google to search
> for the self-archived versions of those six articles, I found four of
> them. Of those four, I found two easily and two only after fairly
> determined effort and multiple search strategies. The remaining two I
> was unable to find at all.

And your point is?

> This morning, I searched OAIster for those same six articles, and found
> two of them. So then I tried citebase. There I found none of them.

And your point is?

> Now, the glib response would be to say that if this is "a preview of
> reality with 100% self-archiving," then you can have it.

That would not be the glib response, but the obtuse one, since I obviously didn't
mean that OAIster has the 85% of articles that are not self-archived yet (how
could it?). I just meant that OAIster's search capabilities (over the 15%
OA/OAI content that exists so far) give a preview of what search will be
like over 100% OA/OAI content.

> But that
> wouldn't be entirely fair. These tools will surely improve over time --
> especially citebase, which is still in development. However, I strongly
> suspect that part of the problem we're seeing here is not a failure of
> the search engines, but a failure of the authors, who probably simply
> haven't gotten around to placing their articles in a searchable archive.

What is remarkable is that what Rick "strongly suspects is part of the
problem" *is* the problem, the *whole* problem; that was dead-obvious from
the beginning, and was already explicitly stated twice: The fault is not
with the search engines, but with the authors who have not yet
self-archived 85% of articles!

Why did we need the superfluous search engine exercise to arrive at this
"suspicion"?

> The best search engine in the world is never going to find an article
> that isn't there. Not to get all semiotic again, but it matters very
> much what we mean when we say "100% self-archiving." Do we mean a
> situation in which 100% of publishers allow self-archiving, or one in
> which every author really does faithfully archive her work?

Of course 100% self-archiving means 100% self-archiving by the authors
(the "self" in the "self-archiving")!

The figure with which Rick was conflating this was the percentage of
journals that are green on author self-archiving (93%). But obviously
giving someone the green light does not necessarily mean he will *go*!

> The former situation is unlikely, but possible.

100% green journals is unlikely, when we already have 93% green journals?

    http://romeo.eprints.org/stats.php

> The latter situation is possible, but seems to me extremely unlikely,
> given what we know of human nature.

100% self-archiving is certainly unlikely -- given that we only have 15%
spontaneous self-archiving even though 100% has been possible for at least
15 years, and the demonstrations of its dramatic impact-benefits have
been accumulating in discipline after discipline for several years now.

    http://opcit.eprints.org/oacitation-biblio.html

And most authors have as much as said that they won't self-archive --
until/unless their institutions and/or their research funders *require*
self-archiving, in which case 95% report they will comply (81% willingly,
14% reluctantly):

    http://eprints.ecs.soton.ac.uk/11006/

That's why the UK Select Committee, Berlin 3 and the RCUK proposed
requiring immediate self-archiving. And that's why the 5 institutions
that already have an immediate self-archiving mandate (4 universities
plus CERN) are well on their way toward 100% self-archiving.

We mustn't be defeatist about human nature: we need to adopt policies
that help researchers help themselves to the benefits of OA. (*None*
of this has anything to do with search engines!)

Stevan Harnad

AMERICAN SCIENTIST OPEN ACCESS FORUM:
A complete Hypermail archive of the ongoing discussion of providing
open access to the peer-reviewed research literature online (1998-2005)
is available at:
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/
        To join or leave the Forum or change your subscription address:
http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
        Post discussion to:
        american-scientist-open-access-forum_at_amsci.org

UNIVERSITIES: If you have adopted or plan to adopt an institutional
policy of providing Open Access to your own research article output,
please describe your policy at:
        http://www.eprints.org/signup/sign.php

UNIFIED DUAL OPEN-ACCESS-PROVISION POLICY:
    BOAI-1 ("green"): Publish your article in a suitable toll-access journal
            http://romeo.eprints.org/
OR
    BOAI-2 ("gold"): Publish your article in an open-access journal if/when
            a suitable one exists.
            http://www.doaj.org/
AND
    in BOTH cases self-archive a supplementary version of your article
            in your institutional repository.
            http://www.eprints.org/self-faq/
            http://archives.eprints.org/
            http://openaccess.eprints.org/
Received on Fri Dec 16 2005 - 22:18:39 GMT
