Hi,
As part of the OpenDOAR project we've been looking at OA repositories' policies, and trying to harvest the details using the OAI-PMH 'Identify' verb. We did this for all the repositories listed in OpenDOAR (www.opendoar.org) that had an OAI Base URL. We had problems harvesting about 9% of repositories, for one reason or another. Two notable cases were arXiv, which actively excludes requests from robots (unless they are on its whitelist), and Digital Commons repositories, where OAI-PMH requests are deliberately 'throttled', taking ten times longer to respond than other sites. In both cases, this has been done to ensure that interactive end-user sessions are not degraded if robots attempt to crawl the entire repository (as they are wont to do).
It's also worth pointing out that only a third of OA repositories seem to have declared policies on re-use of metadata, full items, etc., or at least policies that are retrievable via OAI-PMH.
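For anyone wanting to try something similar, here is a minimal Python sketch of the kind of check described above: build an 'Identify' request for a Base URL and pull any policy statements out of the response. It is not the OpenDOAR harvester. The function names, the example.org URL and the sample response are all invented for illustration, and the code simply assumes that policies appear as elements whose names end in "Policy" inside a 'description' container (as in the optional eprints description), which will not hold for every repository.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def identify_url(base_url):
    """Build an Identify request URL from a repository's OAI Base URL."""
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"verb": "Identify"})


def fetch_identify(base_url, timeout=30):
    """Fetch the raw Identify response (a generous timeout helps with throttled sites)."""
    with urlopen(identify_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_identify(xml_text):
    """Return the repository name and any policy statements found in the response."""
    root = ET.fromstring(xml_text)
    identify = root.find(OAI_NS + "Identify")
    if identify is None:
        raise ValueError("response contains no Identify element")

    name = identify.find(OAI_NS + "repositoryName")
    policies = {}
    # Assumption: policies are elements named '...Policy' inside <description>.
    for description in identify.findall(OAI_NS + "description"):
        for element in description.iter():
            tag = local_name(element.tag)
            if tag.endswith("Policy"):
                parts = [t.strip() for t in element.itertext() if t.strip()]
                policies[tag] = " ".join(parts)

    return {
        "repositoryName": name.text.strip() if name is not None and name.text else None,
        "policies": policies,
    }


# Invented sample response, for illustration only (not real harvested data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2006-06-14T04:12:15Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <description>
      <eprints xmlns="http://www.openarchives.org/OAI/1.1/eprints">
        <metadataPolicy><text>Metadata may be re-used freely.</text></metadataPolicy>
        <dataPolicy><text>Full items may not be harvested by robots.</text></dataPolicy>
      </eprints>
    </description>
  </Identify>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(identify_url("http://repository.example.org/oai"))
    print(parse_identify(SAMPLE_RESPONSE))
```

A repository that returns an empty "policies" dictionary here would fall among those with no policy retrievable via OAI-PMH, although it may still publish one elsewhere on its web site.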
I gave a presentation on this exercise in a workshop at the recent CRIS 2006 conference in Bergen. Further details are at:
http://www.sherpa.ac.uk/documents/BergenPresentation20060512Handouts.ppt
Dr. Peter Millington
SHERPA Technical Development Officer
Information Services
George Green Library
University of Nottingham
University Park
Nottingham, NG7 2RD
England
Phone: +44 (0)115 84 68481
FAX: +44 (0)115 84 68244
-----Original Message-----
From: American Scientist Open Access Forum [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG] On Behalf Of David Goodman
Sent: 08 June 2006 20:08
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
Subject: Re: RE : Harold Varmus: "Self-Archiving is Not Open Access"
I am asking for information about the accessibility of documents maintained on individual faculty home pages, both to OAI-PMH harvesters and to search engines such as Google and Scirus. (I do not refer to the policy of a crawler to examine or not examine a site, but to policies set by the site to keep such crawlers out, presumably as an attempt to reduce the spam received by the faculty.)
I have sometimes found such sites inaccessible, though the document was present and could be found manually once that home page was located. I do not know whether this is frequent or rare, as I have not kept a tally, but this is the list on which to look for anyone who has been doing so.
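One way to start such a tally is sketched below in Python, on the assumption that the site expresses its crawler policy through a robots.txt file. That is only one possible mechanism, and it is not named above. The helper name, the "ExampleBot" user agent, the example.edu URLs and the sample rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented robots.txt for a faculty home page that shuts every crawler
# out of its papers directory.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /papers/
"""

if __name__ == "__main__":
    for url in ("http://www.example.edu/papers/preprint.pdf",
                "http://www.example.edu/index.html"):
        print(url, crawler_allowed(SAMPLE_ROBOTS, "ExampleBot", url))
```

A check like this only shows what the site asks of crawlers; it says nothing about whether a particular search engine has actually indexed the document.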
Especially because some supposedly green OA journals restrict the author's posting to such an individual site, it would be good to know whether this is indeed "a minor problem."
Dr. David Goodman
Palmer School of Library and Information Science, Long Island University, and formerly Princeton University Library
dgoodman_at_princeton.edu
----- Original Message -----
From: Guédon Jean-Claude <jean.claude.guedon_at_UMONTREAL.CA>
Date: Wednesday, June 7, 2006 9:35 pm
Subject: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] RE : Harold Varmus: "Self-Archiving is Not Open Access"
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
...
>
> This said, if people were to self-archive in small, isolated, web
> sites that do not obey OAI-PMH, it would be pretty invisible and, as
> such, may not really qualify as OA. In this situation, venue could
> indeed be relevant, but only in that situation.
....
> And it is a
> very minor problem anyway.
>
Received on Wed Jun 14 2006 - 05:12:15 BST