Please Don't Conflate Direct with Harvested CRs (Central Repositories), Or Deposit Locus With Search Locus
On Sun, Feb 8, 2009 at 12:27 PM, Imre Simon <is_at_ime.usp.br> wrote:
"It is an unquestionable reality that unmandated IR's
[Institutional Repositories] remain all but empty. ArXiv,
CiteSeerX, Repec and SSRN are the four examples of large
thematic repositories [Central Repositories, CRs] I know
of which are populated without a mandate. One wonders
why?"
(1) There is a profound difference between (1a) Arxiv (and perhaps
also SSRN), on the one hand (these are Central Repositories [CRs] in
which authors deposit papers directly) and (1b) CiteseerX (and
partly also Repec), on the other hand (for these are harvested CRs,
their papers and metadata harvested from local repositories, usually
at the author's host institution, where they have been directly
deposited). Harvested CRs are like OAIster, or, for that matter,
Google Scholar!
(2) The difference is crucial, because central vs. institutional
locus-of-deposit is what is really under discussion here; no one is
disputing that navigation and search are done, and should be done, at
the central level, irrespective of whether CR deposit is direct or CR
contents are harvested.
(3) There are several reasons why these particular CRs are fuller
than IRs:
(3a) An entire discipline is bigger than a single
(multidisciplinary but local) institution.
(3b) These CRs contain only the deposits of those
individual authors and disciplines that do deposit
spontaneously, unmandated; these amount to about 15% of
OA's total target output, and that is well known. The
problem is the remaining 85% -- which will be pretty
homogeneously represented in each individual
multidisciplinary institution's IR (85% empty if
unmandated).
(3c) But there is a systematic denominator bias here, for
the success of an IR in capturing its institutional
research output is reckoned as the ratio of its annual
deposited papers to the total annual paper output for
that institution, whereas for a CR this must be reckoned
as the ratio of its annual deposited papers to the total
annual output for the discipline or disciplines the CR
covers (worldwide)! For certain disciplines and
subdisciplines, such as High Energy Physics,
Astrophysics, Economics and Computer Science, this ratio
will be quite high. But they are not OA's problem
disciplines, because they are depositing already, whether
centrally or locally, unmandated, and have been doing so
for years. OA's problem is all the disciplines that are
not doing so, for they are the main basis of the 85%
emptiness of IRs.
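The denominator point in (3c) can be made concrete with a small sketch. All figures below are hypothetical, chosen only to illustrate the arithmetic (the IR numbers are picked to give the 15% spontaneous deposit rate mentioned above); the function name capture_ratio is likewise just an illustrative label.

```python
def capture_ratio(deposited: int, total_output: int) -> float:
    """Fraction of the relevant annual output that was deposited."""
    if total_output <= 0:
        raise ValueError("total_output must be positive")
    return deposited / total_output


# Hypothetical figures, for illustration only (not real repository data).
ir_deposits = 300           # papers deposited this year in one institution's IR
institution_output = 2000   # that institution's total annual paper output

cr_deposits = 50_000        # papers deposited this year in one discipline's CR
discipline_output = 60_000  # worldwide annual output of that discipline

# An IR is measured against its own institution's output ...
ir_ratio = capture_ratio(ir_deposits, institution_output)
# ... whereas a CR must be measured against its discipline's worldwide output.
cr_ratio = capture_ratio(cr_deposits, discipline_output)

print(f"IR: {ir_deposits} deposits, capture ratio {ir_ratio:.0%}")
print(f"CR: {cr_deposits} deposits, capture ratio {cr_ratio:.0%}")
```

Under these assumed numbers the raw counts (300 versus 50,000) are not comparable at all, since each has a different denominator; and a high CR ratio for a spontaneously self-archiving discipline says nothing about the disciplines that are not depositing.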
(4) The reason all this matters, and the reason it is so important
not to conflate direct and harvested CRs, nor to conflate deposit
locus with search locus, is that the issues of locus-of-deposit and
mandates are very deeply interrelated.
(5) Deposit mandates can be funder mandates or institutional
mandates.
(6) Funder mandates only cover funded research, and not all (perhaps
not even most) research output is funded; and this would be true even
if all funders already mandated OA.
(7) In contrast, (virtually) all research output (and hence all of
OA's target content) is institutional. Institutions are the universal
research providers.
(8) So if all institutions mandated OA, that would generate universal
OA.
(9) Hence if all of OA's target content is institutional output, it
follows that, inasmuch as the 85% of research that is not being
deposited spontaneously will be deposited once it is mandated, what
is most needed is universal institutional OA mandates.
(10) Funder mandates already help, for their portion of OA's target
content, but they would help far more if they could facilitate the
deposit not only of the research they fund, but all research: in
other words, if they could help induce institutions to mandate OA for
all of their research output, not just the subset mandated by the
funder.
(11) In order to be able to do this, funder mandates need only ensure
the presence of one implementational detail, which loses none of
their own target content, but potentially extends their reach to the
rest of the research output of each of their fundees' institutions.
(12) Funders need to stipulate the fundee's own IR as the
locus-of-deposit for complying with the funder's deposit mandate (or
an interim backup repository like DEPOT, to host deposits until the
institution sets up an IR, to which the deposits can then be
automatically exported: DEPOT currently has only 66 deposits because
most UK funders are either requiring CR deposit or leaving it open
which repository their fundees deposit in).
(13) The contents can be harvested to CRs from there.
(14) The issue of search and functionality at the harvester level is
nothing but a red herring. (CiteseerX is a perfect example of the
functionality of a CR that harvests from distributed IRs.)
(15) Nor do the special features of the few disciplines (such as
computer science -- the first -- physics and economics) that took
spontaneously to self-archiving without a mandate long ago have
anything to do with either (a) the IR/CR issue, or (b) viable
alternatives to mandates (because no one, no one at all, has so far
demonstrated any, apart from waiting and waiting) for generating the
85% of content missing from IRs, and from OA as a whole.
Stevan Harnad
Received on Sun Feb 08 2009 - 20:00:24 GMT