On Wed, 5 Sep 2001, Declan Butler wrote:
> As metadata are expensive to create - it is estimated that tagging
> papers with even minimal metadata can add as much as 40% to costs
For what purpose is the metadata? Minimal retrieval metadata (title, author,
date) is different from minimal bibliographic metadata (journal, volume,
issue, page range) which is certainly different from minimal 'ontological'
metadata (effective, community-agreed vocabularies of subject descriptors).
The 40% is estimated by whom? And 40% of which costs, precisely?
Self-archivers are used to adding their own metadata at minimal
inconvenience. Automatic extraction and analysis tools can derive
additional bibliographic and reference metadata, as we can all see
from ResearchIndex
http://citeseer.nj.nec.com/cs
as well as our own OpCit project
http://opcit.eprints.org/ .
There are issues concerning quality and maintenance, but these apply to the
literature as well as the metadata, and have well-rehearsed solutions.
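To make the point concrete, here is a toy sketch of the kind of heuristic that automatic tools apply to pull minimal retrieval metadata (title, author, date) from the opening lines of a plain-text preprint. The layout assumptions (title on the first line, authors on the second, a four-digit year somewhere in the header) are purely illustrative; they are not what ResearchIndex or OpCit actually implement.

```python
import re

def extract_minimal_metadata(text):
    """Guess title/author/date from a paper's opening lines.

    Assumes (illustratively) that the first non-blank line is the
    title, the second is the author list, and that the first
    four-digit year found anywhere is the date.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    meta = {
        "title": lines[0] if lines else "",
        "authors": lines[1] if len(lines) > 1 else "",
        "date": "",
    }
    # Take the first plausible year (1900-2099) as the date.
    match = re.search(r"\b(19|20)\d{2}\b", text)
    if match:
        meta["date"] = match.group(0)
    return meta

sample = """Citation Linking in Open Archives
L. Carr and S. Hitchcock
Technical report, September 2001"""
print(extract_minimal_metadata(sample))
```

Even a crude heuristic like this shows why self-archived retrieval metadata is cheap: most of it is sitting in the document already, and the author only needs to confirm or correct the guess.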
> OAI is developing its core metadata as a lowest common denominator to
> avoid putting an excessive burden on those who wish to take part.
My memory of the OAI minimalist decision
http://www.openarchives.org/meetings/SantaFe1999/sfc_entry.htm
was that a "lowest common denominator" was necessary to allow
realistic interoperability: i.e. it was all we could reasonably expect
people to agree on at that stage of the game!
"Don't make things more complicated than they need to be to get
something simple working NOW." This is in the sprit of the Los Alamos
Lemma:
http://oaisrv.nsdl.cornell.edu/pipermail/ups/1999-November/000048.html
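For readers unfamiliar with what that "lowest common denominator" looks like in practice, the OAI core is unqualified Dublin Core. The following sketch builds a minimal record with Python's standard library; the choice of elements and values is illustrative only, not a prescribed OAI record.

```python
import xml.etree.ElementTree as ET

# Namespace of the unqualified Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def minimal_dc_record(title, creator, date):
    """Serialize a bare-bones Dublin Core record as an XML string."""
    record = ET.Element("record")
    for tag, value in (("title", title), ("creator", creator), ("date", date)):
        element = ET.SubElement(record, f"{{{DC}}}{tag}")
        element.text = value
    return ET.tostring(record, encoding="unicode")

print(minimal_dc_record("Citation Linking in Open Archives",
                        "Carr, L.", "2001-09-05"))
```

Three short elements per paper is the scale of burden the minimalist decision asks of a self-archiver, which is why it was "all we could reasonably expect people to agree on".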
Of course this can be seen in an economic context (little funding and
little time) but not the economic context invoked in the Nature essay.
> Not all papers will warrant the costs of marking up with metadata, nor
> will much of the grey literature, such as conference proceedings or the
> large internal documentation of government agencies.
Of course there is a metadata trade-off between what you are willing
to put in and what you expect to get out. However, it is precisely the
so-called grey literature which needs effective retrieval mechanisms,
for much of it forms the cutting edge of research communication.
Our own studies of arXiv.org indicate that the majority of unpublished
preprints go directly on to become journal articles, and the majority
of the remainder are presentations and reports that are reworked as
subsequent journal articles.
http://opcit.eprints.org/opcitresearch.shtml
We hope that our ongoing analyses of what is really happening in Open
Archives will help inform us (and funding agencies) about what is
truly valuable and therefore what materials are worth the effort (and
cost) of "marking up with metadata". As to the documentation of
government agencies, I leave that for another day (but I believe a
similar argument will apply).
--------------------------------------------------------------------
Les Carr                                   lac_at_ecs.soton.ac.uk
Department of Electronics and              phone: +44 23-80 594-479
  Computer Science                         fax:   +44 23-80 592-865
University of Southampton
http://www.ecs.soton.ac.uk/~lac/
Received on Wed Jan 03 2001 - 19:17:43 GMT