Re: PostGutenberg Copyrights and Wrongs for Give-Away Research

From: Lee Giles <giles_at_IST.PSU.EDU>
Date: Fri, 22 Jun 2001 10:56:10 -0400

Standards are great and often make the difference between the success and
failure of an endeavor. But in some cases other standards can be used that put
no additional burden on authors and users. It's possible to set up an open
archive that is useful and requires no additional work from authors beyond
putting their papers on their web site in some eformat. This works because there
are already a few widely used, accepted standards for publishing documents - pdf,
doc, postscript, html, etc. (It would be easy to include new ones such as xml.)
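
To illustrate why a handful of common formats is enough, here is a minimal
sketch of how a crawler might decide whether a link points at a candidate
manuscript. The extension list, the function name and the example.org URLs are
assumptions made for illustration; this is not how researchindex or cora
actually select documents.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension list, based on the formats named in this message.
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".ps", ".html", ".htm", ".xml"}


def looks_like_document(url: str) -> bool:
    """Return True if the URL path ends in a known document extension."""
    path = urlparse(url).path          # drops any query string or fragment
    _, ext = posixpath.splitext(path)  # e.g. "/papers/draft.PDF" -> ".PDF"
    return ext.lower() in DOCUMENT_EXTENSIONS


# Made-up links, as a crawler might extract them from an author's home page.
links = [
    "http://example.org/~author/papers/draft.PDF",
    "http://example.org/~author/photo.jpg",
    "http://example.org/~author/paper.ps?download=1",
]
candidates = [u for u in links if looks_like_document(u)]
```

A real crawler would probably also look at the content type returned by the
server, since an extension alone says little about whether a page is a paper.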

The archive works by being active instead of passive. A smart crawler
spiders the web searching for manuscripts. After finding the edocuments,
an indexer converts the documents to text, indexes them and provides a
query engine that allows search based on key words, phrases and citations.
Other features such as cocitation, active bibliographies, collaborative
filtering, etc. can be installed. Links to the original papers can be maintained.
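
The indexing and query step can be sketched roughly as follows. This is a toy
positional inverted index over text that has already been extracted from the
documents, supporting key word and phrase search only; citation search is left
out. The class name, method names and sample documents are invented for the
example and do not come from the researchindex software.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy positional inverted index: word -> document id -> word positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        """Index the extracted text of one document."""
        for pos, word in enumerate(tokenize(text)):
            self.postings[word][doc_id].append(pos)

    def search_keywords(self, query):
        """Return the ids of documents containing every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = None
        for word in words:
            docs = set(self.postings.get(word, {}))
            result = docs if result is None else result & docs
        return result

    def search_phrase(self, phrase):
        """Return the ids of documents containing the words consecutively."""
        words = tokenize(phrase)
        hits = set()
        for doc_id in self.search_keywords(phrase):
            for start in self.postings[words[0]][doc_id]:
                # Each following word must sit at the next position in turn.
                if all(start + i in self.postings[w][doc_id]
                       for i, w in enumerate(words)):
                    hits.add(doc_id)
                    break
        return hits


index = TinyIndex()
index.add("http://example.org/a.ps", "Neural networks for citation indexing")
index.add("http://example.org/b.pdf", "Indexing citation graphs with neural methods")

keyword_hits = index.search_keywords("citation indexing")
phrase_hits = index.search_phrase("citation indexing")
```

Using the document's URL as its id is one simple way to keep the link back to
the original paper.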

This entire process is automated, except for the requirement that authors
place their papers in some standard eformat on an accessible web site.
Because it is automated, some errors do occur; authors and others can
subsequently ask for corrections.

As an example, see researchindex.org and cora.whizbang.com, which host
archives of computer science papers. These two archives
have over 300,000 papers, 500,000 unique authors and 3 million citations. In
addition, they receive about 100,000 page views a day. The researchindex
software is free for noncommercial use and cora has established a new
archive for statistics papers.

Best regards,

Lee Giles

Stevan Harnad wrote:

> On Fri, 22 Jun 2001, Thomas J. Walker wrote:
>
> > sh >[I might add only that the distinction between "personal web home page"
> > sh >and "e-print servers" is silly, incoherent, and hence untenable, but it
> > sh >makes no difference, if it makes some people happy to put it that way...]
> >
> > There is a distinction that may be important to many authors:
> >
> > E-print servers that are well stocked are a somewhat more convenient place
> > to look for particular articles compared to hunting down the authors' home
> > pages and looking there. Of greater consequence, researchers who are not
> > looking for articles by the authors in question may find articles by them
> > on that well-stocked e-print server, like them, and use them.
>
> Quite right, and this is one of the principal rationales for the Open
> Archives Initiative (OAI) http://www.openarchives.org and Eprints
> archive-creating software http://www.eprints.org
>
> OAI provides a tagging standard that makes all registered OAI-compliant
> Archives interoperable, hence harvestable across archives
> http://oaisrv.nsdl.cornell.edu/Register/BrowseSites.pl
> so you need not know the URL of the paper or the
> author.
>
> You just search them like one big virtual archive in a centralized
> index: See http://cite-base.ecs.soton.ac.uk/cgi-bin/search
> and http://arc.cs.odu.edu/
>
> But the home-page/public distinction is moot, since authors can run
> their own eprints servers too, and register them as OAI-compliant!
> http://rocky.dlib.vt.edu/~oai/cgi-bin/Explorer/oai1.0/testoai
>
> --------------------------------------------------------------------
> Stevan Harnad harnad_at_cogsci.soton.ac.uk
> Professor of Cognitive Science harnad_at_princeton.edu
> Department of Electronics and phone: +44 23-80 592-582
> Computer Science fax: +44 23-80 592-865
> University of Southampton http://www.ecs.soton.ac.uk/~harnad/
> Highfield, Southampton http://www.princeton.edu/~harnad/
> SO17 1BJ UNITED KINGDOM

--
Dr. C. Lee Giles, David Reese Professor
School of Information Sciences and Technology
and Computer Science and Engineering
The Pennsylvania State University
504 Rider Building, 120 S Burrowes St
University Park, PA, 16801, USA
giles_at_ist.psu.edu - 814 865 7884
http://ist.psu.edu/giles
Received on Wed Jan 03 2001 - 19:17:43 GMT