The reason I suggested classification is that various people in the
subjects covered have told me that they use this archive by checking
everything in their subject classification each day, and that the
current rather straightforward classification suits them fine.
People work in various ways, especially for current awareness. One of the
many virtues of systems such as this is that they can be designed to be
adaptable to individuals. One of the pitfalls in designing a system, any
system, is to set it up to suit oneself alone.
You mention LoC, an excellent case in point.
It works beautifully--for catalogers.
I did not mention Boolean full-text searching, only because I assumed it.
Stevan, would anyone design such a system without it--still, now?
And I remain much less sanguine than you about the ability to accommodate
all the fields of science -- let alone all academic knowledge -- in a
single relatively simple system.
Anyone who has ever worked in a library can tell you about the
unreliability of a rough arrangement by discipline and journal name.
What subject is Phys Rev B (Condensed Matter)? Or J Chem Phys? Or Brain
Research?
And if you always remember journal names correctly, I congratulate you but
wish you weren't unique. All your plans--as is inevitable--are shaped by
your own preferences. So would mine be, but at least I realize
it--sometimes.
On Sat, 8 Mar 2003, Stevan Harnad wrote:
> On Fri, 7 Mar 2003, David Goodman wrote:
>
> > I agree that a
> > decentralized archive, as distinguished from arXiv, does not need
> > much in the way of classification
>
> Not even arXiv needs it: Those are physics articles, not books. They don't
> need LoC classification, only full-text boolean search, with
> scientometric ranking along the lines of:
> http://citebase.eprints.org/cgi-bin/search
>
> Moreover, if ever a useful taxonomy is generated for the refereed research
> article literature, it will be one that is scientometrically (i.e.,
> computationally) generated *from* such a digital database, not an
> old-style a-priori human classification.
>
> > I suspect the practical access for the immediate future will be
> > by known author, supplemented by the citation network.
>
> and boolean full-text search.
>
> > On the other hand, to rely on OAI harvesters and automated search tools
> > for accessing the union of all such collections is premature.
>
> Yes, but not for the reason I think you have in mind! It is premature
> because the union of all such collections is still so empty! As it
> grows, the associated tools will grow (they are the easy part!).
>
> > I am not certain whether it is within human capabilities to design
> > this--certainly none of the extensive efforts at automatic document
> > retrieval are really adequate--it's a problem of the same magnitude
> > as AI in general.
>
> For the human written word corpus as a whole. But not for the 20,000
> refereed research journals, classified, as a first cut, by their
> discipline and journal name. The rest most definitely *is* within human
> capabilities to design (along the lines mentioned above).
>
> > I would love to see this solved, of course, because the
> > known manual methods, as they are applied in libraries and
> > indexing services, are almost equally unsatisfactory.
>
> In the case of the refereed journal corpus (the only corpus at issue
> here), they are not only unsatisfactory, but completely unnecessary.
> Let us not conflate this very special (and small and tractable) part
> with the (possibly intractable) whole.
>
> Stevan Harnad
>
Dr. David Goodman
Princeton University Library
and
Palmer School of Library and Information Science, LIU
dgoodman_at_princeton.edu
Received on Mon Mar 10 2003 - 21:52:03 GMT