Paper on "merit/fame" correlation

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Wed, 28 Apr 2004 15:31:04 +0100 (BST)

A news report entitled "Physics and fame" in PhysicsWeb
http://physicsweb.org/article/news/8/4/12
summarizes a preprint by James P. Bagrow, Hernan D. Rozenfeld, Erik M. Bollt,
Daniel ben-Avraham entitled "How Famous is a Scientist?..."
http://arxiv.org/pdf/cond-mat/0404515

This paper is related to one that Tim Brody is now preparing
for submission, based on his download/citation correlator:
http://citebase.eprints.org/analysis/correlation.php

Equating "merit" with number of papers published and equating "fame"
with number of Google links risks circularity.

Google does not count usage "hits" (i.e., downloads); it counts links,
modulated by hub/authority weightings (PageRank, etc.). And the
correlation is almost tautological: more total items (whether or not by
the same author) will lead to more total links to (any of) those items.
"Same author" is merely a way of bundling items.
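
To illustrate the bundling point, here is a minimal simulation sketch in
Python. Everything in it is an assumption made for illustration (the
number of bundles, the 1..100 range of items per bundle, the 0..10 range
of links per item); it is not data from the Bagrow et al. preprint or
from citebase. Each item receives links at random, regardless of which
bundle it belongs to, so no bundle has any more "merit" than any other.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_bundles(num_bundles=200, seed=0):
    """Hypothetical bundles ("authors") whose items all draw links from
    the same distribution, whatever bundle they happen to be in."""
    rng = random.Random(seed)
    counts, totals, averages = [], [], []
    for _ in range(num_bundles):
        n_items = rng.randint(1, 100)          # assumed bundle size range
        links = [rng.randint(0, 10) for _ in range(n_items)]  # assumed link range
        counts.append(n_items)
        totals.append(sum(links))
        averages.append(sum(links) / n_items)
    return counts, totals, averages


counts, totals, averages = simulate_bundles()
print("items vs. total links:   r =", round(pearson(counts, totals), 3))
print("items vs. average links: r =", round(pearson(counts, averages), 3))
```

Under these assumptions the item count correlates strongly with total
links even though bundles do not differ in quality, whereas the average
number of links per item should show no systematic relation to bundle
size.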

I don't think Google's PageRank algorithm controls for this quantity
effect. For the control comparisons (not performed by the authors of the
merit/fame study), one would also need to calculate the correlations
between:

    (1) total number of an author's published papers and the average number
    of citations to that author's work (as calculated by citebase
    http://citebase.eprints.org/ or by ISI, not by Google),

    (2) total number of an author's papers and average number of links to that
    author's papers (i.e., Google PageRank),

    (3) total number of arbitrary Google items from the same producer (not
    research papers) and average number of links to (any of) those items
    (i.e., Google ranking), and

    (4) total number of arbitrary Google items, bundled arbitrarily, and average
    number of links to (any of) those items (i.e., Google ranking),

and then to *partial out* the pure item-quantity effect (perhaps in a
multiple regression equation) to see whether there is any significant
portion of the variance left that predicts "importance," even after the
mere correlation between the quantity of items and quantity of links
is removed. (Something similar needs to be done with download data too,
to partial out the effects of baseline quantity and co-bundling from
the specific merits of the material.)
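
A minimal sketch of such a partialling-out, assuming a single control
variable (item quantity) and using residuals from simple least-squares
lines rather than a full multiple regression. The function names and
the toy numbers are hypothetical, chosen only for illustration; the
"importance" list stands in for whatever independent measure (e.g.,
citations) one would actually use.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after removing the least-squares line ys ~ a + b*xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def partial_correlation(links, importance, n_items):
    """Correlation between links and importance once the linear effect of
    item quantity has been removed from both."""
    return pearson(residuals(links, n_items), residuals(importance, n_items))


# Hypothetical toy data: one entry per author (or arbitrary bundle).
n_items = [1, 2, 3, 4, 5, 6, 7, 8]
links = [3, 5, 4, 9, 12, 11, 15, 20]
importance = [2, 1, 4, 3, 6, 5, 9, 8]

print("raw r(links, importance):    ", round(pearson(links, importance), 3))
print("partial r, given item count: ",
      round(partial_correlation(links, importance, n_items), 3))
```

If the partial correlation stays clearly above zero, some of the link
variance predicts "importance" beyond mere quantity; if it collapses
toward zero, the raw correlation was carried by the number of items.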

(The Bagrow et al. paper makes comparisons with the time-course of ace pilots'
"fame," but it seems to me that would require more detailed time-course analyses
of downloads, citations, and links in order to draw any conclusions.)

Stevan Harnad
Chaire de Recherche du Canada
Centre de Neuroscience de la Cognition (CNC)
Universite du Quebec a Montreal
Montreal, Quebec, Canada H3C 3P8
tel: 1-514-987-3000 2461#
fax: 1-514-987-8952
harnad_at_uqam.ca
http://www.ecs.soton.ac.uk/~harnad/
