On 20-Apr-08, at 9:47 AM, Peter Suber wrote:
Hi Stevan: Yesterday I tried to send the message below
to the OACI list. But I got an error message suggesting
that the list has been discontinued.
Instead of predicting citations from early downloads, as
you've done, this team predicts citations from properties
of the article.
Prediction of citation counts for clinical articles at
two years using data available within three weeks of
publication: retrospective cohort study, BMJ, February
21, 2008.
http://dx.doi.org/10.1136/bmj.39482.526713.BE
Conclusion: Citation counts can be reliably predicted at
two years using data within three weeks of publication.
Hi Peter,
I am forwarding your post instead to the Sigmetrics
list: SIGMETRICS_at_LISTSERV.UTK.EDU
This interesting article finds that a number of metrics available
immediately upon publication predict citations two years later
(using multiple regression analysis).
1274 articles from 105 journals published from January to
June 2005, randomly divided into a 60:40 split to provide
derivation and validation datasets. 20 article and
journal features, including ratings of clinical relevance
and newsworthiness, routinely collected by the McMaster
online rating of evidence system, compared with citation
counts at two years. The derivation analysis showed that
the regression equation accounted for 60% of the
variation (R²=0.60, 95% confidence interval 0.538 to
0.629). This model applied to the validation dataset gave
a similar prediction (R²=0.56, 0.476 to 0.596, shrinkage
0.04; shrinkage measures how well the derived equation
matches data from the validation dataset). Cited articles
in the top half and top third were predicted with 83% and
61% sensitivity and 72% and 82% specificity. Higher
citations were predicted by indexing in numerous
databases; number of authors; abstraction in synoptic
journals; clinical relevance scores; number of cited
references; and original, multicentred, and therapy
articles from journals with a greater proportion of
articles abstracted. Conclusion: Citation counts can be
reliably predicted at two years using data within three
weeks of publication.
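
The derivation/validation procedure in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' analysis: the data are synthetic, the three predictors and their weights are invented, shrinkage is taken to be the derivation R² minus the validation R² (which agrees with the reported 0.60 - 0.56 = 0.04), and "top half" is taken to mean above the validation-set median.

```python
import random
import statistics

def fit_ols(X, y):
    """Ordinary least squares via the normal equations: [intercept, b1, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def sens_spec(actual, predicted, cut):
    """Sensitivity and specificity for detecting values above `cut`."""
    tp = sum(a > cut and p > cut for a, p in zip(actual, predicted))
    fn = sum(a > cut and p <= cut for a, p in zip(actual, predicted))
    tn = sum(a <= cut and p <= cut for a, p in zip(actual, predicted))
    fp = sum(a <= cut and p > cut for a, p in zip(actual, predicted))
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic stand-in data (NOT the study's data): 1274 "articles",
# three hypothetical predictors, and a noisy linear outcome.
rng = random.Random(0)
N = 1274
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(N)]
y = [2.0 * a + 1.0 * b + 0.5 * c + rng.gauss(0, 2) for a, b, c in X]

# Random 60:40 split into derivation and validation sets
idx = list(range(N))
rng.shuffle(idx)
n_deriv = round(0.6 * N)
deriv, valid = idx[:n_deriv], idx[n_deriv:]
X_d, y_d = [X[i] for i in deriv], [y[i] for i in deriv]
X_v, y_v = [X[i] for i in valid], [y[i] for i in valid]

# Fit on the derivation set only, then score both sets with that equation
beta = fit_ols(X_d, y_d)
r2_deriv = r_squared(y_d, predict(beta, X_d))
pred_v = predict(beta, X_v)
r2_valid = r_squared(y_v, pred_v)
shrinkage = r2_deriv - r2_valid  # assumed definition, see note above

# "Top half" classification on the validation set (assumed cutoff: median)
cut = statistics.median(y_v)
sensitivity, specificity = sens_spec(y_v, pred_v, cut)

print(f"R2 derivation={r2_deriv:.3f} validation={r2_valid:.3f} "
      f"shrinkage={shrinkage:.3f}")
print(f"top half: sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

The point of the split is that the equation is never refitted on the validation articles, so a small shrinkage suggests the derived equation is not merely fitting noise in the derivation set.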
This finding reinforces the importance of taking into account as many
predictor metrics as possible, though a number of the metrics do seem
specific to clinical medical articles. The (apparently already known)
high correlation with physician ratings for clinical relevance is a
variable specific to this field. (The metrics used are listed at the
end of this message.)
We might perhaps make a distinction between static and dynamic
metrics. This study was based largely on static metrics, in that they
are fixed as of the day of publication. Dynamic metrics like early
downloads (which have also been found to predict later citations)
were not included (the Perneger study was cited but the Brody et al.
study was not), nor were early citation growth metrics (also
predictive of later citations).
Perneger TV. Relation between online "hit counts" and
subsequent citations: prospective study of research
papers in the BMJ. BMJ
2004;329:546-7. doi:10.1136/bmj.329.7465.546
Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web
Usage Statistics as Predictors of Later Citation Impact.
Journal of the American Society for Information
Science and Technology (JASIST) 57(8) pp. 1060-1072.
http://eprints.ecs.soton.ac.uk/10713/
Journal impact factor was not included either, because it was not
available for a large number of journals in the sample.
To my mind, the article reinforces the importance of validating all
these metrics, not just against one another, but against peer
evaluations, in all fields, as in the RAE 2008 database:
Harnad, S. (2007) Open Access Scientometrics and the UK
Research Assessment Exercise. In Proceedings of 11th
Annual Meeting of the International Society for
Scientometrics and Informetrics 11(1), pp. 27-33, Madrid,
Spain. Torres-Salinas, D. and Moed, H. F., Eds.
http://eprints.ecs.soton.ac.uk/13804/
Stevan Harnad
----------------------------------------------------------------------------
Predictor variables                                     Hypothesised influences

Article specific from external sources:
  No of authors                                         More authors
  Residence of first author in North America            North America
  No of pages                                           Longer article
  No of references in bibliography                      More references
  No of participants                                    More participants
  Structured abstract                                   Structured abstracts
  Length of abstract                                    Longer
  Multicentre studies                                   If multicentred
  Original article rather than systematic review        If systematic review
  Dealing with therapy                                  If therapy

Article specific from internal sources:
  No of disciplines chosen relevant to article          More disciplines
    (breadth of interest)
  Average relevance scores over all raters              Higher scores
  Average newsworthiness scores over all raters         Higher scores
  Average time taken by raters to rate article          More time
  Whether article was selected for abstraction          If yes
    in 1 of 3 synoptic journals
  No of views per email alert sent                      More views per alert

Journal specific using internal data:
  Proportion of articles that passed criteria (2005)    Higher proportion
  Proportion abstracted by 3 synoptic journals          Higher proportion

Journal specific using external data:
  No of databases that index journal                    More databases
Received on Mon Apr 21 2008 - 12:52:03 BST