Hello,
Personally, I am uncomfortable with this ranking because, to my mind, it
is incomplete and imprecise.
It is incomplete because repositories hosted in a subdirectory
(e.g. www.xxx.zz/repository) are not ranked, for technical reasons, even if, as
Isidro noted, "the number of these repositories is far lower than the
"non-repositories" listed in ROAR and OpenDOAR".
It is imprecise because it is based on automated web queries that are very
sensitive to noise. This is the case, for example, for the visibility indicator
(external inlinks). As far as I understand from Isidro's explanations, part of
this indicator is calculated with the Yahoo linkdomain function:
linkdomain:http://my_site -site:my_site
I tested this function on a few of the ranked repositories, including our own.
More than 90% of the inlinks (and, in some cases, I would guess more than 99%)
are not significant because they come from:
- automatic spam web sites (e.g. www.find-pdf.com, www.mypdffiles.com, ...) or
automatically generated sites such as http://www.123people.fr
- automatic links from OAI harvesters
- automatic links that come from other domains of the university (e.g.
self-citation through automatically generated personal author pages)...
- automatic repetition of the same link: in some forums, a link on the main
page is duplicated automatically on all archive pages so, from one significant
manual link, you can get several hundred insignificant automatic links.
- …
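To make the categories above concrete, here is a rough sketch of how a list of
inlinks could be filtered before counting. It is only an illustration under
assumptions: the function name, the input format (pairs of source and target
URLs) and the harvester host are hypothetical, and the spam hosts are simply
the examples given in this message.

```python
from urllib.parse import urlparse

# Hypothetical noise lists; the spam hosts are the examples from this message.
SPAM_HOSTS = {"www.find-pdf.com", "www.mypdffiles.com", "www.123people.fr"}
HARVESTER_HOSTS = {"harvester.example.org"}  # placeholder, not a real harvester


def significant_inlinks(inlinks, parent_domain):
    """Return the set of (source_host, target_url) pairs kept after filtering.

    inlinks: iterable of (source_url, target_url) pairs.
    parent_domain: the university's domain, e.g. "example.edu"; links from
    any host under it are treated as self-citation and dropped.
    """
    kept = set()
    for source_url, target_url in inlinks:
        host = (urlparse(source_url).hostname or "").lower()
        if not host:
            continue  # unparseable source URL
        if host in SPAM_HOSTS or host in HARVESTER_HOSTS:
            continue  # automatic spam sites and OAI harvesters
        if host == parent_domain or host.endswith("." + parent_domain):
            continue  # other sites of the same university
        # Collapse repeated links: the same source host linking to the same
        # target counts once, however many archive pages repeat it.
        kept.add((host, target_url))
    return kept
```

Comparing len(significant_inlinks(...)) with the raw number of inlinks would
give an estimate of the share of noise for one repository.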
The other indicators (size, rich files, scholar) may also be unreliable for
similar reasons.
According to Isidro, all these points affect the numbers but not (much) the
ranking. This should be confirmed...
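One possible way to check this, as a minimal sketch: compute the Spearman rank
correlation between the raw inlink counts and the counts left after removing
the noise. A value close to 1 would mean the ranking is hardly affected. The
counts below are made up for four hypothetical repositories, and the formula
used assumes there are no tied values.

```python
def ranks(values):
    """Rank positions (1 = largest value); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up counts for four hypothetical repositories, for illustration only.
raw_inlinks = [12000, 9000, 4000, 800]
filtered_inlinks = [90, 120, 35, 10]
rho = spearman(raw_inlinks, filtered_inlinks)
```

In this invented example the first two repositories swap places after
filtering, so the correlation is below 1 even though the overall order is
mostly preserved.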
Kind regards,
Fred
Isidro F. Aguillo wrote:
Dear all:
In fact we have already taken some of your comments into account in
the last editions of the ranking. Let me explain:
- The ranking is based on a 1:1 ratio between ACTIVITY and
VISIBILITY, so publishing a lot of OA papers is as important as
doing it in a way that others (worldwide) can retrieve, use and link to them.
The 1:1 ratio means the weight of each is 50%. As stated in previous
messages, Visibility is measured by counting the total number of
external inlinks.
- Regarding activity, we decided to follow your advice, so the value
is calculated giving more or less the same weight to these three
variables:
* Number of papers, usually full text articles, using as a proxy the
number of items from Google Scholar
* Number of web pages: ALL the webpages (usually html or similar
ones, but also other formats) of the website
* Number of documents: a subset of the former, those files in rich
formats like pdf, ps, doc or ppt. It is probably true that pdf is not
the best format and perhaps we should consider other formats, but
people are not using other formats. The number of files in
OpenOffice formats, XML, or others is negligible, useless for
ranking purposes.
- PMC. Our policy is not to rank repositories without their own domain
or subdomain. There are technical reasons but also visibility ones.
The address of PMC is "absurdly" complex:
www.ncbi.nlm.nih.gov/pmc
Regarding UK PMC, it is included in the ranking but its position
is lowered because it does not use suffixes in its file names.
It has hundreds of thousands of Adobe Acrobat (pdf) files without
marking them as *.pdf. This prevents efficient filtering by file type
by the major search engines.
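As an illustration of the 1:1 weighting between activity and visibility
described above, the arithmetic could be sketched as follows. This is not the
actual computation used for the ranking: the function name is hypothetical,
the three activity variables are given exactly equal weights (the message says
"more or less"), and every indicator is assumed to be already normalised to
the range 0..1, which the message does not specify.

```python
def composite_score(visibility, scholar, pages, rich_files):
    """Combine four indicators, each assumed to be normalised to 0..1.

    Visibility (external inlinks) weighs 50%; the other 50% is activity,
    taken here as the plain average of the three activity variables.
    """
    activity = (scholar + pages + rich_files) / 3
    return 0.5 * visibility + 0.5 * activity
```

With these assumptions each activity variable contributes about 16.7% of the
final score, so a repository with many files but few external inlinks cannot
score above 0.5.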
Best regards,
--
Fred Merceur
Ifremer / Bibliothèque La Pérouse
frederic.merceur_at_ifremer.fr
Tél : 02-98-49-88-69
Fax : 02-98-49-88-84
Archimer, Ifremer's Institutional Repository
Avano, a marine and aquatic OAI harvester
Bibliothèque La Pérouse
Received on Mon Jul 12 2010 - 15:47:54 BST