Re: access to self archive via google scholar

From: Arthur Sale <ahjs_at_ozemail.com.au>
Date: Mon, 23 Oct 2006 09:49:02 +1100

When reading this posting from Thomas J Walker, please remember the
following:

1 Google's bot will probably visit only once a month or so. You can't
expect instant results.

2 Page rankings based on links take time to develop, as they are the result
of repeated harvesting of many sites.

3 Good repository software like Eprints has deliberately designed features
which allow Google (and other bots) to penetrate into the repository and
index the pdfs. These include a 'browse facility' (bots can't search) and
avoidance of the type of pages that go 'first 20', 'next 20', etc., which
cause most bots to give up because of excessive depth.

4 If it can't be found by a bot, it doesn't exist on a search engine.

5 Google indexes pdfs (it first converts them to html). Most other bots
don't.
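
Point 3 can be illustrated with a rough sketch. The Python below is not
Eprints code and does not model any real crawler; the page names, the 200-item
repository, and the ten browse groups are all made up for illustration. It
simply counts how many clicks from the home page a bot needs to reach each
item under a 'next 20' listing versus a browse view.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: fewest clicks from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def paginated_site(n_items, per_page=20):
    """Hypothetical site: items reachable only via 'next 20' listing pages."""
    n_pages = (n_items + per_page - 1) // per_page
    links = {"home": ["list/0"]}
    for p in range(n_pages):
        first, last = p * per_page, min((p + 1) * per_page, n_items)
        items = [f"item/{i}" for i in range(first, last)]
        next_link = [f"list/{p + 1}"] if p + 1 < n_pages else []
        links[f"list/{p}"] = items + next_link
    return links


def browse_site(n_items, n_groups=10):
    """Hypothetical site: home links to browse pages (e.g. by year or subject)."""
    links = {"home": [f"browse/{g}" for g in range(n_groups)]}
    for g in range(n_groups):
        links[f"browse/{g}"] = [f"item/{i}" for i in range(g, n_items, n_groups)]
    return links


def max_item_depth(links):
    """Deepest item page, measured in clicks from the home page."""
    depths = click_depths(links, "home")
    return max(d for page, d in depths.items() if page.startswith("item/"))


if __name__ == "__main__":
    print(max_item_depth(paginated_site(200)))  # 11
    print(max_item_depth(browse_site(200)))     # 2
```

In this toy model the last items in the paginated listing sit 11 clicks deep,
while every item in the browse layout is 2 clicks from the home page. A bot
that stops following links beyond some depth would miss the former.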

Arthur Sale

> -----Original Message-----
> From: American Scientist Open Access Forum
> [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG]
> On Behalf Of Walker,Thomas J
> Sent: Monday, 23 October 2006 8:42 AM
> To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
> Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] access to self archive
> via google scholar
>
> Those starting an IR should not expect Google to quickly harvest or to
> logically rank the journal articles posted on the new IR. This is based
> on my recent experience with helping the Florida Center for Library
> Automation (FCLA) in their efforts to test the IR waters with
> ScholArchive (http://eprints.fcla.edu/ ), a pilot IR that is focused on
> the scholarly output of the faculty and graduate students of the
> University of Florida's Department of Entomology and Nematology.
>
>
> ScholArchive (using Eprints software) went online on 28 July 2006 with 7
> posted articles. Here are excerpts from five emails relevant to their
> harvesting and ranking by Google and a report of today's results:
>
> 1 Aug 2006 (from ScholArchive Administrator)
> "I have registered us with Google, Google Scholar, SCIRUS, ROAR, DOAR,
> etc. so we should be indexed very soon by lots of search engines,
> hopefully."
>
>
> 1 Sep 2006 (from ScholArchive Administrator)
> "I have been monitoring Google Scholar, Google and other discovery sites
> for the past 5+ weeks since your papers were loaded, with the same
> disappointing results, even though I registered ScholArchive with these
> sites."
>
>
> 1 Sep 2006 (from Tom Walker to ScholArchive staff)
> "This is disappointing because faculty will be more likely to post their
> journal articles in ScholArchive IF we can show that doing so will
> significantly help Google users find openly accessible full text of the
> articles.
>
> To illustrate how this might prove to be the case, consider my 2001
> Environmental Entomology article entitled "Butterfly migrations in
> Florida: seasonal patterns and long-term changes." This morning I
> entered "butterfly migrations in Florida" as a Google search phrase and
> got 36 hits (under 11 main listings). Here are the first six main
> listings:
>
> 1. My personal web site. [A click on Google's listing loaded the PDF
> file of the article.]
>
> 2. BioOne. [A click on the listing loaded the abstract, but without a
> BioOne license the full text would be inaccessible.]
>
> 3. Ingenta Connect [A click led to a page with the abstract and a chance
> to pay $25 for access to the full text.]
>
> 4. TX-BUTTERFLY archives. [A click led to a bibliographic entry that had
> a dead link to the PDF file of the article. (My web site's URL was
> changed a few years ago)]
>
> 5. Journal of the Lepidopterists' Society [A couple of clicks led to a
> 1993 article on trapping migrating butterflies.]
>
> 6. The Entomological Society of America Journals Online [A click led to
> the TOC of the issue, another click led to the abstract, and a third led
> to the PDF file. But unless someone knew that I had paid ESA to provide
> OA for my article, who would have thought that free access to the PDF
> file would have been found here?]
>
> BOTTOM LINE: Had I not posted the PDF file on my Web site, very few
> would have found free access to the article's full text. Thus it is
> important to know how Google will rank the ScholArchive posting.
>
> Incidentally, I ran the same search in Google Scholar BETA and got only
> one hit, the same as no. 2 above!"
>
>
> 20 Sep 2006 (from ScholArchive Administrator)
> "As it turns out, Google is indeed indexing our site, but only the
> top-level pages, not the papers inside the repository. I am working on
> how this can be changed."
>
>
> 9 Oct 2006 (from Tom Walker to ScholArchive staff)
> "Yesterday I checked Google to see if the ScholArchive version of my
> butterfly migration paper had been harvested. It had not, and worse,
> the order of the sites that offered it had been changed. My (free)
> offering of the paper on my home page was now the fourth of the main
> listings (instead of first). Two for-fee offerings were first and third
> and BioOne was second."
>
>
> 22 Oct 2006
> When I searched Google this afternoon for the butterfly migration paper,
> the for-fee sites that had been No. 1 and No. 3 now occupied main
> entries No. 1 and 2 in the search results. However, my homepage site
> (free) was now No. 3 and the posting on ScholArchive (free) was now No.
> 4. BioOne had dropped to No. 5.
>
> The current ranking is still a disappointment but better than on 9 Oct
> (and worse than on 1 Sep).
>
> [On Google Scholar (beta), the BioOne posting was all that was offered.]
>
> Tom
>
> ====================================
> Thomas J. Walker
> Department of Entomology & Nematology
> PO Box 110620 (or Natural Area Drive)
> University of Florida, Gainesville, FL 32611-0620
> E-mail: tjw_at_ufl.edu
> FAX: (352)392-0190
> Web: http://tjwalker.ifas.ufl.edu
> ====================================
>
>
> -----Original Message-----
> From: American Scientist Open Access Forum
> [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG]
> On Behalf Of Donat Agosti
> Sent: Saturday, October 21, 2006 1:54 AM
> To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
> Subject: access to self archive via google scholar
>
> What would it take for self archives to be indexed by Google Scholar,
> so that those articles could be found?
>
> Search, for example, for
>
> "viaticus was tridecane"
>
> Then you end up at this paper:
> http://scholar.google.com/scholar?q=viaticus+was+tridecane&hl=en&lr=&btnG=Search
>
> It is the original paper, which is copyrighted, and there is no hint
> that the paper is actually also on ZORA as open access.
>
> http://www.zora.unizh.ch/zora/handle/2379/4727?mode=full&submit_simple=Show+full+item+record
>
> Ideally, it should show up, since it would then be used more often.
>
>
>
> Donat
>
> Dr. Donat Agosti
> Science Consultant
> Research Associate, American Museum of Natural History and Naturmuseum
> der Burgergemeinde Bern
> Email: agosti_at_amnh.org
> Web: http://antbase.org
> Blog: http://biodivcontext.blogspot.com/
> Skype: agostileu
> CV
> Current Location
> Dalmaziquai 45
> 3005 Bern
> Switzerland
> +41-31-351 7152
Received on Mon Oct 23 2006 - 11:05:20 BST
