One important thing you didn't mention is that the papers must be
discoverable by browse clicks starting high in the tree, so that the
crawler can find them all, even if human users almost never use the
browse views.
Google's advice is also to avoid the "documents 1-10" "Next 10 documents"
type of structure, since the crawler is likely to give up because of the
depth before it has crawled the whole repository.
Search facilities are not used by crawlers, of course.
Eprints is an excellent example of software designed to facilitate crawling.
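To illustrate the depth point, here is a rough sketch in Python. It is not
taken from any crawler or from any repository software. The page names, the
ten-per-page listing, and the ten browse groups are all made up for the
example. It counts the fewest clicks needed to reach each paper in the two
kinds of structure.

```python
from collections import deque


def click_depths(links, root):
    """Breadth-first search: fewest clicks from root to every reachable page."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def paginated_site(n_papers, per_page=10):
    """Hypothetical 'documents 1-10' / 'Next 10 documents' chain of listings."""
    links = {}
    n_pages = (n_papers + per_page - 1) // per_page
    for p in range(n_pages):
        first = p * per_page
        last = min(first + per_page, n_papers)
        papers = [f"paper{i}" for i in range(first, last)]
        next_link = [f"list{p + 1}"] if p + 1 < n_pages else []
        links[f"list{p}"] = papers + next_link
    return links


def browse_site(n_papers, n_groups=10):
    """Hypothetical browse view: root -> group pages (e.g. by year) -> papers."""
    links = {"browse": [f"group{g}" for g in range(n_groups)]}
    for g in range(n_groups):
        links[f"group{g}"] = [f"paper{i}" for i in range(g, n_papers, n_groups)]
    return links


def max_paper_depth(links, root):
    """Deepest paper page, measured in clicks from root."""
    depths = click_depths(links, root)
    return max(d for page, d in depths.items() if page.startswith("paper"))


if __name__ == "__main__":
    print(max_paper_depth(paginated_site(1000), "list0"))   # chained listings
    print(max_paper_depth(browse_site(1000), "browse"))     # browse tree
```

With 1000 papers, the last paper in the chained listing is 100 clicks from
the first page, whereas in the browse tree every paper is 2 clicks from the
root. Whatever depth a real crawler tolerates, the second structure is far
more likely to stay within it.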
Arthur Sale
> -----Original Message-----
> From: American Scientist Open Access Forum
> [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG]
> On Behalf Of Andy Powell
> Sent: Friday, 10 March 2006 21:00
> To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
> Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] Use of Navigational
> Tools in a Repository
>
> > What causes this difference? Is it an intrinsic feature of
> > the repository software, or a side effect of the organisation
> > of the repository and the interlinking of the pages that it
> > exposes? Is it all down to the sets of navigational pages
> > that are provided internally? (Does the subject
> > classification pull its weight in this aspect of a
> > repository?) Or is the difference rather in the context that
> > the repository is situated? Will a repository that is well-
> > linked into its community (with high pagerank scores) have
> > different behaviour from an isolated repository?
>
> To a search engine, the Web interface to a repository is just like any
> other Web site, so conventional wisdom about improving site/page ranking
> will apply. The kinds of recommendations under
>
> http://searchenginewatch.com/webmasters/
>
> and elsewhere will therefore be appropriate.
>
> As far as I know, it is generally accepted that broad flat site
> structures are more easily crawled than deep narrow ones, that
> well-ranked sites are crawled more often and more thoroughly than
> lower-ranked sites and that using the 'title', 'meta keywords' and 'meta
> description' tags sensibly on the abstract (and other) pages will help,
> as will ensuring that the important text on the abstract page comes
> first.
>
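As a rough illustration of the tag advice above, the following Python
sketch reports which of the 'title', 'meta keywords' and 'meta description'
tags are missing or empty on a page. It uses only the standard library
html.parser module. The class and function names are made up for this
example, and it says nothing about how any search engine actually weighs
these tags.

```python
from html.parser import HTMLParser


class HeadTagAudit(HTMLParser):
    """Collects the <title> text and the keywords/description meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_tags(html):
    """Return the names of the tags that are absent or empty in html."""
    audit = HeadTagAudit()
    audit.feed(html)
    missing = []
    if not audit.title.strip():
        missing.append("title")
    for name in ("keywords", "description"):
        if not audit.meta.get(name):
            missing.append(f"meta {name}")
    return missing


if __name__ == "__main__":
    # Made-up abstract page with a title and description but no keywords.
    page = """<html><head>
    <title>An example eprint abstract page</title>
    <meta name="description" content="Abstract of an example paper.">
    </head><body><h1>An example eprint</h1></body></html>"""
    print(missing_tags(page))
```

Running a check like this over a sample of abstract pages is one cheap way
to spot templates that leave these tags empty.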
> As a consequence, it makes sense to position the repository pages
> closely within the institutional Web site (e.g. use
> www.bath.ac.uk/publications/, rather than archive.bath.ac.uk or, worse,
> www.bath.edu/repository) and then to try and keep the hierarchy below
> that as flat as possible.
>
> And as Les suggests, links to the pages in the repository (and anywhere
> else on the site for that matter) from outside are important (e.g. from
> Connotea) - though there is of course a chicken and egg problem
> initially. Serving useful RSS feeds from the repository and trying to
> ensure that they get picked up and embedded into other highly ranked
> sites might help, as might manually or automatically submitting the
> key repository pages directly to the search engines and directories
> (e.g. Yahoo and Open Directory).
>
> It'd be interesting to know whether encouraging linking based on
> OpenURLs results in lower page-rank (or at least results in page-rank
> not being increased) - ditto for other redirected links such as those
> using dx.doi.org? I assume that the former harms the page-rank of pages
> in the repository (though other benefits might outweigh this
> consideration), but the latter is OK??
>
> Andy
> --
> Head of Development, Eduserv Foundation
> http://www.eduserv.org.uk/foundation/
> andy.powell_at_eduserv.org.uk
> +44 (0)1225 474319
Received on Sat Mar 11 2006 - 01:17:30 GMT