Re: Google's Scholarly Search Service and Institutional OA Self-Archiving

From: Thomas Krichel <krichel_at_OPENLIB.ORG>
Date: Wed, 15 Dec 2004 09:43:50 -0600

  Arthur Sale writes

> From thence it finds the pdf files and indexes them as well.

  if such PDF files are indexable. From correspondence,
  I understand that Google uses pdftohtml, see

http://pdftohtml.sourceforge.net/

  to extract text out of PDF files. Archive managers should
  ensure that pdftohtml does a reasonable job on their PDF
  files. Fortunately, this is easy because pdftohtml is
  open source software, just as Eprints is.
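
  A minimal sketch of how an archive manager might run such a
  check, assuming pdftohtml is installed and on the PATH. The
  function names, the 50-word threshold and the "enough words"
  heuristic are illustrative assumptions, not anything Google
  or Eprints prescribes; the script only shows whether pdftohtml
  gets a usable amount of text out of a given file.

```python
import re
import subprocess
from html import unescape


def build_command(pdf_path):
    """Build the pdftohtml command line for one PDF file.

    -i        ignore images
    -noframes produce a single HTML document, no frameset
    -stdout   write the HTML to standard output
    """
    return ["pdftohtml", "-i", "-noframes", "-stdout", pdf_path]


def html_to_text(html):
    """Reduce HTML to plain text: drop script/style blocks and tags,
    decode entities, collapse runs of whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def looks_indexable(text, min_words=50):
    """Hypothetical heuristic: the text counts as indexable if it has
    at least min_words runs of two or more letters."""
    words = re.findall(r"[^\W\d_]{2,}", text)
    return len(words) >= min_words


def check_pdf(pdf_path, min_words=50):
    """Run pdftohtml on pdf_path and report whether the extracted
    text passes the heuristic. Raises CalledProcessError if
    pdftohtml exits with an error."""
    result = subprocess.run(
        build_command(pdf_path), capture_output=True, check=True
    )
    html = result.stdout.decode("utf-8", errors="replace")
    return looks_indexable(html_to_text(html), min_words)
```

  Calling check_pdf("paper.pdf") on each deposited file (the
  file name here is a placeholder) would flag scanned-image
  PDFs and other files from which little or no text comes out.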

  Cheers,

  Thomas Krichel mailto:krichel_at_openlib.org
                                 http://openlib.org/home/krichel
                             RePEc:per:1965-06-05:thomas_krichel
Received on Wed Dec 15 2004 - 15:43:50 GMT
