---

But that is not all. The same problem exists for institutions.

My institution, INRIA, used to be called IRIA. We got the N (for
National) only after 13 years of existence. And I did publish, as
Bernard Lang from INRIA, papers that were written by Bernard Lang from
IRIA. The paper actually gives IRIA as my institution (though it also
says that my address is at INRIA), and the publisher thinks my
institution is INRIA.

There is also a place in the Netherlands called CWI, which was formerly
the Mathematisch Centrum (apologies if the spelling is wrong).

So, how do you search by author institution? Or how do you search the
reports of an institution if you do not know all its names? And I am
pretty sure there are unrelated institutions that have the same name,
though I did not bother finding an example.

Shouldn't we have unique identifiers in this case too? Possibly ...
though things may be more complicated, since institutions can merge or
split (I have examples of both), though many researchers are not aware
of that.

---

The problem arises again with publications. The same, identical, paper
can sometimes appear in several media, and it is useful to know that to
make search easier (though we should assume the problem is not
essential for e-documents, which should be easier to access).

The same physical book can be part of a series of books, as well as the
proceedings of a conference belonging to two usually independent series
of annual conferences. So it can be referenced as the book, as
proceedings of conference A, or as proceedings of conference B. But it
is the same thing, and while a library may think it does not have the
proceedings of conference B, it actually has the proceedings of
conference A ... the same. I did not make up this example, I have seen
it.

There is also the issue of conference series that change their names,
etc.

---

Conclusion: the name is not the thing. One name may denote many things
and one thing may have many names.
And that is generally true for many real-life entities.

[Side note: I wonder how sorcerers deal with this issue ... but maybe
ISBN-like references are what they call the true name of things ... for
more scientific details on this, read for example "A Wizard of
Earthsea" by Ursula K. Le Guin.]

Having unique references (ISBN style) will solve part of the problem.
However, that will not solve all problems, and we may want whatever
database structure we are using to be able to describe other relations,
such as splits and merges between institutions, between associations,
between collections, between conferences ... to make it easier to
retrieve documents, to analyse publication structures, or to evaluate
the productive output of x or y. It would also be useful to relate
variations of the same paper.

---

Sorry if I have been stating what some of you consider obvious.
Jean-Claude's remarks were obvious to me, and I assumed that if he felt
it useful to state them, there was a chance that my further
developments might be useful too.

Regards,

Bernard Lang

PS My late colleague and boss, Gilles Kahn, acting as scientific
director of INRIA, spent considerable time on local reports trying to
determine, from the names of conferences and publications, which
conferences or publications were actually being referred to. This
should not happen.

* guedon <jean.claude.guedon_at_umontreal.ca> wrote on 18-08-06:
> The author disambiguation is indeed a really important issue. It affects
> all kinds of things, ranging from the Science Citation Index to even
> some commercial offerings. For example, while searching for an author in
> a Springer journal the other day, I noticed that their own search engines
> distinguished the author's name with full first name from the same
> author's name with just the initial... I had to search through two
> lists of articles instead of one.
>
> I believe that scientific and scholarly authors ought to be given a
> permanent identifier which ought to accompany their publication in any
> journal that carries peer review. In effect, it would be the equivalent
> of an ISBN.
>
> The easiest way to begin implementing this PAI (Permanent Author
> Identifier) might be for a group of journals to come together and agree
> that when a paper is submitted, the author must supply his/her permanent
> identifier. If he/she does not have one, indicating so would mean that
> the cooperating publisher would assign one immediately and would place
> it in an open database. Universities could encourage their students to
> take up such an identifier as soon as they are on a track (e.g.
> doctoral studies) that should lead to some publishing.
>
> In conclusion, I do not claim to have clear strategies about this PAI,
> but the need for one appears very high to me. In particular, it would be
> very useful for institutional repositories and the OA movement in
> general.
>
> Google and other large search engines might be interested in supporting
> such a development. It would greatly enhance the capability of Google
> Scholar. Countries that do not use the Latin script or use it with funny
> diacritical marks (as in Guédon) might also find it useful to have their
> scientists unambiguously visible in the whole world, even though this
> might decrease the number of "scientists" for any given country.
>
> Best,
>
> jcg
>
> On Friday 18 August 2006 at 08:51 -0400, Timothy Miles-Board wrote:
> > The EPrints team have been looking at this issue in some detail. The current
> > version of EPrints has "clone" and "new version" options which save having
> > to re-enter metadata for similar/different versions of an existing deposit.
> > However, this doesn't help much if you are starting a new deposit.
> > The approach we've been favouring of late is auto-completion (like Google
> > Suggest http://labs.google.com/suggest), whereby the depositor begins typing
> > the first few characters of the name of a co-author and is presented with a
> > pop-up list of suggestions. The behind-the-scenes logic that determines what
> > to suggest can be customised to an individual repository's requirements, e.g.
> > suggest from the list of registered users, suggest by looking up in the
> > institution's user account (e.g. LDAP) server, suggest according to an
> > internal database list of institutional and non-institutional users. The
> > previous deposits that you have made can also inform the list of suggestions,
> > e.g. frequent/recent co-authors can be promoted to the top of the list of
> > suggestions.
> >
> > This is not just about minimising keystrokes - the suggestion mechanism we
> > implemented is also able to carry additional data about the authors being
> > suggested. You mention the potential for cross-linking an author's work
> > between archives. In order to do this you need to be able to uniquely
> > identify them. Author disambiguation is potentially important for the
> > Research Assessment Exercise (RAE) in the UK. When an author's name is
> > autocompleted, the ID of that author is also attached.
> >
> > We have also successfully applied the auto-completion technique to keywords
> > and journal names (with the ISSN of the journal being passed with the
> > suggestion and used to auto-fill the ISSN field upon selection of the
> > intended journal by the user).
> >
> > Although for the moment we've decided not to include it in the next version
> > of EPrints (3.0), it will be in a future version.
> > In the meantime, I'd be happy to describe our technique in more technical
> > detail on the eprints.org wiki if that would be useful (creating an
> > autocompleting field in the EPrints deposit form using an open source AJAX
> > library is straightforward; the complicated bit comes in designing the
> > (independent) program that makes appropriate and useful suggestions in
> > response to the user's keystrokes).
> >
> > It is also worth noting that EPrints 3.0 will have a number of new options
> > for importing data, e.g. users can create new deposits by cutting and pasting
> > BibTeX/EndNote/etc entries from a bibliography file into a textbox and
> > hitting a button.
> >
> > Tim
> >
> > --
> > Timothy Miles-Board
> > EPrints Services
> > Southampton, UK tmb_at_ecs.soton.ac.uk
> > http://www.eprints.org/services/
> > Consultancy - Training - Hosting
> >
> > On Tue, 15 Aug 2006 11:08:58 +0100, Andrew A. Adams
> > <a.a.adams_at_READING.AC.UK> wrote:
> >
> > >Regarding this note, one of the things we're struggling with in setting up a
> > >pilot of an IR at the University of Reading (the School of Systems
> > >Engineering and the School of Maths, Meteorology and Physics are jointly
> > >piloting an IR for the University) is that of manually inputting local
> > >institutional co-authors. It's one of the weaknesses, IMHO, of the GNU
> > >eprints software that it doesn't have two methods of author input: selection
> > >from a list of institutional users already registered, and free text input of
> > >non-institutional authors. In fact, even with non-institutional authors, it's
> > >quite common to regularly author joint papers with the same
> > >non-co-institutional author a number of times, if one has a productive
> > >external collaboration.
> > >I would prefer, rather than manually entering each author name
> > >in free text, to have a search system available for "registered authors", not
> > >all of whom need to be registered users of the system (which deals with the
> > >issue of people leaving institutions and stopping being registered users but
> > >remaining as authors for their prior papers). If a new co-author is to be
> > >entered, then minimising the number of keystrokes, and the utility of having
> > >more than free-text name entry available, though not necessarily
> > >mandated, should be considered. As the IR grows then, if it is deemed useful,
> > >people can be employed to add extra information onto the non-user author
> > >details, such as affiliation at the time the paper was deposited, and
> > >possibly cross-links to other IRs containing the works of that author (which
> > >could also be useful for authors moving between institutions).
> > >
> > >--
> > >*E-mail*a.a.adams_at_rdg.ac.uk******** Dr Andrew A Adams
> > >**snail*27 Westerham Walk********** School of Systems Engineering
> > >***mail*Reading RG2 0BA, UK******** The University of Reading
> > >****Tel*+44-118-378-6997*********** Reading, United Kingdom

--
Software patents threaten your company
Support the Economic Majority
http://www.economic-majority.com/

Bernard.Lang_at_inria.fr             ,_  /\o    \o/    Tel  +33 1 3963 5644
http://pauillac.inria.fr/~lang/  ^^^^^^^^^^^^^^^^^  Fax  +33 1 3963 5469
            INRIA / B.P. 105 / 78153 Le Chesnay CEDEX / France
                      I express only my opinion

Received on Fri Aug 18 2006 - 18:58:47 BST
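
The data model argued for in this thread (one permanent identifier per
entity, several names per identifier, and merge/split relations between
entities) could be sketched roughly as follows, together with the kind
of suggestion lookup that returns an identifier alongside each name.
This is only an illustration under stated assumptions: the class names
`Entity` and `Registry`, the identifiers such as `inst:1`, and the
merged "Lab A"/"Lab B" institutions are all hypothetical and are not
taken from EPrints or from any existing identifier scheme.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An author, institution, or conference series with a permanent identifier."""
    ident: str                                           # hypothetical identifier
    names: set[str] = field(default_factory=set)         # every name it has used
    predecessors: set[str] = field(default_factory=set)  # identifiers it was formed from


class Registry:
    """Maps names to identifiers; one name may match several entities."""

    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def add(self, ident: str, *names: str, predecessors: Iterable[str] = ()) -> Entity:
        entity = Entity(ident, set(names), set(predecessors))
        self.entities[ident] = entity
        return entity

    def find(self, name: str) -> list[str]:
        """Identifiers of all entities that have used this name (case-insensitive)."""
        wanted = name.casefold()
        return sorted(
            e.ident for e in self.entities.values()
            if any(n.casefold() == wanted for n in e.names)
        )

    def lineage(self, ident: str) -> set[str]:
        """The identifier itself plus everything it descends from via merges or splits.

        Assumes every predecessor identifier has been registered.
        """
        seen: set[str] = set()
        stack = [ident]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.entities[current].predecessors)
        return seen

    def suggest(self, prefix: str) -> list[tuple[str, str]]:
        """Autocomplete: (name, identifier) pairs whose name starts with the prefix."""
        p = prefix.casefold()
        return sorted(
            (n, e.ident) for e in self.entities.values()
            for n in e.names if n.casefold().startswith(p)
        )


registry = Registry()
registry.add("inst:1", "IRIA", "INRIA")               # a rename: one thing, two names
registry.add("inst:2", "Mathematisch Centrum", "CWI")
registry.add("inst:3", "Lab A")                       # hypothetical institutions
registry.add("inst:4", "Lab B")
registry.add("inst:5", "Lab AB", predecessors=("inst:3", "inst:4"))  # a merge

print(registry.find("iria"))               # ['inst:1']
print(registry.find("INRIA"))              # ['inst:1']
print(sorted(registry.lineage("inst:5")))  # ['inst:3', 'inst:4', 'inst:5']
print(registry.suggest("in"))              # [('INRIA', 'inst:1')]
```

A rename is modelled here as extra names on one identifier, while a
merge creates a new identifier that points back at its predecessors; a
split would simply be two new entities sharing one predecessor. Whether
renames deserve their own identifiers is a design choice this sketch
does not settle.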