Re: On Not Conflating Open Data (OD) With Open Access (OA)

From: Uhlir, Paul <PUhlir_at_NAS.EDU>
Date: Fri, 21 May 2010 08:35:51 -0400

Apropos this discussion, for those interested in the management and policy details associated with scientific data, primarily from the US (government and National Academy of Sciences) perspectives, you may wish to refer to the following publications, going back 15 years. All are openly available.

National Science Foundation [NSF] (2010), Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information.

National Research Council [NRC] (2009a), Ensuring the Integrity, Availability, and Stewardship of Research Data in the Digital Age.
NRC (2009b), The Socioeconomic Effects of Public Sector Information on Digital Networks: Toward a Better Understanding of Different Access and Reuse Policies, US CODATA.

Office of Science and Technology Policy [OSTP] (2009), Harnessing the Power of Digital Data for Science and Society.

Microsoft Research (2009), The Fourth Paradigm: Data-Intensive Scientific Discovery.

Uhlir, et al. (2009), Toward Implementation Guidelines for the GEOSS Data Sharing Principles, published concurrently in the Journal of Space Law and the CODATA Data Science Journal.

NSF (2008), Fostering Learning in the Networked World: The Cyberlearning Opportunity and Challenge.
Organisation for Economic Co-operation and Development [OECD] (2008), Recommendation on Principles for Access to Public Sector Information.

OECD (2007), Principles and Guidelines for Access to Research Data from Public Funding.

NRC (2007), Environmental Data Management at NOAA: Archiving, Stewardship, and Access.

Uhlir and Schröder (2007), Open Data for Global Science.

Uhlir (2007), The Emerging Role of Open Repositories for Scientific Literature as a Fundamental Component of the Public Research Infrastructure, in Open Access: Open Problems, G. Sica, ed., Polimetrica.

NRC (2006), Strategies for Preservation of and Open Access to Scientific Data in China, US CODATA.

Association of Research Libraries [ARL] (2006), To Stand the Test of Time: Long-term Stewardship of Digital Data Sets in Science and Engineering.

National Science Board [NSB] (2005), Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century.

NRC (2005), Expanding Access to Research Data: Reconciling the Risks and Opportunities.

ERPANET and CODATA (2004), Electronic Preservation and Access Network Training: The Selection, Appraisal, and Retention of Digital Scientific Data.

NRC (2004a), Open Access and the Public Domain in Digital Data and Information for Science, ISTIP.
(2004b), Electronic Scientific, Technical, and Medical Journal Publishing and Its Implications.
(2004c), Licensing Geographic Data and Services.

International Council for Science [ICSU] (2004), Scientific Data and Information: A Report of the Committee on Scientific Planning and Review Assessment Panel.

Uhlir (2004), UNESCO Policy Guidelines on the Development and Promotion of Governmental Public Domain Information.

NRC (2003a), The Role of Scientific and Technical Data and Information in the Public Domain, ISTIP.
(2003c), Resolving Conflicts Arising from the Privatization of Environmental Data.
(2003d), Ensuring the Quality of Data Disseminated by the Federal Government.
(2003e), Sharing Publication-related Data and Materials: Responsibilities of Authorship in the Life Sciences.
(2003g), Government Data Centers: Meeting Increasing Demands.

NSF (2003), NSF’s Cyberinfrastructure Vision for 21st Century Discovery.

Reichman and Uhlir (2003), A Contractually Reconstructed Research Commons for Scientific Data in a Highly Protectionist Intellectual Property Environment.

NRC (2002a), Access to Research Data in the 21st Century.
(2002b), Health Data in the Information Age: Use, Disclosure, and Privacy.
(2002c), Geoscience Data and Collections: National Resources in Peril.

NRC (2000a), Improving Access to and Confidentiality of Research Data.
(2000b), The Digital Dilemma.

NRC (1999), A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases, US CODATA.

NRC (1997), Bits of Power: Issues in Global Access to Scientific Data, US CODATA.

NRC (1995), Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation's Scientific Information Resources, US CODATA.


________________________________________
From: American Scientist Open Access Forum [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG] On Behalf Of Stevan Harnad [amsciforum_at_GMAIL.COM]
Sent: Thursday, May 20, 2010 4:11 PM
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
Subject: Re: On Not Conflating Open Data (OD) With Open Access (OA)

When should research data be made OD? Not immediately upon
collection, since then the collectors lose the first crack at mining
their own hard-won data.

Benjamin Geer suggests immediately upon publication (presumably
the publication of a refereed journal article based on the
data in question). But the first of the collector's articles based on
that collection or the last? How many are allowed with exclusivity,
and for how long?

That's why I said there are many other questions and problems peculiar
to OD that are not shared with OA (and OD should not be linked to OA,
since that makes consensus on adopting an OA mandate
harder to reach, and compliance with it less likely).

That said, the data on which a publication is based should immediately
be open to auditing -- but not necessarily OD.

Some more replies below:

On Thu, May 20, 2010 at 11:06 AM, Benjamin Geer benjamin.geer --
gmail.com wrote:

> Stevan, although I agree with you on Open Access, I disagree with you on
> Open Data. There are strong arguments for making scientific data publicly
> accessible at the time of publication.

At the time of the first publication the author derives from that
data-set? He gets only one exclusive crack at it?

> "WHY ARCHIVE?
>
> Norms of methodological transparency encourage honesty in the reporting of
> research results. In a worst-case scenario, pressures for career
> advancement, tenure, or prestige may create perverse incentives to “publish
> or perish” that, if not countered with some form of accountability, can
> easily lead researchers to misstate conclusions.

Many forms of accountability are possible that are short of immediate OD.

> Yet erroneous inferences may
> not even necessarily result from nefarious intentions. Simple coding errors
> or a flawed syntax file can produce results that the investigator believes
> to be correct even when they are not. Making the data and programming
> decisions publicly available limits the extent to which bad findings
> influence future research.

They are already open to the referees, if requested. They ought to be
open to some auditors too. But OD is rather more than that.

> There are, in fact, ample examples of errors in quantitative analysis
> leading to—at best—ambiguity in findings. One replication of a 1986 American
> Sociological Review article led to a debate over whether four different
> couples in the analyzed survey sample were really having sex 88 times a
> month, or if the 88s in the data file were actually meant to refer to
> missing observations (Jasso 1985, 1986; Kahn and Udry 1986). A broader
> study in economics by Dewald, Thursby, and Anderson (1986) sought to
> replicate a year’s worth of articles in the Journal of Money, Credit, and
> Banking. The principal finding was that, in the vast majority of cases, it
> was entirely impossible to exactly replicate the published results even with
> the help of the articles’ original authors. This led to the adoption of more
> stringent requirements in journals such as the American Economic Review
> requiring that data be made available at the time of publication."
>
> Jeremy J. Albright and Jared A. Lyle, “Data Preservation Through Data
> Archives,” PS: Political Science & Politics 43, no. 01 (2010): 17-21.

All very important, and reason for rigorous refereeing and aggressive
auditing -- but not yet OD.

> I also disagree that mandating open data will remove researchers' incentive
> to collect the data in the first place. Their incentive to collect the data
> will still be that they will get the first opportunity to interpret the data
> and publish their interpretation.

Just until their first publication?

> If the data are really novel and their
> interpretation is valid, that is all they need to advance their careers as
> researchers.

One publication? What if they've gathered a lot of time-consuming
data, amenable to a lot of time-consuming analysis?

> Embargoes on the publication of data just slow down science,
> holding it hostage to the self-interest of a single researcher, by giving
> that researcher a monopoly on the use of the data in question, forbidding
> others from attempting to verify that researcher's interpretation.

In some cases you may well be right. But it's not clear whether that's
most cases. In contrast, OA is exception-free: from the moment of
acceptance for publication, refereed research findings can and should
immediately be made OA.

STEVAN HARNAD

>
> Ben
> http://sites.google.com/site/benjamingeer/
Received on Fri May 21 2010 - 14:21:11 BST