Data remains a valuable resource, even after the end of the project for which it was collected and used. Sharing data enables your future self and other researchers to open up new lines of enquiry without the duplication of effort involved in collecting data again. This page provides guidance on why and how to share data, and on choosing the right data sharing strategy.
Why share data
Sweden, like other members of the European Union as well as other countries worldwide, is in the middle of making science more open, transparent, and inclusive. Open science is about extending the principles of openness to the whole research cycle and making scientific information openly accessible to society to promote innovation, knowledge transfer, and awareness. The two pillars of utmost importance for the development of open science are open access to scientific publications and open access to research data. Open access to research data means that data collected and/or generated as part of a research or environmental monitoring and assessment activity shall be made openly and freely available online. This shall occur in accordance with current legislation (as per the EU’s underlying principle “as open as possible, as closed as necessary”) and allow for optimal reuse of data. In this context, see SLU’s policy for scientific publishing, which encourages SLU personnel to make research data openly available.
As part of the open science movement, national funding agencies as well as journals and publishers are increasingly mandating that data be made openly available. Both the Swedish Research Council (VR) and Formas (Swedish Research Council for Sustainable Development) have developed policies on research data, requiring open access to research data financed via public funds. In addition, a rising number of journals and publishers have formulated research data sharing policies that range from supporting and encouraging, through expecting, to requiring data sharing (see PLOS ONE, PNAS, Frontiers, American Chemical Society, or Springer Nature for examples of data sharing policies from journals and publishers).
However, there are a number of reasons to share data beyond the growing mandates from governments, funding agencies, journals, and publishers. Sharing data can
highlight and advance your research,
Sharing data means that your peers will be able to discover your work more easily. Greater visibility and impact of your research may lead to greater recognition of your scholarly work. It may, furthermore, increase your profile as a researcher, by ensuring credit is given to data as an output in its own right. Finally, it may lead to new collaborations and partnerships as shared data makes other researchers more aware of your own research.
demonstrate your research integrity,
By sharing data you allow others to verify and validate your work.
progress science in general,
Shared data opens up new avenues of research, encourages scientific enquiry and debate, and promotes innovation. It allows data to be independently validated and tested, and scientific findings to be verified. It also allows for validation and improvement of research and applied scientific methodologies. In addition, newly arising collaborations may increase the potential of reusing data in a completely new manner. Finally, shared data reduces the cost of collecting the same data again.
be a resource for education and training, and
maximise transparency and accountability.
Ways of sharing data
Sharing data can mean a number of things. Here, data sharing is defined as the practice of making data available to others. Historically, data has been shared ‘upon request from the author’. Nowadays, data may still be shared upon request, yet there are more reliable and secure ways of sharing it. Data can be made available to others either informally or formally. Informal data sharing means making data available by means of self-dissemination. Formal data sharing, or data publishing, means making data publicly available and thus discoverable by depositing data and/or metadata at a data repository, publishing a data article in a data journal, or publishing an article in a journal together with data as supplemental material.
Self-dissemination (informal data sharing)
Self-dissemination means making data available via peer-to-peer sharing (e.g., ‘upon request’) or via a project or personal website. Websites, be they project or personal, may offer storage and dissemination but are less sustainable and provide less long-term preservation than formal data publishing. In addition, managing data may be costly, and controlling who uses the data and how may be difficult.
Depositing data and/or metadata at a data repository (formal data publishing)
Publishing data via a data repository can mean one of two things: depositing both data and metadata, or depositing only metadata. In either case, the deposit makes the data discoverable.
Repositories can provide controlled access to sensitive data and create a catalogue record for the data, making it more discoverable. They take responsibility for handling data reuse queries, licensing, dissemination and promotion of data on behalf of the data depositor. Some data repositories also manage data safely for long-term use and protect it from format obsolescence, loss, deterioration, or irreversible damage. Using a repository involves depositing data and/or metadata in a digital database, which can be discipline/domain-specific, institutional, or generalist. Finally, repositories generally provide a persistent identifier when data and/or metadata is deposited.
Publishing a data article (formal data publishing)
Data articles focus on data collected, generated and/or reused throughout a research or environmental monitoring and assessment activity, allowing you to describe tools, methods and processes; that is to fully characterise the data. The actual data and/or metadata, however, is often deposited at a data repository (see above).
Data journals seek to promote scientific accreditation and reuse, improve transparency of scientific methods and results, support good data management practices, and provide an accessible and permanent route to the data. It is important to note that data articles are peer-reviewed and citable.
Publishing an article together with supplementary data (formal data sharing or publishing)
Until recently, the usual way of sharing data has been as supplementary material to a peer-reviewed article published in a scientific journal. This, though, often meant that only some of the data was shared, and that journals and publishers might have kept that data behind a subscription wall and/or claimed copyright over the data. Moreover, supplementary data may not be in a user-friendly format or suitable for computer processing.
Nowadays, a growing number of journals and publishers encourage, expect, or require all data underlying a publication to be shared by publishing data and/or metadata through a data repository (see for instance PLOS ONE, PNAS, Frontiers, American Chemical Society, or Springer Nature).
Linking data to a published article
Research publications are most useful when supported by their underlying data. Journals and publishers increasingly require that data directly underpinning a publication is made publicly available (either shared or indeed published). To this end, authors are asked to state where the data can be found and under what conditions it can be accessed (or, if it cannot be accessed, why not). Such statements in publications are known as data access statements (or data availability statements). The University of Bath Library provides detailed information on data access statements, including examples.
However, to most effectively link data to your publication you need a persistent identifier, such as a Digital Object Identifier (DOI). In order to get such an identifier (which can then be included in your publication’s data access statement), you will need to deposit the data and/or metadata at a data repository (i.e., you will need to publish the data). See below for how to obtain a persistent identifier.
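As a rough illustration of how a persistent identifier might end up in a data access statement, the sketch below normalises a DOI to its `https://doi.org/` resolver URL and fills in a simple sentence template. The function names, the wording of the statement, the repository name, and the example DOI are all made up for illustration; your journal or publisher may prescribe a different wording.

```python
def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI written in a common notation."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    # Strip a leading resolver URL or "doi:" label if one is present.
    known_prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in known_prefixes:
        if lowered.startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    cleaned = cleaned.strip()
    # A DOI starts with "10." and separates prefix and suffix with a slash.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


def data_access_statement(repository: str, doi: str, licence: str) -> str:
    """Fill a hypothetical statement template; adapt the wording to your journal."""
    return (
        f"The data supporting this publication are available from {repository} "
        f"at {doi_to_url(doi)} under a {licence} licence."
    )


# Hypothetical repository name and DOI, for illustration only.
print(data_access_statement("Example Repository", "doi:10.1234/example-dataset", "CC BY 4.0"))
```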
Choosing where and how to share data is a crucial matter. Your choice may mean a large and far-reaching impact or little to no reuse. Also, your preference may depend on the existing practices in your discipline or funder and journal/publisher requirements.
Preparing data for sharing
As seen above, data can be shared in many different ways. Whichever way is chosen, it is best practice to prepare the data properly prior to sharing. Once the data itself is ready to be shared, there are a number of issues that need to be considered and addressed.
How to prepare data prior to sharing
Sharing data should be easy if you have followed the previous steps carefully. When sharing data you will need tidy files (see Collect, organise, and store data), in the right format (see Collect, organise, and store data and Preserve data) and with appropriate documentation and metadata (see Collect, organise, and store data, Process and analyse data, and Preserve data), thus ensuring that data can be discovered, accessed, understood, and reused in the future.
What to consider when sharing data
A number of legal, ethical, and commercial issues need to be anticipated and addressed prior to sharing data. It is important to stress that it may not be possible to publish data openly if one of the following happens to be the case:
material classified according to the Public Access to Information and Secrecy Act (SFS 2009:400),
personal or sensitive information (such information needs to be handled in accordance with data protection, freedom of information, and secrecy legislation; see Collect, organise, and store data for more general information and Process and analyse data for measures to desensitise data prior to publishing openly),
material copyrighted by somebody else, where the copyright prohibits further publishing,
trade secrets or sensitive financial information.
It is important to note that, should one of the above apply, you may still publish the data, though not openly (e.g., by publishing the metadata at a data repository with information on how to get hold of the actual data). Should none of the above be the case, or should existing issues have been resolved (e.g., via consent), it is time to think about intellectual property rights, licensing, and persistent identifiers.
As a rule, material (such as data) produced as part of research as well as environmental monitoring and assessment activities carried out at SLU is considered an official record/document, and access to it must be guaranteed on request unless there is a confidentiality provision according to the Public Access to Information and Secrecy Act (SFS 2009:400). Such a request can, if it adheres to current legislation, be met either by allowing the enquirer to view/obtain the data on site or by providing a copy (depending on the circumstances). Should you be uncertain about allowing or denying such a request, you can contact SLU’s Legal Affairs unit. More on the principle of public access to information can be found at Collect, organise, and store data.
Intellectual property rights (e.g., copyright)
Intellectual property (IP) allows a person to own their creation in the same way as something physical can be owned. This gives the rights holder control over the exploitation of their work, such as the right to copy and adapt it, the right to rent or lend it, the right to communicate it to the public and the right to license and distribute it. In Sweden, for something to fall under copyright protection, it has to be the “result of personal intellectual creativity” and of “original and unique” character. In general, data belongs to the public domain and is as such not copyrightable. Research data can, however, fall under copyright protection in some cases if it contains copyrightable work (e.g., literary text, computer code, or works of art). For large compilations of research data the database right may be applicable. The database right is comparable to but distinct from copyright, and its purpose is to recognise the investment made in compiling a database. It does not, however, involve the "creative" aspect required in copyright. In copyright, the rights holder is the creator (e.g., an author), while in the database right the rights holder is the organisation that made the financial investment to enable the compilation (e.g., a university). Please contact Data Management Support (email@example.com) for further information and/or support in this regard.
To enable reuse, data may need to be appropriately licensed. From a copyright perspective, a licence clearly specifies what others can do with the data you share. It states the conditions according to which reuse is allowed: whether you require attribution (i.e., citation), what type of licence applies when building upon the data shared, whether work resulting from a transformation of the data shared or building upon it can be shared further, and whether commercial use is permitted.
To make data available to the widest audience possible and allow for the widest range of uses possible, it is recommended to choose a standard and open licence. Commonly used standard and open licences for data include Creative Commons and Open Data Commons licences. The Swedish National Data Service (SND) has put together further information regarding licensing specific to the Swedish context. Also, DIGG, the Swedish agency for digital government, has released guidelines for open licensing (currently only available in Swedish). Note that licensing data is one of the principles ensuring FAIR data.
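The conditions a licence can state (attribution, derivative works, share-alike, commercial use) map fairly directly onto the element codes of the Creative Commons licences. The sketch below, with a hypothetical function and parameter names, composes the short licence name from such choices. It covers only the six Creative Commons licences that require attribution; the CC0 public domain dedication and the Open Data Commons licences are left out, and the sketch is no substitute for reading the licence texts.

```python
def creative_commons_licence(allow_commercial: bool = True,
                             allow_derivatives: bool = True,
                             share_alike: bool = False) -> str:
    """Compose the short name of an attribution-requiring Creative Commons licence."""
    if share_alike and not allow_derivatives:
        # Share-alike governs how derivatives are licensed, so it cannot be
        # combined with a ban on derivatives.
        raise ValueError("share_alike requires allow_derivatives=True")
    elements = ["BY"]  # attribution is part of all six licences covered here
    if not allow_commercial:
        elements.append("NC")  # NonCommercial
    if not allow_derivatives:
        elements.append("ND")  # NoDerivatives
    elif share_alike:
        elements.append("SA")  # ShareAlike
    return "CC " + "-".join(elements)


print(creative_commons_licence())
print(creative_commons_licence(allow_commercial=False, share_alike=True))
```

Of the six licences this function can return, CC BY (the default) places the fewest conditions on reuse, which fits the recommendation above to allow for the widest range of uses possible.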
Identifiers for data or information are essential in all computer-based systems. Computer applications use them for identifying datasets, for searching and retrieval, and for linking or connecting data. Persistent identifiers (PIDs) are unique, lasting links to content that make data findable and citable and ensure that an object remains discoverable at all times. Several types of PIDs exist, such as DOI (Digital Object Identifier), Handle, URN, ARK, PURL, etc. It does not really matter what kind of PID you use, though DOI is currently the most widespread and the most integrated in automatic citation counting algorithms. In order to obtain a PID for the data you intend to share, you will need to publish the data and/or its metadata at a data repository (see below).
DMS recommends the Swedish National Data Service (SND) and we can help you prepare data for publishing in the SND national data catalogue. Read more on our page about Publishing data via the Swedish National Data Service.
We can also help you identify other suitable data repositories and assist you in preparing data for publication elsewhere. At re3data.org you can find a comprehensive list of both generic and discipline-specific repositories.
Many publishers provide guidance to repositories relevant in the subject area of their journals. See for example Springer Nature’s list of recommended repositories.
You can also visit Discover, reuse, and cite data for a number of websites that may help you in identifying an appropriate repository as well as for a list of potential repositories you may choose among.
When choosing a repository, it is recommended:
to check your funder’s and journal/publisher’s recommendations and requirements for sharing data;
to choose repositories that assign a persistent identifier such as a DOI to the data and/or metadata;
to choose repositories that apply machine-readable metadata and use a known metadata standard (read more about this on our webpage about FAIR data).
Remember to include your affiliation in the metadata when depositing data and/or metadata at a repository of your choice. Follow the instructions issued by SLU on how to correctly state your SLU affiliation when publishing.
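To give a sense of what machine-readable metadata with an affiliation could look like, the sketch below builds a small record and serialises it as JSON. The field names are loosely inspired by common dataset metadata schemas but do not follow any particular standard exactly; the function name, the example values, and the placeholder affiliation are hypothetical. In practice the repository's deposit form usually generates the standard-compliant record for you.

```python
import json


def build_metadata_record(title, creators, affiliation, publication_year, doi, licence):
    """Assemble an illustrative dataset metadata record as a plain dictionary."""
    return {
        "identifier": {"type": "DOI", "value": doi},
        "title": title,
        # Each creator carries the affiliation so that it travels with the data.
        "creators": [{"name": name, "affiliation": affiliation} for name in creators],
        "publicationYear": publication_year,
        "resourceType": "Dataset",
        "rights": licence,
    }


# Hypothetical values; state your affiliation as instructed by your university.
record = build_metadata_record(
    title="Example soil moisture measurements",
    creators=["Andersson, Anna", "Berg, Björn"],
    affiliation="Example University",
    publication_year=2024,
    doi="10.1234/example-dataset",
    licence="CC BY 4.0",
)

# ensure_ascii=False keeps characters such as "ö" readable in the output.
print(json.dumps(record, indent=2, ensure_ascii=False))
```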
Please note that while you can make data available in open data repositories, research data must still be archived at SLU. Visit Archive and preserve data for more information in this respect.