Discover, reuse, and cite data

Last changed: 12 March 2024

This page explains how to find published data and what to consider before reusing data in your own research.

Reusing existing datasets can avoid unnecessary duplication, inspire new research, and allow datasets from different studies or disciplines to be integrated. Many funders require you to investigate whether existing data can be used for your research question before you collect new data.

Finding and discovering data

There are many ways to search for and discover data. The SLU University Library’s page Find research data and environmental data lists examples of search services and data repositories.

Reusing data

Responsibilities and rights

Before using data that you have not collected yourself, you have a responsibility to respect the rights that may be held by other people or organizations (including copyright, sui generis database rights, and ethical/moral rights). Therefore, check the terms and conditions of access and use, ensure that any license applied by the author, organization, authority, etc. is suitable for your purposes, and make sure to obtain any permission or consent that may be necessary.

Assessing data before reuse

You also need to assess the quality of the data, including their reliability and validity. The following questions can serve as a starting point for a quality check:

  • Is the source of the data clearly stated? Can it be trusted?
  • Who is hosting the data? Are data available in a sustainable repository?
  • Why were the data collected/generated?
  • Who collected/generated the data, when and how?
  • How were the data processed (does documentation exist)?
  • Are the data ‘clean’ (e.g., were erroneous values deleted)?
  • What quality assurance procedures were used?
  • Are the data well documented?
  • Do the data come with a permanent identifier that can be used for citing them?
  • Do contact details exist in case further information is required?

Documenting the reuse of data

Document your reuse of secondary data carefully. Your documentation should answer the following questions:

  • What data have been reused?
  • How were the data obtained and where can they be found (identifier/link, search location plus search query applied if needed)?
  • How were the data evaluated?
  • Were the data processed before analysis, and, if so, how?
  • How were the data used within the new study?

You should keep sufficiently detailed documentation about data and methods to enable other researchers to locate the original data and reproduce and validate your findings.
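One way to keep this documentation consistent is to store the answers in a small structured record next to your analysis files. The sketch below is only an illustration: the class name `ReuseRecord`, its field names, and the placeholder answers are hypothetical and not an SLU standard. Each field corresponds to one of the questions above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ReuseRecord:
    """Answers to the reuse questions for one dataset (field names are hypothetical)."""
    dataset: str       # What data have been reused?
    source: str        # Identifier/link, or search location plus search query
    evaluation: str    # How were the data evaluated?
    processing: str    # Were the data processed before analysis, and how?
    use_in_study: str  # How were the data used within the new study?


def to_json(record: ReuseRecord) -> str:
    """Serialise the record as JSON, keeping non-ASCII characters readable."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


# Illustrative entry; the free-text answers are placeholders to be filled in.
record = ReuseRecord(
    dataset="Species distribution modelling data for Phellinus ferrugineofuscus",
    source="https://doi.org/10.5879/ECDS/2017-03-23.1/1",
    evaluation="PLACEHOLDER: describe the quality checks you carried out",
    processing="PLACEHOLDER: describe any cleaning or transformation",
    use_in_study="PLACEHOLDER: describe the role of the data in your analysis",
)

print(to_json(record))
```

A plain text file answering the same five questions serves the same purpose; the structured form mainly makes it easier to keep the answers complete for every reused dataset.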

Citing data

When reusing or referring to data, you should cite the dataset just as you would cite a scientific article. This is best practice even if the data have no license, or if the license does not explicitly require citation. Citing data is one of the key practices that lead to data being recognised as a primary research output in its own right.

Styles and formats for citing data vary in the same way as article citation styles and formats vary. At SLU, referencing is done according to the Harvard system. According to this system, a data citation is written as follows:

Last name, initial of first name. Name of institution (Year). Title. Data archive. Version no. Persistent link.

Example (in the reference list):
Snäll, T. & Mair, L. (2018). Species distribution modelling data for Phellinus ferrugineofuscus. Swedish National Data Service. Version 1.0. https://doi.org/10.5879/ECDS/2017-03-23.1/1

Example (in-text citation):
(Snäll & Mair 2018)
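The reference-list pattern above can be assembled programmatically, for example when you cite many datasets. The following is a minimal sketch under stated assumptions: the function name `format_data_citation` and its parameters are hypothetical, it handles personal authors only (not an institution as author), and the separator used for three or more authors is an assumption rather than a documented SLU rule.

```python
def format_data_citation(authors, year, title, archive, version, link):
    """Build a reference-list entry for a dataset.

    authors: list of (last_name, first_name_or_initial) tuples, in citation order.
    """
    if not authors:
        raise ValueError("at least one author is required")

    # "Last name, initial of first name." for each author
    names = [f"{last}, {first[0]}." for last, first in authors]

    # Join the final author with "&"; earlier authors with commas (assumption)
    if len(names) > 1:
        author_part = ", ".join(names[:-1]) + " & " + names[-1]
    else:
        author_part = names[0]

    return f"{author_part} ({year}). {title}. {archive}. Version {version}. {link}"


citation = format_data_citation(
    authors=[("Snäll", "T"), ("Mair", "L")],
    year=2018,
    title="Species distribution modelling data for Phellinus ferrugineofuscus",
    archive="Swedish National Data Service",
    version="1.0",
    link="https://doi.org/10.5879/ECDS/2017-03-23.1/1",
)
print(citation)
```

With the values from the example above, the function reproduces the reference-list entry shown there.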

If you are having trouble citing data correctly, you can use the DOI Citation Formatter to automatically extract metadata from a DOI and generate a complete citation in a variety of citation styles.
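A formatted citation can also be requested in a script through DOI content negotiation, where the `Accept` header sent to doi.org asks for a bibliography entry in a given citation style. The sketch below uses only the Python standard library. The function names are hypothetical, the style name `apa` is just an example of a style identifier, and whether a particular DOI or style is supported depends on the registration agency, so treat this as a starting point rather than a guaranteed recipe.

```python
import urllib.parse
import urllib.request


def build_citation_request(doi, style="apa"):
    """Prepare (but do not send) a content-negotiation request for a DOI."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    # text/x-bibliography asks for a formatted citation in the given style
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi, style="apa", timeout=10):
    """Send the request and return the formatted citation as text.

    Requires network access and raises urllib.error.URLError on failure.
    """
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Example call (not executed here because it needs network access):
# print(fetch_citation("10.5879/ECDS/2017-03-23.1/1"))
```

Always check the returned text against the citation style required for your publication, since automatically generated citations can be incomplete when the underlying metadata are incomplete.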

Introduction to data management

This page is part of our introduction to data management, which covers the most common aspects of data management and includes best-practice strategies, training resources, and tips on data management tools. The introduction is organised according to the data life cycle (see below), a conceptual model that illustrates the different stages of data management.

Content

  1. What is data management?
  2. Plan data management
  3. Collect, organise, and store data
  4. Process and analyse data
  5. Archive and preserve data
  6. Share and publish data
  7. Discover, reuse, and cite data

 

Data life cycle

The data life cycle model. CC BY SLU Data Management Support. All icons in the life cycle and on the pages are made by Prosymbols from www.flaticon.com.