FAIR data

Last changed: 22 February 2021
[Image: FAIR principles. Illustration by Sangya Pundir, CC BY-SA 4.0]

The FAIR Guiding Principles for scientific data management and stewardship promote the optimal use and reuse of data by making them Findable, Accessible, Interoperable and Reusable. The Data Curation Unit (DCU) provides advice and support when it comes to making data more FAIR.

Published in 2016, the FAIR Guiding Principles for scientific data management and stewardship describe the characteristics that digital data ought to have in order to optimise their potential for use and reuse. The principles have since been embraced by a large number of national and international stakeholders, including governments, funders, publishers, research communities, and infrastructures. Some funders explicitly require applicants to plan for FAIR data (e.g., the EU framework programme Horizon 2020), and publishers increasingly encourage or require authors to publish data in a FAIR manner (e.g., Taylor & Francis).

Below you will find a brief explanation of each of the four foundational principles (Findability, Accessibility, Interoperability, and Reusability) together with practical advice from SLU’s Data Curation Unit (DCU). In short, publishing data in a data repository that meets many of the FAIR principles, such as the Swedish National Data Service (SND), can get you a fair bit of the way towards FAIR data.

F – Findable

The first step in re-using data is to actually find the data. Data should be easily findable for both humans and computers. Publishing data together with rich, machine-readable metadata (i.e., information that clearly describes the origin, content, and structure of the actual data) is essential for their discovery. For a listing and explanation of all Findable principles, please refer to the GO FAIR foundation.

Practical advice

  • Publish data in a data repository available on the internet. Choose a data repository that

    • provides a form that allows you to describe the data in depth with structured metadata (such as title, creator, origin, dates, description of data collection and methods, geotagging, keywords, etc.). The metadata you add then become machine-readable, which in turn allows others to discover the data using search engines.

    • provides the dataset with a unique and persistent identifier, such as a Digital Object Identifier (DOI), that leads to a landing page describing the dataset. Many repositories assign DOIs automatically when datasets are submitted. Unlike ordinary URLs, which tend to break over time (“link rot”), such identifiers provide machine-readable links that remain valid and resolvable over time. Use the persistent identifier when referring to the published dataset. More information about how to cite data can be found at Discover, reuse, and cite data.

    • is searchable via major search engines, such as Google and Bing. Many repositories are also indexed by more specialised services like Web of Science.
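What “structured, machine-readable metadata” with a persistent identifier can look like may be sketched with the schema.org `Dataset` vocabulary, which general search engines (for instance Google Dataset Search) can index when it is embedded as JSON-LD in a landing page. The Python sketch below is an illustration under assumptions: all values, including the DOI, are hypothetical, and a repository such as SND produces its own metadata from the submission form, so you would not normally write this by hand.

```python
import json

# Hypothetical values throughout; the DOI below is a placeholder, not a registered DOI.
doi = "10.1234/example-dataset"

record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Soil moisture measurements at an example field site, 2019-2020",
    "description": (
        "Daily volumetric soil water content at three depths, "
        "measured with capacitance probes."
    ),
    # Persistent identifier written as a resolvable URL via the doi.org resolver.
    "identifier": f"https://doi.org/{doi}",
    "creator": [{"@type": "Person", "name": "Jane Doe"}],
    "keywords": ["soil moisture", "soil water content", "field measurements"],
    # ISO 8601 time interval covering the data collection period.
    "temporalCoverage": "2019-01-01/2020-12-31",
    # Geographic coverage ("geotagging") as a single point location.
    "spatialCoverage": {
        "@type": "Place",
        "geo": {"@type": "GeoCoordinates", "latitude": 59.82, "longitude": 17.65},
    },
}

# Serialise as JSON-LD, e.g. for embedding in a landing page.
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Because the identifier is stored as an https://doi.org/ URL, both people and machines can follow it to the landing page, and the DOI itself stays the same even if the landing page later moves.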

For legal or ethical reasons, it may not always be possible to publish data openly. In such cases you may still be able to publish a description of the dataset (i.e., metadata), so that the dataset can be discovered while access to the data themselves remains restricted.

Further information about sharing and publishing data can be found at Share and publish data.

A – Accessible

Once a dataset has been discovered, it should be possible to access it, or at least to find out how to access it. Note that FAIR does not necessarily mean “open”: data may be accessible in the technical sense but still require authentication and authorisation for ethical and/or legal reasons. Even when data cannot be downloaded directly, a description of the data can still be made publicly available together with well-defined terms for access. For a listing and explanation of all Accessible principles, please refer to the GO FAIR foundation.

Practical advice

  • Publish data in a data repository available on the internet (see Findable above). Most data repositories use standard internet protocols that are open, free and universal and that can allow for authentication and authorisation procedures if applicable.

    • If, for legal or ethical reasons, you can only make a description of the data (i.e., metadata) available, make sure to include information about how to request access. Many repositories have metadata elements that indicate whether datasets are available for download or only on request, including elements for contact details.

    • If datasets, for any reason, have to be removed, try to make sure the description of the data (i.e., metadata) remains in the repository so that information about the data is still findable and accessible. In such cases the metadata should be updated to provide an explanation as to why the dataset is no longer readily available.
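The advice above amounts to a few rules about what a dataset description should say about access. A minimal Python sketch, assuming hypothetical field names (`access_level`, `contact_email`, `withdrawal_reason`) that do not correspond to any particular repository's metadata schema, shows how such rules could be checked before a record is published:

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical access fields; real repositories (e.g. SND) use their own schemas.
@dataclass
class AccessInfo:
    access_level: str                         # "open", "on_request", or "withdrawn"
    contact_email: Optional[str] = None       # who handles access requests
    withdrawal_reason: Optional[str] = None   # explanation kept in the remaining metadata


def access_statement(info: AccessInfo) -> str:
    """Return a short, human-readable access statement for a landing page."""
    if info.access_level == "open":
        return "Data are openly available for download."
    if info.access_level == "on_request":
        # Metadata-only records must tell reusers how to ask for the data.
        if not info.contact_email:
            raise ValueError("Restricted datasets need a contact for access requests")
        return f"Data are available on request; contact {info.contact_email}."
    if info.access_level == "withdrawn":
        # Removed datasets keep their metadata plus an explanation.
        if not info.withdrawal_reason:
            raise ValueError("Withdrawn datasets should explain why")
        return f"Data have been withdrawn: {info.withdrawal_reason}"
    raise ValueError(f"Unknown access level: {info.access_level}")


print(access_statement(AccessInfo("on_request", contact_email="data-requests@example.org")))
```

A repository could enforce rules like these in its submission form; the point is that each access situation has a defined, machine-readable state together with the human-readable information a reuser needs.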

Further information about data formats can be found at Collect, organise and store data/Organising and structuring data.

I – Interoperable

Both humans and computers need to be able to understand and process the data if they are to be reused and integrated with other data, applications, or workflows. Fundamental in this respect is the use of recognised standards – in short, you need to structure and describe the data in a way that makes them universally understandable, with as little ambiguity as possible. For a listing and explanation of all Interoperable principles, please refer to the GO FAIR foundation.

Practical advice

  • Use open, common, and machine-readable data formats so that data files can be opened with standard or open software.

    • The Swedish National Data Service (SND) provides a list of suggested formats for text, spreadsheet data, audio, video files, etc. In practice, it may not always be possible to use formats that fulfil these criteria, especially at the data collection and/or data analysis stages. In such cases, it is often possible to migrate/convert the data to a more interoperable format at a later point.

    • If this is not possible, document what software is needed to read the data (in some cases you may need to publish the software itself along with the data, e.g., as open source code).

  • Interoperability highly depends on standards and interrelations. Whenever possible:

    • use recognised terms, codes, and identifiers right from the start, i.e., when structuring and annotating data during collection or generation and on through analysis and processing; for instance, use the International System of Units (SI) for measurements. Adopt data organisation schemes and vocabularies/terminologies commonly used within your field of research, for instance the Ecological Metadata Language (EML) standard and the AGROVOC thesaurus (see below).

    • use links to clarify definitions, enhance context, and indicate relationships, for example by linking data variable names, via unique identifiers, to term definitions in a standard vocabulary (see the Maize example below).

  • Publish data in a repository as described under Findable above. Doing so will, in general, ensure some level of interoperability of at least the metadata as many data repositories apply standardised, translatable, and machine-readable metadata elements for dataset level description.

    • When you publish data in a FAIR-supporting data repository, standard identifiers, codes, and organisation schemes may be applied automatically to some of the metadata elements, for instance ORCID iDs for researchers, the Research Organization Registry (ROR) for organisations, and ISO 639-2 codes for languages.

    • SND’s research data catalogue, for instance, uses the DDI (Data Documentation Initiative) standard for general, contextual information about the dataset, and the Swedish standard for classification of research topics (SSIF) to specify which subjects the dataset relates to. In the SND research data catalogue you can also choose among profiles containing sets of elements and terms specific to certain research domains, for instance environmental and climate data.
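Some of these standard identifiers are designed to be checked by machines. An ORCID iD, for example, consists of 16 characters, the last of which is a check character computed with the ISO 7064 MOD 11-2 algorithm (as described in ORCID's documentation of the iD structure), so software can catch mistyped iDs before they end up in metadata. The Python sketch below implements that check; the function names are our own and not part of any ORCID library.

```python
def orcid_check_char(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A check value of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits, and check character of an iD (hyphens are ignored)."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not all(c in "0123456789" for c in compact[:15]):
        return False
    return orcid_check_char(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (last digit altered)
```

The iD 0000-0002-1825-0097 is used as an example in ORCID's own documentation; altering any single digit makes the check fail, which is the kind of error a repository form can flag automatically.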

The AGROVOC thesaurus is an example of a standard vocabulary commonly used in agricultural research. It offers a structured collection of agricultural concepts, terms, definitions, and relationships that are used to describe resources unambiguously. Each term has a unique identifier that can be used to link from a dataset, or from the metadata of a dataset, to the term's definition. “Maize” is one such term: it has an identifier and is represented with a definition, relations to associated terms, alternative expressions of the concept, and links to matching concepts in other vocabularies for interoperability. AgroPortal lists a range of additional vocabularies within agronomy and related domains.
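Linking variable names to term definitions can be sketched as a small codebook (data dictionary) stored in an open CSV format alongside the data, with SI units and a column of vocabulary identifiers. In this Python sketch the variable names, the column layout, and the concept URI are hypothetical placeholders; look up the actual identifier of a term such as “Maize” in AGROVOC rather than copying the one below.

```python
import csv
import io

# Placeholder: replace with the real AGROVOC concept URI for "maize".
MAIZE_URI = "http://aims.fao.org/aos/agrovoc/c_XXXX"

FIELDS = ["variable", "description", "unit", "term_uri"]

# Hypothetical codebook: one row per variable (column) in the data file.
codebook = [
    {"variable": "maize_yield", "description": "Grain yield of maize per plot area",
     "unit": "kg m-2", "term_uri": MAIZE_URI},
    {"variable": "plant_height", "description": "Plant height at harvest",
     "unit": "m", "term_uri": ""},
    {"variable": "sampling_date", "description": "Sampling date in ISO 8601 format (YYYY-MM-DD)",
     "unit": "", "term_uri": ""},
]


def write_codebook(rows, stream):
    """Write the codebook as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_codebook(codebook, buffer)
print(buffer.getvalue())
```

To store the codebook as a file next to the data, pass a file opened with `open("codebook.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer; `newline=""` is what the Python `csv` module documentation recommends for files it writes.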

Other examples of community standards that can be used for data annotation are the Gene Ontology, which represents biological concepts, and Dyntaxa (developed by SLU) for Swedish species information. More information about data documentation, structure, ontologies, and file formats can be found at Collect, organise and store data. Our page Discover, reuse and cite data contains information about citing published datasets.

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

R – Reusable

The ultimate goal of the FAIR Data Principles is to prepare data for their optimal reuse. In order for data to be reusable, they need to be findable, accessible, and interoperable. The Reusable principles stress the importance of describing data so that both humans and computers can determine whether and how the data can and may be reused. This requires a rich and standardised description of the content, structure, and origin of the data, as well as clear information on the terms and conditions for reuse. For a listing and explanation of all Reusable principles, please refer to the GO FAIR foundation.

Practical advice

  • Provide rich documentation of the data at the project, dataset, and variable levels so that others can interpret and reuse them (read more on our page Collect, organise, and store data). Other things to keep in mind include:

    • If scripts have been used to gather, process, analyse, or present data, it is recommended that these are attached and annotated in a way that makes them understandable and reusable for others (e.g., by applying literate programming).

    • Try to follow data management standards and policies already established within your field of research, and use data organisation schemes, concepts, and terms that are commonly applied (see also Interoperable above) when documenting and annotating data. FAIRsharing.org is a curated resource that lists different kinds of standards and policies, supporting the FAIR principle that data should meet domain-relevant community standards.

  • Publish data (see Findable above) with as few restrictions as possible on their reuse and redistribution. In Sweden, research data are generally not considered copyrightable, which complicates the application of licences such as Creative Commons. However, the FAIR principles require that the terms for data reuse are clearly described. Further information about data and licensing can be found at Share and publish data.

More information about data documentation/metadata, structure, terminologies and file formats can be found at Collect, organise, and store data.


To keep in mind

  • When aiming for data reusability, it is advisable to keep the FAIR Data Principles in mind right from the outset of a project. Read our Introduction to data management, and plan for FAIR data in your data management plan.

  • The Swedish National Data Service (SND), of which SLU is a part, has established a data repository that supports many of the recommendations listed above. The re3data registry can help you find other appropriate data repositories.

  • FAIR is not all-or-nothing: data can be FAIR to different degrees. The extent to which data can be made FAIR varies with the type of data and between research disciplines.

  • The following checklists can help you assess whether or not you are on the right track towards FAIR data and learn more about how the principles can be applied:

    • How FAIR are your data? – a short checklist to help you “think FAIR” when it comes to data.

    • FAIR assessment tool – by filling out a form you can get an idea of the FAIRness of the data in your project and what steps could be taken to increase FAIRness.

    • FAIR Aware – this tool can help you to better understand the FAIR Principles and how making data FAIR can increase the potential value and impact of research data.

  • The Data Curation Unit (DCU) is SLU’s support function when it comes to research data management including making data more FAIR. We can help you identify appropriate data repositories and assist you in preparing data as well as metadata prior to publishing. We work closely with SND and review data submissions to the SND research data catalogue by SLU employees. Contact us at dcu@slu.se.

Page editor: dcu@slu.se