This page provides more information on what data management involves and why it is important. You will also find links to further reading, guides, tools and tips on how to improve your data management.
What is data management?
You probably handle data every day without thinking of it as 'data management'. Data management encompasses how data from research and environmental monitoring and assessment is handled throughout a project.
Good data management, meaning data management that complies with laws, principles and best practice, is not an end in itself. Rather, it is a strategy for making research, as well as environmental monitoring and assessment, more efficient. Good data management is also a way of disseminating results so that they can be put to better use, and it is an important part of good research practice.
Good data management:
enables researchers and environmental assessment staff to use their time more efficiently
makes research as well as environmental monitoring and assessment more transparent
facilitates the validation of research
ensures that files and data are stored in a way that makes them easy to find
reduces the risks of data loss and data breaches
enables others to reuse data
facilitates collaboration through data sharing
demonstrates that public funds are being used responsibly.
Research and EMA (environmental monitoring and assessment) data are data that are collected or generated in accordance with scientific principles for use in various types of scientific analysis. Both types of data come in various forms, such as:
numerical data, such as measurement results
text, such as interview and survey responses
image and audio material
source code
maps and other forms of geographical data
observations, such as species occurrences.
The data lifecycle – from planning to long-term data preservation
It is common to emphasise that good data management is important throughout the data lifecycle. The data lifecycle (see the above image) is a conceptual model that illustrates the various phases of data management, from planning and collection, through processing and storage, to publication and long-term preservation.
Read more about the different aspects of data management during these phases below.
Planning data management
Planning data management helps you identify a project’s requirements, whether technical, legal or ethical. Before a project begins or at the start of the project, it is important to consider data management issues, such as:
What data will the project use?
Will personal or other sensitive data be collected?
How will the data be stored, and who will have access to it during the project?
How will the data be made available?
A data management plan is a tool that helps you to identify and address relevant issues in a structured manner.
A data management plan (DMP) is a key component of data management planning. This document describes how data will be managed during and after a research project, environmental assessment programme or within a research infrastructure. Having a DMP in place makes it easier to anticipate potential problems before they arise, providing the best possible conditions for finding a solution that suits you and your project.
A DMP is a living document that should be updated on an ongoing basis.
According to SLU’s data management policy, data management plans are mandatory for all new projects started at the university in September 2022 or later.
A growing number of funders (e.g. VR, Formas, Naturvårdsverket, Forte, Riksbankens jubileumsfond and the EU) require the projects they fund to produce, maintain and follow a data management plan. Even if the funder does not require a data management plan, SLU’s policy does.
Check with your funder to find out which requirements apply to data management planning, or contact SLU’s data management support (dms@slu.se).
Organise data
Data can quickly become disorganised, whether you’re working alone or with others. That’s why it’s a good idea to have an organised approach to naming and storing files. Having a system for managing different versions of the same file is also beneficial, as this allows you to track changes and correct errors more easily.
Decide on a system for organising and naming files and folders to save time and avoid mistakes. A logical and consistent system will help you to find the right files quickly and easily.
It is a good idea to document your file management system, especially when collaborating with others. You could place a description of the folder structure and the naming conventions for files and folders in a supporting README file in the project’s root folder, where everyone involved can easily find it.
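As a minimal sketch of how a naming convention could be checked automatically, the Python snippet below tests file names against one possible pattern, project_description_YYYYMMDD_vNN.ext. The pattern, the function name and the example file names are illustrative assumptions and not an SLU convention; adapt them to whatever system your project documents in its README file.

```python
import re

# Hypothetical convention: <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
# Only lowercase letters, digits and hyphens are allowed inside each part,
# so spaces and special characters are rejected.
NAME_PATTERN = re.compile(r"[a-z0-9-]+_[a-z0-9-]+_\d{8}_v\d{2}\.[a-z0-9]+")


def follows_convention(filename: str) -> bool:
    """Return True if the whole file name matches the assumed pattern."""
    return NAME_PATTERN.fullmatch(filename) is not None


# Example file names (invented for illustration).
for name in ["soilsurvey_ph-measurements_20240315_v01.csv",
             "Final data (new).xlsx"]:
    status = "ok" if follows_convention(name) else "rename"
    print(f"{status}: {name}")
```

A check like this can be run over a project folder from time to time to catch names that drift away from the agreed convention.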
Datasets and files often change, and it is important to keep track of these changes. Saving multiple versions of files and being able to easily access them makes it easier to track changes and correct errors. Versioning can be managed using file names, tables or software add-ons for analysis programmes. Another way to create a system for accessing different versions of a dataset is to make changes in a programme that uses code or scripts.
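Versioning through file names can be illustrated with a short Python sketch. It assumes a hypothetical convention in which the version is written as a suffix such as _v01 before the file extension; the function name and the suffix format are assumptions made for this example.

```python
import re
from pathlib import Path

# Assumed convention: the version is the last part of the name, e.g. "_v01".
VERSION_SUFFIX = re.compile(r"_v(\d+)$")


def next_version_name(filename: str) -> str:
    """Return the file name with its _vNN suffix increased by one.

    A name without a version suffix gets _v01 appended.
    Only the file name is returned; any folder part is dropped.
    """
    path = Path(filename)
    match = VERSION_SUFFIX.search(path.stem)
    if match is None:
        return f"{path.stem}_v01{path.suffix}"
    number = int(match.group(1)) + 1
    base = path.stem[: match.start()]
    return f"{base}_v{number:02d}{path.suffix}"


print(next_version_name("plots_20240315_v01.csv"))
print(next_version_name("plots.csv"))
```

Saving each new version under a new name, rather than overwriting the previous file, keeps earlier versions available for comparison and error correction.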
You can read more about documenting data further down this page.
Data must be stored securely! To prevent data loss and unauthorised access, you should use a secure storage solution that includes backup. It is important to choose a solution that meets your requirements based on the nature of the data to be stored, such as whether it contains personal or other sensitive information.
When choosing a storage solution, there are a few key considerations:
How much storage capacity does the project require?
Is the data active, or will it be stored long term?
Who needs access to the data?
Does the data contain any sensitive information, such as personal data or details about the occurrence of protected species?
Metadata and documentation are required to ensure that data can be validated, understood, and reused. Documentation refers to descriptions of data intended primarily for human reading (and therefore consisting of continuous text). Metadata is also a form of documentation, but it must be structured so that it can be read by both humans and computers.
The relevant documentation varies depending on the research subject and methods. The important thing is to document data in a way that makes it understandable and reusable. Various scientific disciplines have standards that specify what should be documented (read more about metadata standards further down the page).
Things that are always important to document include:
How the data was collected (including everything from the sampling equipment and measuring instruments used, to the wording of questionnaires and interview questions).
Where the data was collected
When the data was collected
What codes and abbreviations mean
What restrictions (e.g. ethical or legal) limit how the data can be reused.
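The points above can also be captured in a simple structured record, which makes it easy to spot what has not yet been documented. The Python sketch below is only an illustration: the field names and the example values are assumptions, not a metadata standard.

```python
import json

# Field names mirroring the list above; the names themselves are assumptions.
REQUIRED_FIELDS = [
    "how_collected",
    "where_collected",
    "when_collected",
    "codes_and_abbreviations",
    "reuse_restrictions",
]


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Invented example record; the last field has deliberately been left empty.
record = {
    "how_collected": "Soil cores, 0-10 cm depth (hypothetical method)",
    "where_collected": "Example field site (hypothetical)",
    "when_collected": "2024-05-01/2024-05-31",
    "codes_and_abbreviations": {"NA": "not measured"},
    "reuse_restrictions": "",
}

print(json.dumps(record, indent=2, ensure_ascii=False))
print("Still to document:", missing_fields(record))
```

Because the record is structured, it can be read by both people and software, which is the distinction drawn between metadata and free-text documentation.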
There are various ways to document data. The most suitable tool for this documentation depends on the scientific discipline and the format in which the data is stored. Different tools may also be appropriate at different stages. Here are some examples of approaches:
One option is to use a separate supplementary document, such as a text file named 'readme.txt' and saved in the same location as the data files.
Some file formats support embedded metadata.
When publishing data in a research data repository, it is important to check the repository’s documentation and metadata requirements.
Documentation should not only cover the final data, but also collection methods, processing, analyses and data handling. This will facilitate both writing articles and describing and making the final datasets available. There are many useful tools and resources for documenting processes and workflows, such as:
Electronic laboratory notebooks
Markdown (e.g. R Markdown in RStudio)
Jupyter Notebook
Documentation and metadata should be openly published. It is also important that the documentation is clearly linked to the data. This can be achieved by publishing the documentation alongside the data or by interlinking the two.
If the documentation is published separately, it should be given a persistent identifier to help users find and refer to it. The most appropriate way to make documentation and metadata available depends on their format. Options include SLU’s publication database, SLU’s e-archive and external repositories.
SLU’s publication database
SLU’s publication database is intended for publications issued by SLU. It is suitable for manuals, instructions and reports.
SLU’s e-archive can be used to assign persistent identifiers to documents and make them openly accessible. It is suitable for large historical document collections and documents that cannot be published in SLU’s publication database. To archive a document, please contact Air – The Unit for Archives, Information Governance and Records (arkiv@slu.se).
Please note that documentation of data produced at SLU is considered a public document and must be archived by law. SLU’s e-archive is suitable for this purpose. The same applies to documentation published in repositories and publications registered in SLU’s publication database. Procedures for the automatic archiving of publications from the publication database are currently under development.
Repositories
If data is published in a repository, its associated documentation and metadata should also be published there. They may also be published in an external repository, such as Zenodo. This provides a quick way to give documents a persistent identifier when no other channel is available. However, please note that publications issued by SLU must be included in SLU’s publication database, and publication in Zenodo does not satisfy the legal archiving requirements.
Different research disciplines have different metadata standards. These standards contain specifications and guidelines on how data should be described, which metadata elements should be included and how variables should be named, among other things. These standards facilitate the harmonisation of data and can serve as a guide for producing comprehensive metadata descriptions.
Examples include the Darwin Core standard for biodiversity data and the more general DDI Lifecycle standard used in the Swedish National Data Service’s repository, where you can select more specific metadata profiles such as natural sciences and agricultural sciences.
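To give an impression of what following a metadata standard can look like in practice, the Python sketch below writes a single species observation to CSV using column names taken from Darwin Core terms. The record itself (identifier, date and coordinates) is invented for illustration, and the selection of terms is an assumption; the standard and your chosen repository determine which terms are actually required.

```python
import csv
import io

# Column names are Darwin Core terms; which ones to include is an assumption.
DWC_COLUMNS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
]


def to_dwc_csv(rows: list) -> str:
    """Serialise observation records to CSV text with Darwin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical observation, for illustration only.
rows = [{
    "occurrenceID": "example-occurrence-0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Parus major",
    "eventDate": "2024-05-12",
    "decimalLatitude": "59.8150",
    "decimalLongitude": "17.6630",
    "geodeticDatum": "EPSG:4326",
}]

csv_text = to_dwc_csv(rows)
print(csv_text)
```

Using standardised column names from the start makes it easier to combine the data with other datasets that follow the same standard.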
You can read more about various metadata standards on the FAIRsharing website.
As a Swedish public authority, SLU is required to archive public documents, including research data. The purpose of archiving is twofold: to maintain order in public documents and to provide the public with access to them in accordance with the principle of public access (unless confidentiality applies; see 'Data processing in accordance with the law' further down this page).
Each department is responsible for ensuring that public documents are properly managed (registered, described, and stored securely) through its head of department. Day-to-day work is carried out by the person in the department responsible for registration and archiving (the RA role). The head of department can inform you who this person is.
Every member of staff is responsible for ensuring archiving is possible. Anyone who handles applications, contracts, agreements, procurement, research data, reports or articles in the course of their duties must ensure that these can be registered and archived.
SLU has local archives at some departments and faculties, but research data must be archived in the university’s central e-archive. While the RA at your department can help you register your data, the transfer to the central e-archive also requires the involvement of SLU’s Unit for Archives, Information Governance and Records. They can also advise you on preserving research material, including how to describe it and which file formats to use.
Research data is not the only thing that should be archived within a research project. Applications, contracts, project budgets, data management plans, reports and articles should also be archived.
In order to ensure that files can still be read in the future, it is important to choose a suitable file format. Ideally, this should be based on an open standard, be independent of any specific software and be openly documented.
Read more about file formats for long-term preservation at Researchdata.se: File formats
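A simple way to get an overview before archiving is to list the files whose format may need attention. The Python sketch below flags file names whose extension is not in a small set of formats assumed here to be open and software-independent. The set is purely an illustrative assumption and not an official list; check the current guidance before deciding which formats to use.

```python
from pathlib import Path

# Illustrative assumption only: extensions treated in this sketch as open,
# software-independent formats. This is not an official or complete list.
ASSUMED_OPEN_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".tif", ".tiff"}


def files_to_review(filenames: list) -> list:
    """Return the names whose extension is not in the assumed open set."""
    return [
        name for name in filenames
        if Path(name).suffix.lower() not in ASSUMED_OPEN_EXTENSIONS
    ]


# Invented file names for illustration.
print(files_to_review(["sites.csv", "measurements.XLS", "survey.sav", "notes.TXT"]))
```

Files flagged in this way are not necessarily unsuitable; the list simply shows where a conversion or a conversation with the archive unit may be worthwhile.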
Publishing and making data available
Making data openly available is an effective way to disseminate the results of research and environmental analysis. It allows other researchers, public authorities and businesses to build on existing studies, rather than starting from scratch.
Open data also increases the visibility of related publications. Studies show that scientific articles that link to openly available datasets are often cited more frequently than other articles.
Sharing data openly enhances the transparency of research, making it easier to review, reproduce and further develop the results.
A research data repository is a platform that stores and makes data available in a structured, searchable, reliable and long-term sustainable manner. These repositories may be discipline-, domain- or institution-specific, or more general in nature.
Wherever possible, data should be made available in repositories that provide persistent identifiers for data (e.g. a Digital Object Identifier, or DOI). Ideally, a certified repository should be used.
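A DOI can be turned into a stable link by placing it after the resolver address https://doi.org/. The Python sketch below shows one way to do this, assuming a simplified syntax check (a prefix starting with 10. followed by a slash and a suffix). The function name and the example DOI are hypothetical, and the check does not cover every valid DOI or handle characters that need URL encoding.

```python
import re

# Simplified check: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def doi_to_url(identifier: str) -> str:
    """Return a https://doi.org/ link for a DOI written in a common form."""
    doi = identifier.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if DOI_PATTERN.fullmatch(doi) is None:
        raise ValueError(f"Not a recognisable DOI: {identifier!r}")
    return f"https://doi.org/{doi}"


# Hypothetical DOI, used only to show the output format.
print(doi_to_url("doi:10.1234/example.5678"))
```

Citing the dataset with the full link form makes it possible for readers to reach the data even if the repository later changes its own web addresses.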
Choosing a data repository
The most suitable data repository depends on the type of data to be published.
SLU is involved in running the Swedish National Data Service (SND) research infrastructure, which offers a data repository that meets legal requirements and those of research funders, for example. The repository can handle sensitive data, including personal data, as SLU has a data processing agreement with SND.
Data published in the SND repository is made visible through the national research data portal, Researchdata.se, and other data catalogues and search services. The SND repository can be used for many types of data and research subjects, but it also has subject-specific features that support various subject profiles.
SLU's data management support service can help you publish via SND.
For data from certain research disciplines (such as bioinformatics), a subject-specific repository may be recommended.
The re3data.org website lists repositories where you can share and find research data.
The researchdata.se website provides a guide to choosing one of the repositories that make data visible on researchdata.se: Sharing data: A quick guide
When selecting a repository:
Check your funder's and journal's recommendations and requirements regarding data sharing
Where possible, use a repository that provides data with a persistent identifier, as this makes the data easier to reference, find and reuse.
Data papers are articles published in peer-reviewed scientific journals that describe datasets and the methods used to collect the data. However, the actual data is often stored in a repository (see above) and linked to from the data paper.
Can all data be published openly?
Data cannot be published openly if it contains any of the following:
Confidential information, such as sensitive personal data
Material protected by copyright, unless permission has been granted
Trade secrets
However, it is still possible to publish metadata, and this is a requirement under SLU’s policy. In many cases, it may also be possible to publish parts of the data openly.
Remember that even data which cannot be published openly must be archived, and that any request for a dataset will be reviewed.
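Publishing part of a dataset often means preparing a separate copy without the restricted variables. The Python sketch below removes named columns from CSV text; the column names and the example data are invented. Note that removing columns does not in itself make data anonymous or free of other restrictions, so a sketch like this can only support, never replace, a proper assessment of what may be published.

```python
import csv
import io


def drop_columns(csv_text: str, restricted: set) -> str:
    """Return a copy of the CSV text without the restricted columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [column for column in reader.fieldnames if column not in restricted]
    output = io.StringIO()
    writer = csv.DictWriter(
        output, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # restricted keys are ignored when writing
    return output.getvalue()


# Invented example: the respondent's name is treated as restricted.
source = "site,respondent_name,ph\nA,Jane Doe,6.5\n"
print(drop_columns(source, {"respondent_name"}))
```

The complete, unmodified dataset still needs to be stored securely and archived alongside the published subset.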
Licences and markings
Clear terms and conditions that govern how data can be used are important for facilitating its reuse and are a key part of the FAIR principles. Data itself is not protected by copyright, and licences can be difficult to apply to datasets because they presuppose the existence of such rights. Therefore, it may be better to use a marker instead.
You can find guidance and recommendations on licensing when publishing data in our FAQ:
Many research funding bodies now require the results and underlying data from funded projects to be made openly available. Scientific journals are also increasingly requiring that the data underpinning a publication be made available in this way. In the case of SLU’s environmental monitoring and assessment, there may also be legal obligations to share data.
Sweden and other EU countries have decided that data from publicly funded research should be published as openly as possible, subject to restrictions where necessary for legal, ethical, security or commercial reasons.
Read more about SLU’s data management policy and the guidelines and principles we follow in the ‘Guidelines and guiding principles’ section further down the page.
Reuse data
Reusing data can save time and resources while enabling the validation of studies. It also makes it possible to integrate datasets from different studies and disciplines, opening up opportunities for new collaborations.
There are many ways to discover and search for data, including through colleagues in the research field, scientific articles and literature databases, general search engines and various data repositories and portals.
Consider whether the data you have found is actually useful for your purposes. Evaluate the data based on its quality and reliability. Ask yourself: is the source reliable? Are established standards used? Is the data sufficiently documented in terms of when and how it was collected and processed? Check the terms of use and distribution, and ensure that you have obtained all the necessary permissions and consents.
Give credit to the people who collected and made the data available by citing secondary data in the same way as academic articles. The SLU University Library provides guidance on citing data in accordance with SLU’s Harvard style.
You can find more detailed information on Researchdata.se: Reuse and cite data
Data processing in accordance with the law
A wide range of laws, regulations and other rules apply to data management at SLU. These include the Freedom of the Press Act, the Public Access to Information and Secrecy Act, the General Data Protection Regulation, the Archives Act and the Open Data Directive. Familiarising yourself with the basic provisions is important for managing scientific data from research and environmental analysis responsibly.
As a public authority, SLU is subject to the principle of public access to official records. This means that data from research and from environmental monitoring and assessment are usually public documents and are disclosed upon request. The only exception is where grounds for confidentiality exist under the Public Access to Information and Secrecy Act, for example because the data contains sensitive personal information. Anyone requesting public documents has the right to have their request reviewed, and SLU’s legal team determines whether such grounds exist.
Scientific data is one of SLU’s most valuable assets, and it is crucial that we protect it. Therefore, when handling data, it is essential to consider information security. Information from SLU must be accurate and accessible to authorised users, while remaining hidden from unauthorised individuals. While scientific data does not always need to be kept confidential, we must protect it to prevent data loss and breaches.
The level of protection required depends on various factors, including what is needed to restore or reconstruct a dataset. As research data may be sensitive for various reasons and to varying degrees, the need for protection may change during the course of the work, when the data is made available and subsequently preserved and archived. For example, legal or ethical reasons may prevent data from being made openly available. This could be because the data is subject to copyright (if it contains photographs, for instance), or because it requires confidentiality (for example, to protect endangered species, sensitive personal data, or trade secrets). While it is in active use, such information may need to be kept more secure, but it should be archived in the same way as other research data, with a clear description.
The first step in protecting scientific data is to carry out an information classification. Read more on the staff web (login required): Information security
Find out more about sensitive data at Researchdata.se: Protected data
Personal data refers to information relating to an identified or identifiable natural person. Research and environmental monitoring and assessment data often contains personal data.
However, this is not a problem, as SLU is permitted to collect, use and archive personal data, provided that we process it correctly and have a legal basis for doing so. Special requirements apply to the processing of sensitive personal data, such as information about a person’s political views. For example, an ethical review is required before sensitive personal data can be used in research.
Find out more about personal data in research data:
SLU’s data management policy aims to improve the quality, dissemination, impact and innovative potential of the university’s research and environmental monitoring and assessment. It sets out principles for data management at SLU, including those relating to storage, publication and access. The policy also emphasises the importance of good data management in research and environmental monitoring and assessment.
The data management policy is based, among other things, on legal requirements and Sweden’s national guidelines for open science.
For example, the policy states that data from research and environmental monitoring and assessment should be made as open as possible, that the FAIR principles should be adhered to, and that a data management plan should be drawn up for new projects.
This policy applies to all types of digital data produced or processed during research or environmental monitoring and assessment at SLU.
SLU is an advocate of open science, as reflected in its data and publication policies, among other things.
SLU is also a member of the Swedish National Data Service (SND), a national infrastructure which helps researchers make all types of digital research data available, including via the portal Researchdata.se.
FAIR is an acronym for Findable, Accessible, Interoperable and Reusable. The FAIR principles are intended to serve as guidelines for improving the reusability of scientific data and play a key role in the transition to open science.
Originally published in The FAIR Guiding Principles for Scientific Data Management and Stewardship (Wilkinson et al., 2016), the principles have since received broad support from research groups, governments, funders and publishers.
SLU's data management policy states that research and environmental monitoring and assessment data at SLU should be FAIR to the greatest extent possible. This introductory guide and other data management guidelines are based on the FAIR principles where relevant and help researchers and environmental assessment staff work in accordance with them. We also have a guide offering tips directly related to the FAIR principles:
SLU's support team assists staff with data management in research and environmental monitoring and assessment. Our team has expertise in various aspects of data management, as well as in academic publishing, information management, archiving, IT, law, information security, research, environmental assessment and research funding.