Introduction to data management

Last changed: 17 November 2020

“Good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process.” (Wilkinson et al., Nature 2016)

‘Research data’ and ‘research data management’

What is research data?

Research data is data collected, observed, generated, created or obtained as part of a research or environmental monitoring and assessment activity, and on which an argument, theory, hypothesis, conclusion or any other research output is based. Research data is the backbone of scientific discovery and technological innovation. It is considered the prime currency of science, the building material of research!

Research data may be categorised as either quantitative or qualitative. It may further be observational, experimental, simulated, or derived/compiled in type; numerical, descriptive, aural, or visual in nature; and raw or analysed/processed in kind. Research data therefore comes in a variety of formats and media. Lastly, research data can be digital or non-digital in form.

All of these (i.e., category, type, nature, kind, format, and form of research data) need to be taken into consideration when planning and preparing for the management of research data. Similarly, research records (e.g., correspondence, project files, grant applications, reports, lists, forms, etc.) may also be important to manage. For information on the distinction between research data and other research material, see the Archives, Information Governance, and Records unit’s web page on ‘Management and preservation’.

What is research data management?

Research data management (RDM) is, simply put, the sustainable and effective handling of research data, and it is an integral part of the research process. It comprises the comprehensible organisation, documentation, storage, processing, archiving, preservation, and sharing of data as part of a research or environmental monitoring and assessment activity. It involves the everyday management of research data throughout the data’s entire lifecycle: from the planning of a project and the point of data collection and/or creation through to analysis, archiving, preservation, and sharing. Note that the lifespan of the data is often longer than the project during which it was collected and/or created: data can be re-analysed or re-worked in follow-up projects and/or may be reused by other researchers.

The benefits of research data management

Why manage research data?

Research data is a valuable resource that takes much time, effort, and money to collect and/or generate. Well organised, documented, stored, archived, preserved, and shared data is invaluable in terms of research efficiency as well as research integrity.

There are many good reasons why research data management (RDM) is important. It can help to:

  • increase your research efficiency,

Good RDM will enable you to organise files and data for easy access and analysis, at both the individual and the project level. By organising files and data in a structured way, you make future data retrieval easier and minimise the risk of data loss.

  • improve your research integrity,

Accurate and complete research data is crucial for validating, evaluating, and reproducing research output.

  • enhance your research visibility,

Making data available boosts the visibility of your findings and ideally also increases the number of citations (see Piwowar and Vision (2013) for how open data can lead to citation benefits). Research data – if correctly formatted, described, and annotated – will have significant ongoing value and can continue to have impact long after the completion of a research project.

  • enable collaboration,

By facilitating sharing and reuse of data for future research, you could be creating opportunities for collaboration with other researchers. In addition, sharing well-managed research data and enabling others to use it will help prevent duplication of effort.

  • keep your research safe,

Keeping research data safe reduces the risk of losing it. Using robust and appropriate data storage facilities helps to prevent data loss through accidents or neglect.

  • comply with current legislation and funder policies,

Good RDM will help you comply with funder research data expectations and policies. Many funders and a growing number of journals as well as publishers now require you to share the data at the end of a project or at the time of publishing the corresponding research results. An increasing number of funders also require that a data management plan (DMP) is in place for each project.

  • ensure data is FAIR,

FAIR in relation to data means that data is Findable, Accessible, Interoperable, and Reusable. The FAIR Guiding Principles for scientific data management and stewardship are a set of guidelines intended to enhance the usability of data, with the ultimate goal of optimising its reuse. To achieve this, data should be provided with sufficient documentation and metadata, preferably in a machine-readable format, so that it can be replicated and/or combined in different settings.

  • demonstrate responsible practice.

By managing research data according to good practice and making it publicly available, you can show that the use of public resources to fund research is done responsibly. Good RDM improves the possibilities for validation of research results and strengthens research integrity.
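
As an illustration of the "machine-readable" documentation mentioned under FAIR above, the following minimal Python sketch builds a small metadata record for a dataset and checks that a few basic fields are filled in. The field names (title, creators, licence, keywords, identifier), the helper functions, and the example values are all hypothetical and chosen for illustration only; they do not follow any particular metadata standard, and a real project should use the schema required by its repository or funder.

```python
import json


def build_metadata(title, creators, licence, keywords, identifier=None):
    """Return a small, machine-readable description of a dataset.

    The field names are illustrative assumptions, not a formal schema.
    """
    record = {
        "title": title,
        "creators": list(creators),
        "licence": licence,
        "keywords": sorted(keywords),
    }
    if identifier is not None:
        record["identifier"] = identifier
    return record


def missing_fields(record, required=("title", "creators", "licence")):
    """List the required fields that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


# Hypothetical example dataset description.
example = build_metadata(
    title="Soil moisture measurements (hypothetical example)",
    creators=["A. Researcher"],
    licence="CC-BY-4.0",
    keywords={"soil", "moisture"},
)

# Serialise to JSON so that both people and software can read the record.
metadata_json = json.dumps(example, indent=2, sort_keys=True)
print(metadata_json)
print("Missing fields:", missing_fields(example))
```

Storing such a record as a text file next to the data is one simple way to keep the description and the data together; the check in `missing_fields` only confirms that fields are present, not that their content is correct.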

Have a look at Florian Markowetz’s article on five selfish reasons to manage research data according to good RDM practices.

The data lifecycle

The data lifecycle is a model of the steps involved in the successful and sustainable management of data for use and reuse, and it serves as a basis for discussing those steps. Data does not necessarily pass through all the phases described in Fig. 1, yet the phases show logical dependencies. Also, the data lifecycle is not necessarily linear and what is presented here is simply one of many models.

  1. Planning data management
    Research data management (RDM) refers to good practice in planning, collecting, organising, storing, processing, archiving, preserving, and sharing the data collected, generated and/or acquired in any research project. To help you understand your needs and requirements, it is often very helpful to draw up a data management plan (DMP) and to do so right from the start of a project.

  2. Collecting, organising, and storing data
    Once you are done planning, it is time to collect, generate, and/or acquire data. Data can quickly become disorganised, so organising, structuring, and documenting it systematically from the very beginning is fundamental to good data management. Doing so saves time and energy and improves the data’s quality. Lastly, it is essential to put adequate security and protection measures in place when storing the data.

  3. Processing and analysing data
    Now that the data has been collected, generated, and/or acquired, you will be eager to start processing and analysing it. However, before the “raw” data can really be analysed, it needs to be properly and accurately prepared or processed. While processing and analysing the data, it is all the more important to continue to organise, structure, and document it. When documenting your work, you should also record your workflow (i.e., the changes made to the data during processing and the analyses run during data analysis).

  4. Archiving and preserving data
    In Sweden, all material produced as a result of research activities must be archived and preserved in order to safeguard the right of access to public records and to serve cultural heritage and research needs. Yet, preserving data implies more than just saving, backing up, and depositing it. Particular care must be taken when preserving data containing personal or sensitive information.

  5. Sharing and publishing data
    Having followed the efforts mentioned in the previous phases, sharing data becomes quite simple. There are many ways to share data. Greater impact and visibility are achieved by publishing data. One of the most common strategies for data publication is to deposit the data in a data repository.

  6. Discovering, reusing, and citing data
    In an effort to minimise unnecessary costs, funders nowadays require researchers to check whether or not the data they intend to collect and/or generate already exists. If you find existing data (also known as secondary or third-party data) that meets your purpose, you need to make sure that the licence applied by the author(s) is suitable for your purposes and to respect the authors’ intellectual property. Finally, when you have reused someone else’s data, you need to cite it properly and license any newly generated data that builds on the original author(s)’ work accordingly.
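
Several of the phases above depend on knowing whether stored files have changed, for example when documenting processing steps or when checking archived copies. One common technique for this is a checksum manifest. The following is a minimal Python sketch, assuming the data lives in an ordinary directory of files; the function names and the example file `raw.csv` are hypothetical, and this is an illustration rather than a procedure prescribed by this page.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its checksum."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old, new):
    """Return names that were added, removed, or modified between manifests."""
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )


# Demonstration in a temporary directory with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    raw.write_text("id,value\n1,0.5\n")
    before = build_manifest(tmp)

    raw.write_text("id,value\n1,0.7\n")  # simulate an edit to the data
    after = build_manifest(tmp)

changes = changed_files(before, after)
print("Changed files:", changes)
```

Saving a manifest alongside the data when it is collected, and comparing it with a fresh one later, gives a simple record of which files were altered. It does not say why a file changed, so it complements rather than replaces written documentation of the workflow.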

Fig. 1. Data lifecycle SLU

Data life cycle CC BY SLU Data Curation Unit. All icons in the life cycle and on the pages are made by Prosymbols from www.flaticon.com.

Page editor: dcu@slu.se