Archive and preserve data
Archiving and preserving data implies more than just saving, backing up, or depositing it. It comprises a series of managed activities aimed at ensuring continued access. Here, you will find guidance on why data needs to be archived and preserved, what kind of data has to be archived and preserved, and how to preserve and maintain data in a secure environment so that it remains accessible, understandable, and usable in the long run.
Why archive and preserve data
The main reason for archiving and preserving data is to ensure access to it, both now and in the future. In Sweden, it is first and foremost the Archives Act (SFS 1990:782; information available only in Swedish) that needs to be adhered to when it comes to archiving and preserving data: it stipulates that, as a rule, all material produced as a result of research activities must be archived and preserved in order to ensure the rights of access to public records, cultural heritage, and research needs. Complying with the Freedom of the Press Act (SFS 1949:105) and the Public Access to Information and Secrecy Act (SFS 2009:400), as well as with a growing number of journal and funder policies, also involves preserving data (although as a consequence of compliance rather than as a direct requirement). In any case, abiding by these laws and policies means that data is systematically organised and stored (i.e., managed and kept in good order), accessible, available, understandable, and usable in the long run.
In addition, archiving and preserving data appropriately increases credibility, allows for reproducibility and validation, and is an important component of making data FAIR. Further reasons to archive and preserve data can be found on the CESSDA (Consortium of European Social Science Data Archives) web page Towards archiving & publishing.
What data to archive and preserve
Archiving and preserving data implies that you will need to carefully consider which components of your research need to be kept and which do not, in order to comply with national legislation, satisfy external funders and publishers as well as your institution, and meet your own purposes. The process of selecting data that is to be either kept or disposed of is called data appraisal. At SLU, data appraisal has to be done in accordance with the regulations devised by the Swedish National Archives (as described in RA-FS 1991:1 and RA-MS 2013:7). In-depth information on how and when to appraise research material (such as data) can be found on the Archive, information management and registry unit’s web page on Management and preservation.
A highly important aspect of data appraisal is that data approved for disposal has to be disposed of in such a manner that the information cannot be recovered. This is of special importance should such data contain information classified as personal or sensitive (regulated according to the Public Access to Information and Secrecy Act [SFS 2009:400]). In any case, merely deleting material from a file system is not enough, which is why other methods need to be considered to ensure the material cannot be recovered. Contact your department’s IT coordinator or SLU’s IT department for further information on how to securely dispose of information. Please note that because complete disposal of files and data stored in cloud services may be difficult, we recommend not storing data in such services.
How to preserve data
When you prepare data to be preserved, you need to ensure both that it is kept secure and that it can be accessed, read, and understood in the future. Active steps to preserve data include good file management, adequate documentation, data security and protection, as well as storage.
Data preservation and file management
An important aspect of data archiving and preservation is file management (i.e., folder structure, file and folder naming convention, file versioning, and choice of file format). The choice of file format deserves particular attention: digital file formats tend to become obsolete over time, meaning that while the bits in a file may still be intact, its information can no longer be accessed and used. Choosing file formats that follow open standards with publicly available specifications and that are non-proprietary, free of encryption and copy protection, commonly used, and lossless improves the chances of long-term preservation. Advice and guidance on file management, including choice of file format, is given at Collect, organise, and store data. Both the Swedish National Archives and the Swedish National Data Service (SND) provide recommendations on choosing file formats with long-term preservation and usability of the data in mind.
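As a first step towards reviewing file formats, it can help to know which formats a dataset actually contains. The following is a minimal sketch in Python, assuming the data sits in a single folder tree; the function name `format_inventory` is hypothetical and introduced here only for illustration. It only counts file extensions and does not judge whether a format is suitable for preservation; that assessment has to be made against the recommendations mentioned above.

```python
from collections import Counter
from pathlib import Path


def format_inventory(root: Path) -> Counter:
    """Count the files below `root` by (lower-cased) file extension."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under a placeholder key.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

Calling `format_inventory(Path("my_dataset"))` (with `my_dataset` standing in for your own folder) returns a mapping from extension to number of files, which can then be compared by hand with the recommended formats.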
Data preservation and documentation
For data to be understandable and usable/reusable in the future, it needs to be archived and preserved with its appropriate metadata. High-quality metadata provides contextual information about the collected, generated, and/or acquired data itself as well as about the processes and analyses performed on the data. Metadata is also often a prerequisite for making data searchable. Detailed information on how to document data and its processing and analysis can be found at Collect, organise, and store data and Process and analyse data, respectively.
Data preservation and data security/protection
In order for data to be reusable, it has to be archived and preserved safely alongside its documentation and metadata.
First, you must secure the data by making sure that only authorised people can access it to read, edit, and use it. This keeps the data and its metadata safe from unauthorised access and use (e.g., manipulation, change, destruction).
Second, should the data that is to be archived and preserved contain personal or sensitive information, additional protective measures may be needed (see Collect, organise, and store data as well as Process and analyse data for more information in this respect).
Data preservation and storage
At SLU, you can store data for archiving and preservation either in an external repository or in SLU’s archive (i.e., an internal local server). However, no central solution currently exists at SLU to store larger quantities of data, which is why large datasets need to be stored locally (e.g., on an internal server) or externally (e.g., in a repository or database; note that this cannot be regarded as archiving on SLU’s behalf unless a contract exists clarifying this). In any case, note that data collected, generated, acquired, processed, and analysed as part of a research activity carried out at SLU is part of the SLU archive no matter where it is stored and is, as such, subject to national legislation regarding the handling of official documents. More general information about storing data can be found at Collect, organise, and store data, while advice on how to store and preserve data in (external) repositories is given at Share and publish data. Finally, the Archive, information management and registry unit at SLU can guide you on how to archive and preserve research material at SLU’s central archive.
Best practice with regard to making research data findable and accessible is to submit it to a repository. Repositories should collect and display data alongside related documentation and metadata. Find out more about how to make data findable and accessible at Share and publish data.
It is important to point out that for data to remain usable in the future, it needs to be actively managed during preservation, together with its metadata (note that external data repositories do not usually attend to this). Digital sources, unfortunately, may degrade over time (‘bit rot’), which is why they have to be checked on a regular basis to make certain there is no deterioration (e.g., by creating a checksum when the data is stored and comparing it against a freshly computed one at each check). Moreover, as mentioned above, file formats may become obsolete over time. To counteract that risk of obsolescence, files and data need to be migrated to alternative formats (make sure to document the characteristics of such file migrations). Following all these recommendations helps ensure that the data you intend to preserve will be findable and accessible, as well as interoperable and reusable in the future; that is, the data should have all the characteristics of being FAIR.
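The regular check for deterioration mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a procedure prescribed by SLU: the choice of SHA-256 and the function names `sha256_of`, `build_manifest`, and `find_changes` are assumptions made for this example. The idea is to record one checksum per file when the data is deposited and to recompute and compare the checksums at each later check.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file below `root` (as a relative path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changes(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current state of `root` with a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded earlier that are no longer present.
        "missing": sorted(set(manifest) - set(current)),
        # Files whose content no longer matches the recorded checksum.
        "altered": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files that were not present when the manifest was created.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In practice, the manifest returned by `build_manifest` would be saved (for example as a JSON file kept outside the data folder, so that it is not itself included in the manifest) and passed to `find_changes` at each check. A non-empty "missing" or "altered" list indicates that files should be restored from another copy.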