33 changes: 18 additions & 15 deletions docs/30_data/30_data_organisation.mdx
@@ -27,11 +27,11 @@ Find a balanced set of elements: Too many make it difficult to grasp quickly, wh

:::note General basics for naming files:

- Order the elements from general to specific.
- Use meaningful abbreviations instead of long identifiers.
- Use underscore `_`, hyphen `-` or capitalized letters to separate elements in the name. Don’t use spaces or special characters: `?!&,_%#;_()@$^~‘{}[]<>`.
- Use date format ISO8601: `YYYYMMDD`, and time if needed `HHMMSS`.
- Include a version number if appropriate: minimum two digits (V02) and extend it, if needed for minor corrections (V02-03). The leading zeros, will ensure the files are sorted correctly.
- Order the elements from general to specific.
- Use meaningful abbreviations instead of long identifiers.
- Use underscore `_`, hyphen `-` or capitalized letters to separate elements in the name. Don’t use spaces or special characters: `?!&,%#;()@$^~‘{}[]<>`.
- Use date format ISO8601: `YYYYMMDD`, and time if needed `HHMMSS`.
- Include a version number if appropriate: use at least two digits (V02) and extend it if needed for minor corrections (V02-03). The leading zeros ensure the files are sorted correctly.

(by [RDMKit](https://rdmkit.elixir-europe.org/data_organisation.html#what-is-the-best-way-to-name-a-file))
:::
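
The naming rules in the note above can be sketched in a few lines of Python. This is only an illustration: the function name `build_file_name`, its parameters, and the choice to put the date first and the version last are assumptions, not part of the RDMKit guidance.

```python
import re
from datetime import datetime

# Whitespace and the special characters listed in the note above.
FORBIDDEN = re.compile(r"[\s?!&,%#;()@$^~‘{}\[\]<>]")


def build_file_name(elements, when, version, extension, minor=None):
    """Join name elements with underscores (hypothetical helper).

    `elements` are the abbreviations, ordered from general to specific.
    """
    for element in elements:
        # Underscore is the separator here, so it may not appear inside an element.
        if not element or "_" in element or FORBIDDEN.search(element):
            raise ValueError(f"unsuitable name element: {element!r}")
    date_part = when.strftime("%Y%m%d")    # ISO 8601 basic date format
    version_part = f"V{version:02d}"       # leading zero keeps alphabetical sorting correct
    if minor is not None:
        version_part += f"-{minor:02d}"    # e.g. V02-03 for a minor correction
    return "_".join([date_part, *elements, version_part]) + f".{extension}"


print(build_file_name(["ELI5", "TEMP"], datetime(2018, 2, 11), 2, "csv", minor=3))
```

Rejecting unsuitable elements instead of silently replacing characters keeps the person naming the file in control of the abbreviation.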
@@ -55,15 +55,16 @@ A good file name such as `20180211_ELI5_TEMP_BH01_RAW_03.csv` can easily be sort
- **type of data:** RAW = raw data from measuring device
- **number of file:** containing data for that measurement series
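
A structured name like the one above can also be read back by a script. The following is a minimal sketch, assuming the scheme is "date, any number of middle elements, data type, file number"; the pattern, the function name `parse_file_name` and the dictionary keys are hypothetical, and the middle elements are returned uninterpreted because their meaning is not shown in this hunk.

```python
import re
from datetime import datetime

# Hypothetical pattern: date _ middle elements _ data type _ file number . extension
PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<middle>.+)_(?P<kind>[A-Z]+)_(?P<number>\d{2,})\.(?P<ext>\w+)$"
)


def parse_file_name(name):
    """Split a file name that follows the assumed scheme into its elements."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the scheme: {name!r}")
    return {
        "date": datetime.strptime(match["date"], "%Y%m%d").date(),
        "middle": match["middle"].split("_"),  # elements between date and data type
        "kind": match["kind"],                 # e.g. RAW
        "number": int(match["number"]),
        "extension": match["ext"],
    }


info = parse_file_name("20180211_ELI5_TEMP_BH01_RAW_03.csv")
print(info["date"].isoformat(), info["middle"], info["kind"], info["number"])
```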

If you need to rename a multiple files, take a look at:
If you need to rename multiple files, take a look at:

- [Thunar Bulk Rename](https://docs.xfce.org/xfce/thunar/bulk-renamer/start) (Linux, GUI)
- [command line: mv, mmv, rename](https://linuxconfig.org/how-to-rename-multiple-files-on-linux) (Linux, CLI)
- [Bulk Rename Utility](https://www.bulkrenameutility.co.uk/) (Windows, free)
- [TotalCommander](https://www.ghisler.com/advanced.htm#tutorial_rename) (windows, Shareware)
- [Renamer4Mac](https://renamer.com/) (Mac).
- [Thunar Bulk Rename](https://docs.xfce.org/xfce/thunar/bulk-renamer/start) (Linux, GUI)
- [command line: mv, mmv, rename](https://linuxconfig.org/how-to-rename-multiple-files-on-linux) (Linux, CLI)
- [Bulk Rename Utility](https://www.bulkrenameutility.co.uk/) (Windows, free)
- [A.F.5 Rename your files](http://fauland.com/download.htm) (Windows, free)
- [TotalCommander](https://www.ghisler.com/advanced.htm#tutorial_rename) (Windows, Shareware)
- [Renamer4Mac](https://renamer.com/) (Mac).
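
If none of the tools above is at hand, a bulk rename can also be scripted. The sketch below uses only the Python standard library; the function names `plan_renames` and `apply_renames` are made up for this example. It separates planning from renaming so that the plan can be inspected before any file is touched.

```python
from pathlib import Path


def plan_renames(folder, old, new, pattern="*"):
    """Return (source, target) pairs that replace `old` with `new` in file names."""
    plan = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file() and old in path.name:
            plan.append((path, path.with_name(path.name.replace(old, new))))
    return plan


def apply_renames(plan):
    """Rename the files, refusing to overwrite anything."""
    targets = [target for _, target in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("two files would receive the same name")
    for source, target in plan:
        if target.exists():
            raise FileExistsError(target)
        source.rename(target)
```

A typical use is to print the result of `plan_renames("data", "TMP", "TEMP", "*.csv")` first and to call `apply_renames` only once the list looks right.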

For some special file formats there are tools for adapting the file name to metadata. For example, to create a file name that fits your scheme and takes date and time information from the EXIF data of a jpg file. Some also allow adding an offset - this helps sort photos into timestamps that run on different clocks.
For some special file formats there are tools for adapting the file name to the metadata. For example, to create a file name that fits your scheme and takes date and time information from the EXIF data of a jpg file. Some also allow adding an offset - this helps sort photos into timestamps that run on different clocks.

## Files: versioning

@@ -77,18 +78,18 @@ Stash snapshots or simply track changes and allow to find something that existed

## Files: types of metadata

Consider the way data and [Metadata](/docs/metadata/) can be stored together as FDOs (Fair Digital Objects). For example, metadata can be divided into the following four categories:
Consider the way data and [Metadata](/docs/metadata) can be stored together as FDOs (Fair Digital Objects). For example, metadata can be divided into the following four categories:

- descriptive metadata
- administrative metadata
- technical metadata
- structural metadata

An FDO encapsulates data and metadata in one file and can be saved as an [HDF5](https://www.hdfgroup.org/solutions/hdf5/), for example. See [Data Format Standard](/docs/data_formats/) for more information.
An FDO encapsulates data and metadata in one file and can be saved as an [HDF5](https://www.hdfgroup.org/solutions/hdf5/), for example. See [Data Formats](/docs/data_formats/) for more information.

## Files: formats

Different disciplines use established standards, see [Data Format Standard](/docs/data_formats/). Also consider beyond the duration of the project:
Different disciplines use established standards, see [Data Formats](/docs/data_formats/). Also consider beyond the duration of the project:

- usage of proprietary or open file formats
- exchange within and outside of the working group
@@ -111,6 +112,7 @@ The top folder should have a README.txt file describing the folder structure and

#### An example by [RDMKit](https://rdmkit.elixir-europe.org/data_organisation.html#what-is-the-best-way-to-name-a-file):

```
project/
code/ code needed to go from input files to final results
data/ raw and primary data (never edit!)
@@ -127,6 +129,7 @@ The top folder should have a README.txt file describing the folder structure and
tables/
scratch/ temporary files that can safely be deleted or lost
README.txt file and folder description
```
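
A skeleton like the one above can be created with a short script, so that every new project starts with the same layout. This is a sketch under assumptions: `create_skeleton` is a hypothetical helper, and it only creates the top-level folders whose descriptions are visible in the example; sub-folders hidden in the collapsed part of the diff are left out on purpose.

```python
from pathlib import Path

# Top-level folders and descriptions as visible in the example above.
TOP_LEVEL = {
    "code": "code needed to go from input files to final results",
    "data": "raw and primary data (never edit!)",
    "scratch": "temporary files that can safely be deleted or lost",
}


def create_skeleton(root):
    """Create the folders and a README.txt describing them; return the README path."""
    root = Path(root)
    lines = ["Folder structure", ""]
    for name, purpose in TOP_LEVEL.items():
        (root / name).mkdir(parents=True, exist_ok=True)
        lines.append(f"{name}/  {purpose}")
    readme = root / "README.txt"
    if not readme.exists():  # never overwrite an existing description
        readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```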

## Sources and further information

30 changes: 15 additions & 15 deletions docs/30_data/40_data_documentation.mdx
@@ -25,31 +25,31 @@ _Andreas von der Dunk, Technische Universität Dresden, Service Center Research

A clean and comprehensible organisation of data and documents is an important part of good research practice and an important step towards research data management according to the [FAIR data principles](/docs/fair).

An essential task is to plan the organisation and storage of data, [metadata](/docs/metadata), and documents in advance and to document the relevant measures.
An essential task is defining in advance the organisation and storage of data, documents, and their [metadata](/docs/metadata), and documenting the relevant measures.

Central requirements are the definition of [formal responsibilities, organisational conventions](#formal-responsibilities-and-organisational-conventions) and [technical implementations](#technical-implementations) to organise the data and meta information produced. The information collected for this purpose is recorded in the [Data Management Plan](/docs/dmp). Note that a good data organisation also estimates the costs (see [costing tool and checklist](https://ukdataservice.ac.uk/learning-hub/research-data-management/plan-to-share/costing/) for example) in the early application phase.

### Formal responsibilities and organisational conventions

To make a long story short: It's mainly about describing who will and how to work with documents. First steps could be about:
An essential part of the data documentation is the definition of the actors working with the data and the procedures. These are possible first steps:

- Documenting the responsibilities of primary researchers and project staff.
- Creating user roles: Define detailed rights for users / groups / roles to access data and sensitive information.
- Creating user roles: Defining detailed rights for users / groups / roles to access data and sensitive information.
- Describing processes of quality assurance including protected storage, sharing, and accessibility in the short and long term.
- Data processing: How, where, how fast. Describe input and output data. Decide how you will name and structure files and folders.
- Data processing: How, where, how fast. Description of input and output data; decisions on naming and structuring conventions for files and folders.

The result should be descriptive documents that unambiguously define for files which are used in the course of the daily work routines:
The result should be a set of descriptive documents associated with the files used during the daily work routines, which unambiguously determine:

- on which status (for example original file, temporary work file; Draft, intermediate version, final version),
- where (workstation PC, central file server, database),
- for how long (temporarily, project duration, long-term availability),
- and in which format they are saved.
- the status (for example original file, temporary work file; Draft, intermediate version, final version),
- the location (workstation PC, central file server, database),
- the availability time frame (short, project length, long-term),
- the format in which they are saved.

![small data handout](/img/data/en_data_handout.png)

### Technical implementations

Get an overview of the occurring data and document flows. A short description, easy to understand for every user, should be accessible on a low level and explain the main concepts. The data itself needs to have a bulletproof [Data Organisation](/docs/data_organisation) including [Metadata](/docs/metadata/) and suitable [Data Format Standards](/docs/data_formats/). Find out more about the usual [Best Practice](/docs/best_practice/) at your institute or within your discipline. Plan how documents can be shared between project staff and what needs to be accessed for [Data Publication](/docs/data_publishing/) with [PID](/docs/pid/).
Get an overview of the occurring data and document flows. A short description, easy to understand for every user, should be accessible on a low level and explain the main concepts. The data itself needs to have a bulletproof [Data Organisation](/docs/data_organisation), including [Metadata](/docs/metadata/) and suitable [Data Formats](/docs/data_formats/). Find out more about the usual [Best Practice](/docs/best_practice/) at your institute or within your discipline. Plan how documents can be shared between project staff and what needs to be accessed for [Data Publication](/docs/data_publishing/) with [PIDs](/docs/pid/).

Data security affects all technical and organisational issues to protect the data from alteration, loss, and destruction. In this context, storage methods, backup procedures, necessary physical resources as well as automated and administrative routines must be planned and put in place. Ask local contacts or external experts about already established technologies for [Data Storage and Archiving](/docs/data_storage) as well as suitable [Repositories](/docs/repositories/).

@@ -70,11 +70,11 @@ Good data documentation does not happen over night - take small steps first. The
- Which devices or file formats are or have been used?
- Are there any special features?
- Awareness: Who produces (meta)data, and who continues to use data and how?
- Define internal rules and processes: What are the targets of RDM, and how can it be achieved?
- Apply and evaluate rules, iteratively: Learn, set, follow, repeat. Keep it simple and smart (KISS).
- Develop a suitable technology: Determine specific requirements in the first project phase and continuously adapt them to changing conditions.
- Establish supporting technology: Evaluate and test software like [ELN](/docs/eln/) and [Repositories](/docs/repositories/), train staff.
- Obtain legal advice, include local and higher-level policies and procedures: Contact legal department at your institution or [NFDI Querschnittssektion Ethik und Recht](https://www.nfdi.de/einrichtung-von-ersten-sektionen/)
- Define internal rules and processes: What are the targets of RDM, and how can they be achieved?
- Apply and evaluate rules iteratively: Learn, set, follow, repeat. Keep it simple and smart (KISS).
- Develop suitable technology: Determine specific requirements in the first project phase and adapt them continuously to changing conditions.
- Establish supporting technology: Evaluate and test software like [ELN](/docs/eln/) and [Repositories](/docs/repositories/); train your staff.
- Obtain legal advice, considering local and higher-level policies and procedures: Contact the legal department at your institution or [NFDI Querschnittssektion "Ethik und Recht"](https://www.nfdi.de/einrichtung-von-ersten-sektionen/)
- Make rules and decisions accessible to everyone at an early stage, for example in the form of a short handout.
- Check the concept regularly and update it if necessary.

26 changes: 13 additions & 13 deletions docs/30_data/50_data_storage.mdx
@@ -6,25 +6,25 @@ nfdi4chem-tags: [data_organisation, data_storage, repositories]
slug: "/data_storage"
---

If you plan to collect data and process it into information, you should consider different types of storage with regard to security, backup, access time and sharing with others. It is also of interest [how to estimate the computational resources for data processing and analysis](https://rdmkit.elixir-europe.org/storage.html#how-do-you-estimate-computational-resources-for-data-processing-and-analysis). There are different requirements for the entire [Data Life Cycle](/docs/data_life_cycle/). Regarding the workflows used in a project, care should also be taken when securing these workflows and tools (software version!) to ensure the reproducibility of results.
If you plan to collect data and process it into information, you should consider different types of storage with regard to security, backup, access time and sharing with others. It is also of interest [to estimate the computational resources for data processing and analysis](https://rdmkit.elixir-europe.org/storage.html#how-do-you-estimate-computational-resources-for-data-processing-and-analysis). There are different requirements for the entire [Data Life Cycle](/docs/data_life_cycle/). Regarding the workflows used in a project, care should also be taken when securing these workflows and tools (software version!) to ensure the reproducibility of results.

## Workflow perspective
Let's discuss different storage solutions along a possible workflow. Think of all possible data sources that provide data in your project, such as laboratory equipment (devices), manually collected data or external data from publications or project partners. Some devices may continuously automatically deliver data points while others regularly provide files for collection. Reduce the amount to the data points necessary for your project, consider possible pre-processing and estimate the data that will arise in terms of frequency and size. It is possible that data can already be processed while other data of the same type is still being recorded. At what point in the workflow is the data annotated by further metadata and does this possibly also work automatically? What descriptive documents are provided by human sources and when?
Let's discuss different storage solutions along a possible workflow. Think of all possible data sources that provide data in your project, such as laboratory equipment (devices), manually collected data or external data from publications or project partners. Some devices may deliver data points continuously and automatically, while others regularly provide files for collection. Reduce the amount to the data points necessary for your project, consider possible pre-processing and estimate the data that will arise in terms of frequency and size. It is possible that a part of the data has already been processed, while other data of the same type is still being recorded. At what point in the workflow is the data annotated by further metadata, and does this possibly also work automatically? What descriptive documents are provided by human sources and when?

When planning [data management](/docs/dmp/), think about storage solutions and request short-term and long-term storage in advance.
In the [planning phase](/docs/dmp/) of a research activity, think about storage solutions and request short-term and long-term storage in advance.

#### Necessary requirements when designing a storage system:
- space requirements for collection or generation of raw data including temporary files ("fast storage")
- space requirements for data that can be permanently accessed over the duration of the project
- access requirements to the data (in case of collaborative projects), how do they expect to access the data and for what purpose
- access requirements to the data (in case of collaborative projects): expected access ways and purpose
- transfer speed requirements
- sharing opportunities, guidelines for data sharing outside the institute, compliance and rights management
- "read-only" copy of the original raw data in a separate location (not editable)
- how long raw data, as well as data processing pipelines and analysis workflows need to be stored, especially after the end of the project
- "read-only" (not editable) copy of the original raw data in a separate location
- requirements on storage duration of raw data, as well as data processing pipelines and analysis workflows, especially after the end of the project
- [metadata](/docs/metadata/): identifier and file description, associated with your data
- requirements on version control to keep track of changes, conflict resolution, data mentoring and back-tracing capabilities
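
The requirement of a "read-only" copy of the raw data in a separate location can be illustrated with a small script. This is a minimal sketch, not a backup solution: `store_read_only_copy` and `sha256_of` are hypothetical helpers, the checksum comparison is an added safeguard that the list above does not ask for, and file permissions alone do not stop the owner or an administrator from changing the copy.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_read_only_copy(source, target_dir):
    """Copy a raw data file, verify the copy and remove its write permissions."""
    source = Path(source)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if target.exists():
        raise FileExistsError(target)  # never replace an existing raw copy
    shutil.copy2(source, target)       # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"checksum mismatch after copying {source}")
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

Storing the checksum next to the copy would additionally allow later integrity checks; your IT team may already offer storage that enforces immutability more reliably.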

Involve the IT team of your home organisation, they can also provide advice on a tiered storage system:
Involve the IT team of your home organisation – they can also provide advice on a tiered storage system:
- "hot" storage: fast access speed, high access frequency, high value data -> high cost
- "cold" storage: low access speed and frequency, usually off-premises -> low cost
- preservation solutions (data archiving services)
@@ -40,19 +40,19 @@ The 3-2-1-0 rule:
Why? Sometimes it's not a technical problem, but a "layer-8"-issue: human error.


### Ok, I'm lost - this is far from my business.
### Ok, I'm lost – this is far from my business.

Many of the requirements are often solved by dedicated [repositories](/docs/repositories/). It is also worth taking a look at group drives or cloud services such as NextCloud (on-premises). Your local IT team and computing centre will help you with services that they usually support. But nevertheless: Make sure to generate good documentation (i.e., README file) and metadata together with the data. Check if your institute provides a (meta)data management system, such as iRODS, DataVerse, FAIRDOM-SEEK or OSF.


## Nirvana - your data in FAIR-paradise
## Nirvana – your data in the FAIR-paradise
:::info Preservation
> Relevant (meta)data (to guarantee reproducibility) should be preserved for a certain amount of time, that is usually defined by funders or institution policy. However, where to preserve data that are not needed for active processing or analysis anymore is a common question in data management.

_see [RDMKit](https://rdmkit.elixir-europe.org/preserving)_
:::

Documentation or conversion of files into long-term backup formats. The data-holding facility must for its part guarantee security, quality and availability. Consider any licence regulations or data protection of personal data when releasing it to the public.
Data documentation is complete; files are converted into long-term backup formats. The data-holding facility must for its part guarantee security, quality and availability. Consider any licence regulations or data protection of personal data when releasing it to the public.

If you publish your data in public repositories, your data will also be preserved.

@@ -61,6 +61,6 @@ If you publish your data in public repositories, your data will also be preserve
- https://rdmkit.elixir-europe.org/storage.html
- https://www.rdm.kit.edu/index.php
- https://www.druva.com/glossary/what-is-data-archiving-definition-and-related-faqs/
- German: https://www.researchgate.net/publication/221657547_Handbuch_Forschungsdatenmanagement
- German: https://www.degruyter.com/document/doi/10.1515/9783110657807/html
- German: https://handbuch.tib.eu/w/Lehrbuch_Forschungsdatenmanagement/_Druckversion
- https://www.researchgate.net/publication/221657547_Handbuch_Forschungsdatenmanagement (in German)
- https://www.degruyter.com/document/doi/10.1515/9783110657807/html (in German)
- https://handbuch.tib.eu/w/Lehrbuch_Forschungsdatenmanagement/_Druckversion (in German)