Metadata are data about data.
Metadata are generated and used when documenting your research and the resulting data. They contain information that makes data findable, accessible, interoperable and reusable, in line with the FAIR data principles.
For a detailed definition of the term metadata and other relevant terms, please refer to our Glossary.
Your research data are complex. Without documentation, they are inaccessible to colleagues, which hinders collaboration.
Metadata make your data accessible. Adding metadata, such as an experimental protocol, enables colleagues to use your data and makes collaboration easier. Metadata also make your data citable. Provenance, such as authorship, is established through metadata, ensuring that you are credited for your work.
Taken together, metadata increase the impact of your data.
One of the biggest hurdles in recording metadata is their availability, which depends on when they are recorded.
Generating metadata retrospectively and attaching them to a dataset often takes considerable effort. Metadata should therefore always be recorded alongside, or close to, the generation of the research data itself.
This way, the information is readily available and can be recorded easily, automatically or semi-automatically, and saved alongside the research data. The researcher's knowledge of how the data were generated, processed and analyzed is documented in a structured manner and is not lost.
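A minimal sketch of recording metadata at the moment the data are generated, assuming a simple file-based workflow in Python: the data file is written together with a JSON "sidecar" file next to it. The helper `save_with_metadata`, the `.meta.json` suffix and all field names are hypothetical choices for illustration, not an HMC or community standard. Creation time, file size and a SHA-256 checksum are captured automatically; descriptive fields such as creator or protocol are supplied by the researcher.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_with_metadata(data_path, data, metadata):
    """Write `data` (bytes) to `data_path` plus a JSON sidecar next to it.

    Hypothetical helper: the sidecar name (<file>.meta.json) and the
    field names below are illustrative assumptions, not a standard.
    """
    data_path = Path(data_path)
    data_path.write_bytes(data)

    record = {
        "file": data_path.name,
        # Captured automatically at generation time.
        "automatic": {
            "created_utc": datetime.now(timezone.utc).isoformat(),
            "size_bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        },
        # Supplied by the researcher (e.g. creator, protocol).
        "descriptive": dict(metadata),
    }

    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Example values only; written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        sidecar = save_with_metadata(
            Path(tmp) / "run_001.csv",
            b"time,temperature\n0,21.5\n",
            {"creator": "Jane Doe", "protocol": "protocol-v2.pdf"},
        )
        print(sidecar.read_text(encoding="utf-8"))
```

Keeping automatically captured and researcher-supplied fields under separate keys prevents descriptive entries from overwriting technical ones. In practice, a domain-specific metadata schema would replace these ad hoc field names.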
Which metadata are appropriate for a given dataset depends on both the scientific context and the dataset itself. The following aspects are essential to consider:
Vocabularies: Where possible, the metadata used to describe a dataset should be aligned with an acknowledged, controlled vocabulary. This keeps the descriptions generally understandable and interoperable with other datasets.
PID: Assigning a persistent identifier (PID), such as a DOI, to a research dataset makes it findable and citable, which increases the reputation and visibility of its author(s). PIDs can also support the machine readability of metadata.
Repositories: To publish data, a suitable repository needs to be identified. Requirements for quality control of the gathered data and/or long-term preservation may differ between repositories and scientific domains. Re3data or the databases of NIH and NHS are good starting points for a search.
Licensing: To clarify access and usage rights, for example with regard to the potential re-use of your data, you can assign an open content license. The licenses developed by Open Data Commons for open data and open databases, for example, can be used for research data.
Data privacy protection: Before publishing, check whether the data contain trade secrets and whether privacy rules and laws apply to them. Funding decisions and employment or service contracts might also contain provisions that prevent the data from being published or allow publication only under certain conditions. Contact the data protection/privacy officer responsible for your research institute or search for applicable data policies. An overview of institutional policies can be found here.
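The PID point above can be illustrated with a small example of how software might handle a DOI: checking that a string looks like a DOI and turning it into a link via the doi.org resolver. The pattern below is a simplified one, adapted from a regular expression Crossref has published for matching most modern DOIs; it is an assumption for illustration and will reject some older but valid DOIs. The function names and the example DOI are hypothetical.

```python
import re

# Simplified pattern for modern DOIs: "10." + 4-9 digit registrant code,
# "/" and a suffix. Assumption: older or unusual DOIs may not match.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

RESOLVER = "https://doi.org/"
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(value: str) -> str:
    """Strip whitespace and a leading 'doi:' or resolver URL, if present."""
    value = value.strip()
    for prefix in _PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def is_valid_doi(value: str) -> bool:
    """Return True if the normalized string matches the simplified pattern."""
    return DOI_PATTERN.fullmatch(normalize_doi(value)) is not None


def doi_to_url(value: str) -> str:
    """Build a resolvable https://doi.org/ link for a DOI-like string."""
    doi = normalize_doi(value)
    if not is_valid_doi(doi):
        raise ValueError(f"not a recognizable DOI: {value!r}")
    return RESOLVER + doi


if __name__ == "__main__":
    # Hypothetical DOI, used only to demonstrate the functions.
    print(doi_to_url("doi:10.1234/example-dataset.v1"))
```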
How metadata best practices are implemented may differ depending on the scientific context and application background, and may vary in scale from a single lab project to an entire institution.
HMC supports various use cases, so that you may find an example that can serve as a scaffold for your own project. Further information can be found on the information pages of the Metadata Hubs.
Metadata should comprise all the information required to properly understand and interpret your research data.
To identify which metadata your project explicitly requires, it can help to set up a data management plan (DMP). Assistance for this is available from various sources.
A minimal set of metadata should serve to answer questions like the following:
Who gathered the data?
When and where was the data gathered?
Why/for what purpose was the data recorded?
Which type of data was recorded?
How was the data gathered?
How was the data stored (file format/structure)?
How was the data pre-processed and analyzed (raw vs. analyzed data, pre-processing such as filtering, selection or other steps)?
How was data quality assessed and guaranteed?
How can the data be accessed?
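One way to make this checklist actionable is to represent it as a structured record and flag the questions that are still unanswered. The Python sketch below assumes one free-text field per question; the class name `MinimalMetadata`, its field names and the `missing_fields` helper are hypothetical, and a real project would map them to a controlled vocabulary or community schema instead.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MinimalMetadata:
    """One free-text field per checklist question (illustrative names)."""
    creator: Optional[str] = None            # Who gathered the data?
    collection_date: Optional[str] = None    # When was the data gathered?
    location: Optional[str] = None           # Where was the data gathered?
    purpose: Optional[str] = None            # Why/for what purpose?
    data_type: Optional[str] = None          # Which type of data?
    method: Optional[str] = None             # How was the data gathered?
    file_format: Optional[str] = None        # How was the data stored?
    processing: Optional[str] = None         # Pre-processing and analysis
    quality_assurance: Optional[str] = None  # How was quality assessed?
    access: Optional[str] = None             # How can the data be accessed?


def missing_fields(record: MinimalMetadata) -> List[str]:
    """Return the names of fields that are unset or contain only whitespace."""
    return [
        f.name
        for f in fields(record)
        if not (getattr(record, f.name) or "").strip()
    ]


if __name__ == "__main__":
    record = MinimalMetadata(
        creator="Jane Doe",
        collection_date="2024-05-14",
        file_format="CSV, one row per measurement",
    )
    # prints ['location', 'purpose', 'data_type', 'method', 'processing', 'quality_assurance', 'access']
    print(missing_fields(record))
```

A check like this could run when a dataset is saved or before it is uploaded to a repository, so gaps are noticed while the information is still at hand.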
You can approach the HMC Office and the Metadata Hubs with further questions and for support. We want to help you arrive at the optimal set of metadata for your project.
If you have questions regarding metadata, you can always approach HMC. We work together with and for researchers and data managers of the Helmholtz Association. Our work is integrated with national and international initiatives, and we also work with interested people from outside the Helmholtz Association (HGF).
The National Research Data Infrastructure (“Nationale Forschungsdateninfrastruktur”, NFDI) consists of domain-specific consortia, each working within its research domain and the German research community.
The “European Open Science Cloud” (EOSC) is an initiative of the European Commission that builds an Open Science infrastructure for Europe.
HMC works closely with the NFDI, EOSC and other initiatives in the field to ensure that our work contributes to the global research community.
Re3data can help you identify domain-specific repositories that host datasets.
The search service of the non-profit organization DataCite can be used to find datasets referenced by DOIs.
For domain specific questions, please do not hesitate to contact your Metadata Hub.