Data profiling and migration processes

Publication: Other contributions › Other contribution

Abstract

In large research projects, participants are required to migrate data to the official information infrastructure and adopt it for subsequent research activities. This package contains the migration plan and corresponding data that were created to migrate the data and users of large Collaborative Research Centres (CRC), e.g. TRR277 AMC (a CRC funded by the German Research Foundation, DFG).

This data package contains two code packages, an example data profile, data profile templates, and a plan for migrating data and users. The first code package is an example framework for profiling data ahead of migration. Its current implementation is based on the WebDAV interface offered by PowerFolder. The second code package creates directory structures, together with the corresponding metadata files, in bulk for data to be placed in Data Science Storage (DSS). It creates all entities according to the specified naming convention for projects, work packages, storages and folders, along with the corresponding metadata templates. For details about the templates and structures, please refer to the data package “Simplified DataCite compliant metadata templates and directory structures to manage research data”, available at https://doi.org/10.14459/2024mp1735150. The naming convention in this case preserves contextual information and facilitates the integration of data into TUM Workbench. The profile templates are in tabular form, which also suits spreadsheet formats and web or other digital forms. The example data profile, generated by the data profiling application, is in JSON format. It has been truncated to remove redundant information and edited to mask personal information, e.g. values are replaced with ###.
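To illustrate the bulk creation of directory structures with metadata files described above, here is a minimal Python sketch. It is not the code shipped in the data package. The naming pattern `<project>_<work package>_<storage>`, the function names `entity_name` and `create_structure`, the file name `metadata.json`, and the metadata fields are all assumptions made for illustration; the actual convention and templates are defined in the referenced data packages.

```python
import json
import tempfile
from pathlib import Path


def entity_name(project: str, work_package: str, storage: str) -> str:
    """Build a directory name that keeps the contextual information.

    Hypothetical convention: <project>_<work package>_<storage>.
    """
    return f"{project}_{work_package}_{storage}"


def create_structure(root, projects, storages, folders):
    """Create folders and a metadata template file for every combination.

    projects: dict mapping a project id to a list of work package ids.
    storages: list of storage labels.
    folders:  list of folder names created inside each entity.
    Returns the list of created folder paths.
    """
    created = []
    for project, work_packages in projects.items():
        for work_package in work_packages:
            for storage in storages:
                base = Path(root) / entity_name(project, work_package, storage)
                for folder in folders:
                    target = base / folder
                    target.mkdir(parents=True, exist_ok=True)
                    # Placeholder template; the field names are assumptions,
                    # not the templates from the referenced data package.
                    metadata = {
                        "title": "",
                        "creators": [],
                        "project": project,
                        "work_package": work_package,
                        "storage": storage,
                        "folder": folder,
                    }
                    (target / "metadata.json").write_text(
                        json.dumps(metadata, indent=2), encoding="utf-8"
                    )
                    created.append(target)
    return created


if __name__ == "__main__":
    # Demonstration in a temporary directory; all ids are made up.
    with tempfile.TemporaryDirectory() as tmp:
        paths = create_structure(
            tmp, {"A01": ["WP1", "WP2"]}, ["dss"], ["raw", "processed"]
        )
        for path in paths:
            print(path.relative_to(tmp))
```

Encoding the project, work package and storage in each directory name means a folder still identifies its context when viewed outside the tree, which is the property the naming convention is said to provide.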

The contents of this data package are based on the AMC-specific policy, information infrastructure, project distribution and organisation, nature of the data, etc. For details about the policy, please refer to the data package “Research data Management policy for large CRC projects”, available at https://doi.org/10.14459/2024mp1734393. Target groups may adapt the defined strategy and procedures to their own policy, information infrastructure, distribution of tasks, data organisation schemes, etc.
Original language: English
Publication status: Published - 2024
