HeFDI Data Talk: HeFDI-Repositories - a service offer - 02.06.2023

A research data repository is a platform that allows researchers to deposit, store, and share their data with others. We will discuss the importance of sharing research data openly and the benefits of using a research data repository.

Dr. Andreas Geißner (TU Darmstadt) and Dr. Lydia Riedl (Philipps-Universität Marburg) will introduce the HeFDI repository landscape, with a special focus on the institutional repository of the University of Marburg (data_UMR: https://data.uni-marburg.de/).

Thanks again for the kind introduction. I will start by giving a general overview of the kind of service we are talking about today, research data repositories. Then I will switch focus to the repositories in the HeFDI context, and afterwards we will pick data_UMR as one of those repositories and give you some more details on such a service.

What kind of service are we talking about here? Imagine yourself as a researcher who produces a lot of research data. You might want to store this research data, together with its metadata, in some form of central database. This is basically what a research data repository is: you upload your data, enter additional metadata using forms, and the data is then archived in that central place for at least some minimum period of time. You can share the data with selected individuals, or you can even use the service to make it available to anybody who is interested in it, meaning you can publish your data using data repositories.

Why would you want to do that? One reason is obviously information and knowledge transfer. If you store the data in a central place, people will find it again. Think about a future you who might want to access the same data again in a few years, think about future researchers in your own research group, but also think about collaborators, peer reviewers, the whole scientific world and even beyond. The other reason is to conform with rules and regulations, for example the Deutsche Forschungsgemeinschaft (DFG) guidelines on good scientific practice. Several of these guidelines specifically mention research data management, and today I will point to Guideline 13 and Guideline 17. Guideline 13 is about publishing research results. It says that when you publish results, you should also make the research data, materials and information on which your results are
based, publicly available. This also includes software and source code, which are research data too. Guideline 17 is about archiving research data, and it even specifically mentions repositories: it says that you should archive your research data in an accessible and identifiable manner for a period of at least ten years at the institution where the data were produced or in cross-location repositories.

So what do repositories do that makes them a good place for data archiving? Let's first have a look at where repositories are located in the research data life cycle. Most of you who have attended a research data management talk before have probably seen some version of this slide. It is a model of the different stages research data go through during a typical research project. It starts with planning, collecting, processing and analyzing, and, as you have guessed from the earlier slides, repositories come into play mainly in the later stages of the research data life cycle, when it comes to preserving the research data and providing access to other people.

To support these functions, repositories let you do specific things. You can obviously upload data and enter metadata, usually using forms that have been tailored to the metadata model of the respective repository. You can assign a persistent identifier to the data, for example a digital object identifier (DOI), and often you can also link the authors of the data to their persistent identifiers, such as ORCID iDs (Open Researcher and Contributor ID). Repositories allow data sharing with specific individuals, such as your team, or with outside individuals on request. They allow you to assign licenses so that everyone who downloads the data knows exactly what he or she is allowed to do with them, and they allow the download of public data sets and
often also the creation of new versions of existing data sets. On the other side, they have functions to search for data within the repository, and often they are also indexed in external data discovery systems, so that you have a central search entry point.

Research data repositories are usually classified by their target audience, by asking questions like: who can submit data sets to the repository, and researchers of which fields might be interested in its contents? That way you can distinguish, for example, between discipline-specific (subject-specific) repositories, generic repositories and institutional repositories.

Subject-specific repositories are often the first choice if you want to store and publish your data, as they are available to anyone in the field regardless of the data's origin, and, as they are such a central resource in their field, they also give high visibility to the uploaded data sets. They typically have fitting domain-specific metadata schemas and are run by discipline experts. But their service might be limited in some regards; for example, they might only allow publishing data sets and not archiving your data without publishing it. If you go to a discipline-specific repository, you should make sure that it is located at an institution that can ensure it is run permanently. And such a repository might not yet be available in your field. To check for that, the Registry of Research Data Repositories is a very good entry point: re3data.org lists more than 3,000 repositories, and with its search filters and search functions you can, for example, identify discipline-specific repositories, but you can also filter for other things like data types, metadata standards, the licenses you can assign, whether there are access restrictions, and so on.

On the other end of the spectrum there are generic repositories, which you could also call
catch-all repositories. They are not tailored to any specific discipline, nor are they restricted to a specific data origin, and they typically have generic metadata schemas. Some of them are hosted by large public institutions, the most famous probably being Zenodo, which is hosted by CERN and is a not-for-profit repository.

And finally we come to the institutional repositories. They are kind of a subset of the subject-agnostic repositories, as they are run by a research institution with the limited scope of archiving and publishing data from that institution, so their subject focus depends on the institution that runs the repository. Several institutional repositories exist in the HeFDI context, and I will introduce those in a moment. But first, let's look at why you might want to take your data to an institutional repository. Institutional repositories are suitable for data for which no discipline-specific repository exists yet: more and more discipline-specific repositories are being set up, but as of now they are not there for all disciplines. Scientists might also prefer to store their data at home, at their own institution, or they want their institution to be in control of the data they upload. Institutional repositories often allow a very differentiated assignment of rights, also by working together with the identity management of the respective institution. And they allow long-term storage of data for archiving purposes without having to publish it, if legal or practical reasons stand in the way of publication.

As I said, there are several institutional repositories in the HeFDI context. For researchers of the University of Kassel there is DaKS; for Marburg there is data_UMR, which Lydia will talk about in a moment; for Gießen there is JLUdata; and since a few days ago we have a new addition to the family, GUDe, from the University of Frankfurt. It only started its public test phase a few days ago, but
researchers from the university are already very welcome to upload data there, which will also be kept long term. And finally we have the repository of my own institution, TUdatalib, from the Technical University of Darmstadt. One speciality of TUdatalib in the HeFDI context is that its service has been expanded to several universities of applied sciences in Hesse, so that their researchers can also upload and archive data there without their institutions having to set up their own institutional repositories. So researchers from the universities of applied sciences in Darmstadt, Frankfurt, Fulda and Geisenheim, and also from RheinMain University of Applied Sciences, are very welcome and encouraged to use TUdatalib as well. With that I will hand over to Lydia, who will introduce data_UMR as one example of these repositories. They all have a very similar technical background and are therefore very similar.

Great, thanks a lot for this great introduction and overview. I will refer back to the good scientific practice that Andreas introduced, and I will go through some essential aspects to introduce data_UMR. So let's switch to this picture: this is the start page of data_UMR. It currently runs on DSpace 6, a software for repositories, and it is hosted by the University of Marburg. We are going to migrate the repository in a few months, so it will change a little, but the features will stay and will be expanded.

One special feature of data_UMR is that the repository is curated. This means that you, as a researcher of the University of Marburg, have the opportunity to upload your data, but it is not published immediately. Before publication we go through it in a mini review process, so to speak, to check whether your data is FAIR and to give you some hints on how to make your data FAIR. This is the FAIR principle I am talking about,
namely FAIR data. I guess most of you have already heard about FAIR data: FAIR means that data should be findable, accessible, interoperable and reusable. In the end this is what Andreas has already talked about: it is good scientific practice to make your data FAIR, and I will introduce the features of data_UMR along the FAIR principles. We once created a comic, the Marburg data management comic, and I am using it here; I'm sorry it's in German. So, as I said, data_UMR supports you in making your data FAIR.

The first aspect of FAIR data is findability. How does data_UMR support findability? First, our repository supports persistent identifiers (PIDs), so you can register a DOI for your data and make it persistently findable through that. We have rich metadata options to support you in describing your data in a very detailed way, and we use a metadata schema that is both human-readable and machine-readable. So you as a human can go through the data sets that we store and understand what is there and what the context of the data is, but you could also search them using programs, for example. This is an example of a data set: you can see part of the metadata here and also the DOI of this data set. This is what a typical data set looks like.

The next aspect of FAIR data is accessibility. How does data_UMR support accessibility? Data should be made as available as possible, that is one aspect, and they should be preserved. data_UMR supports you in doing that because it is an open repository, so you have the option to publish your data openly. But you don't have to publish all bitstreams (files) of your data set, because sometimes it is difficult to publish everything due to rights you might not hold on your data. So we support differentiated rights assignments, and we support an
embargo, so you could also decide not to publish your data immediately but only after some time, for whatever reasons. And we support you through consulting: as I said, the repository is curated, so you get support from people who work with data. data_UMR guarantees preservation of your data for a minimum of ten years, and in our consultation we also advise you on file formats that are suitable for archiving, that is, on preparing your data set to make it available in the long term. Here is a screenshot of a data set where I wanted to show you the differentiated rights assignment: you have a README file, for example, that is freely available, and another file that is not. And this is the form for assigning the embargo I talked about.

The third aspect of FAIR data is interoperability. Interoperable data means that your data should be available in such a way that it can be exchanged, interpreted or compared with other data. We support you in this through the possibility of referencing related records, for example with relation types. Here you see that we have the relation "is supplement to", so this data set has a related publication in the form of an article, and it is supplemented by code on GitHub. So you can read about the context of this data set and use the code to work with the data.

The last aspect of FAIR data is reusability. Reusable data are well documented: it is transparent how the data were collected or generated, what the variable names are, whether the data are raw or processed, and so on. All the information about your data that people could find helpful is part of good documentation. If something changes in your data set, versioning can also be helpful. Versioning
means that it is transparent how your data set changed, why, and which parts of it changed. Reusability also means that you give people the right to use your data, and you have the opportunity to choose what they are allowed to do with it; this is possible through licensing. Versioning is possible in data_UMR, and so is rich documentation: as you can see here, a lot of context is given. During the curation process we support you with the documentation. Mostly we are not from your specific field, so if we understand the context of your data, it is really well described. It is also possible to assign different licenses to choose what people can do, or are allowed to do, with your data. And that's mainly it. Thank you very much.
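
The point that repository metadata is machine-readable and can be searched using programs can be made concrete with a small sketch. It assumes that a repository exposes an OAI-PMH endpoint delivering Dublin Core (`oai_dc`) records, as DSpace installations commonly do; the talk does not say whether or where data_UMR offers such an endpoint, so the base URL, record identifier and DOI below are hypothetical. The code only builds a harvesting URL and parses one record with Python's standard library; it does not contact any server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response: identifier, DOI and all field values are invented.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:repository.example.org:123456789/1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example measurement data set</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
          <dc:rights>CC BY 4.0</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a (hypothetical) endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_dc_record(xml_text: str) -> dict[str, list[str]]:
    """Map each Dublin Core element name to its values in the first oai_dc record."""
    root = ET.fromstring(xml_text)
    dc = root.find(".//oai_dc:dc", NS)
    if dc is None:
        return {}
    fields: dict[str, list[str]] = {}
    for element in dc:
        # Tags look like '{http://purl.org/dc/elements/1.1/}title'; keep the local name.
        name = element.tag.split("}", 1)[-1]
        fields.setdefault(name, []).append((element.text or "").strip())
    return fields


if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai/request"))
    print(parse_dc_record(SAMPLE_RESPONSE))
```

For instance, `parse_dc_record(SAMPLE_RESPONSE)["title"]` returns `['Example measurement data set']`. A real harvester would additionally fetch the URL and follow OAI-PMH resumption tokens to page through large result sets, which this sketch leaves out.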
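
The ORCID iDs mentioned as author identifiers end in a check character computed with the ISO 7064 MOD 11-2 algorithm, which lets software catch typing errors before an iD is linked to a data set. The Python sketch below checks the hyphenated format and the checksum; the function names are illustrative, it does not accept the `https://orcid.org/` URL form, and a valid checksum does not prove that the iD is actually registered. `0000-0002-1825-0097` is the example iD used in ORCID's own documentation.

```python
import re

# Four blocks of four characters; the final one may be 'X' (check value 10).
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string has ORCID iD format and a matching check character."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))
```

For example, `is_valid_orcid("0000-0002-1825-0097")` returns `True`, while changing the last digit to 8 makes it return `False`.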
