HeFDI Data Talk: RDM in Third Party Funding Applications - DFG Requirements - 26.01.2024

If you write a DFG proposal today, you have to address research data management. The DFG now expects you to describe and plan the handling of research data, tailored to your specific project. The handling of data may also come into focus during the review.

But what needs to be considered? What support is available? Which questions about data management have to be answered, and whom can I ask for support?

In this talk, Dr. Ortrun Brand, co-coordinator of the state initiative HeFDI (Hessische Forschungsdateninfrastrukturen, Hessian Research Data Infrastructures), presents the key points of the DFG requirements, the necessary considerations, practical examples, and further support options.

The presentation slides of this talk are published on Zenodo at: https://zenodo.org/records/10572083

***
The HeFDI Data Talks (https://t1p.de/hefdi-data-talks-2024) are an open, biweekly information and discussion event on data management in academia, in which relevant NFDI consortia and research data management services introduce themselves. The series discusses current topics and presents numerous tools and services, including local and regional ones. The HeFDI Data Talks are offered by the state initiative HeFDI, which is funded by the Hessian Ministry of Science and Research, Arts and Culture (HMWK).

If you have suggestions or feedback on the topics, please contact the HeFDI office (hefdi@uni-marburg.de). If you would like to be kept informed about our offers and events, you are welcome to subscribe to our newsletter (https://t1p.de/bgkfl)!

Okay, it's 11:03 now, so a warm welcome once again to all our participants. I'm very happy to see that such a large number of you are interested in today's HeFDI Data Talk. My name is Ortrun Brand, I'm part of the HeFDI coordination team, and I'm very happy to talk today about the DFG requirements on research data management. The DFG is the German Research Foundation. You all submit lots of proposals to the DFG or to other research funders, and they want you to talk about research data management in your proposal; by now this is mandatory.

Today I will talk about those DFG requirements. I have a few remarks at the beginning, then I will talk about the framework, that is, the DFG guidelines and checklist, and then about what to look out for in your DFG proposal. We have some hints on bad and good practice and on CRCs, and at the end there is a short wrap-up. Afterwards we go into the discussion.

Okay, some remarks at the beginning. As I said, I will talk about the guidelines of the German Research Foundation, but the DFG sets the pace for many other funding programs, such as the LOEWE program in Hessen, the funding programs of the BMBF, the VolkswagenStiftung, and so on. They all look more or less to the DFG guidelines when it comes to research data management, so what I say may also be applicable to other funding programs.

It is also very important: we talk about general aspects of research data management, but always look at your discipline- or subject-specific guidelines. If you look at the DFG pages on research data management, you will find subject- or discipline-specific guidelines there. They set the pace for your respective discipline, so please have a look at them. Of course we can provide you with a lot of advice, also subject-specific advice, but as we work at the universities for all disciplines, what we talk about is more or less generic. And then there is the National Research Data Infrastructure (NFDI), which I will talk about later on as well.
It has these 26 consortia, which are discipline-, subject-, or data-specific. You need to have a look at them as well for very specific information on your discipline. That is what I wanted to remark at the beginning.

Then we can get started with the framework that sets the pace: the DFG guidelines and checklist. We have the DFG Code of Conduct, which was renewed in 2019; by now it is legally binding for all universities that want to be eligible for funding by the DFG. With it, the guidelines for dealing with research data were renewed in 2019 as well. There are 19 guidelines, 11 of which relate to the research process as such, and research data management is relevant in eight of these 11 guidelines. We will have a very short look at them right now.

So these are the guidelines that mention or address research data management in one way or another. I will give you a very brief look at the three that are most important to us: guideline 12 on documentation, guideline 13 on providing public access to research results, and guideline 17 on archiving.

Guideline 12 says, more or less, that you should document your research process properly. That includes what you do with research data, what kind of research data you obtain or process, and also the software that you develop or use for processing your research data.

Guideline 13, the one on public access to your results, is the most important. It recommends publishing the research data on which your publications are based, whenever possible. Of course there may be data protection restrictions and the like that make this impossible, but whenever possible you are supposed to publish your research data. Guideline 13 also recommends complying with the so-called FAIR principles, that is, your data and your software should be findable, accessible, interoperable, and
reusable. All this, especially being interoperable and reusable, means that the data has to be machine-readable; I will elaborate on that a bit later. Accessibility should be guaranteed via recognized archives and repositories. It is not like dropping a PDF on a website; you should hand your data over to recognized archives and repositories. Your software should also be published if possible, meaning software that you may have developed yourself or that you use for processing your data.

Guideline 17, which addresses archiving, says that the research data on which published results are based are supposed to be stored in an accessible and traceable manner for a period of at least 10 years. That restriction is very important: you are not supposed to archive all your research data, all raw data, and so on, but those data that have led to published results. The period starts with the date of publication of the results.

This is a very important aspect. When you do your research project, you obtain your data, you process your data, you have those fantastic results, and at the point of publishing you have to make sure that your data remains available for 10 years. Just think of what happens if you try to find, open, and reuse data that you used 10 years ago; you all know what this means. That is why you have to think about how to archive your data well in advance. You are supposed to archive the data at the institution where they were generated, for example the university or university of applied sciences you are working at or your research institute, or in an accredited repository. I will talk about those kinds of repositories later on.

So these are the most important DFG guidelines on this. I have already mentioned the term data repository quite often, so how do I find such a repository? Because, as I said, the point is not at all to drop
your data as some sort of PDF on some sort of website. The DFG recommends contacting a suitable research data repository as early as possible. That means already when you are writing your proposal: we strongly recommend that you look for a proper repository and contact it at that stage. Some repositories charge fees for depositing data, and of course you have to know about that to be able to ask for funding for those fees. Also, most repositories ask for certain metadata, which is data that describes your data. You have to know about that in advance, because you should start describing your data with metadata right from the beginning, when you start obtaining your data.

Okay. We strongly recommend that you find your repository by first having a look at your NFDI consortium. We have 26 NFDI consortia in this country; they are very much discipline-specific and can lead you to the relevant repository for your discipline. If you don't find your way through that, contact your local research data management service center or help desk, or have a look at re3data.

I see the question in the chat about examples of a data repository; I will talk about that in a few seconds. As for the question whether or not OSF is a repository: OSF is a sharing platform. I would say it is not a repository as such, but maybe some of my HeFDI colleagues have a different opinion on that.

But to talk about data repositories: what is a repository? It is a storage location for digital objects that makes them available to a public or limited circle of users. That is what OSF already does, okay. Repositories can be distinguished according to the nature of the stored digital objects, the discipline of the data, and the storage period of the data. Here you see a screenshot of data_UMR, which is the data repository that Philipps-Universität Marburg provides. It is a so-called institutional data repository: it takes up data from
all researchers at the university and does not differentiate between disciplines.

Yes, I see the ongoing discussion on whether or not OSF is a repository. Well, the Open Science Framework starts much earlier in the research data life cycle: it allows you to share data right from the beginning on a project management platform. Yes, I would agree with that. And if you do not have sensitive data, it is a good tool for sharing your research data with others; with sensitive data it is a problem to store it on a project management platform that is outside Germany or outside the EU.

Okay, we are going back to the repositories. I already said there are three types of repositories. There are generic ones, and Zenodo is very well known as a generic repository for publishing data. There are subject-specific repositories; Robert already dropped the link to re3data, which is a registry of recognized repositories. With subject-specific repositories we strongly recommend having a look at your NFDI consortium. For example, NFDI4Culture has set up a repository called RADAR4Culture, NFDI4Chem has set up a repository called RADAR4Chem, and so on. So there are subject-specific repositories evolving, and it is really good to regularly have a look at your NFDI consortium.

And of course we offer the institutional repositories, our HeFDI repositories, based on DSpace, at all sites in Hessen. Have a look at our website and you will find the institutional repositories of the Hessian universities and universities of applied sciences. But we always recommend using a discipline- or subject-specific repository, because usually the data there is much better described than in our institutional repositories. Of course, our institutional repositories are set up, in place, and running, but a subject-specific repository comes first.

Okay, and the DFG offers
a checklist; you see the screenshot on the right-hand side. I'd like to remind you that research data management is by now mandatory in all proposals, and we strongly recommend that you contact your local RDM service center for support with your proposal. To give you a few more hints, I will elaborate a bit on what the aspects mentioned in the checklist actually mean.

So here we address your DFG proposal and what to look out for. On the left-hand side you see the research data life cycle. This picture is frequently used to describe what to think about when you think about your research data. It starts with data creation, then data processing, data analysis, preservation, access, and reuse. It is more or less how the research process goes: you write your funding proposal, you start your project, you get your data, you process and analyze it, you publish your results, and so on. So it is very close to the research process, but it looks at the data.

Let's start with data creation. What we offer and recommend is to start with a data management plan. You can use our tool called Research Data Management Organiser (RDMO). RDMO is like a questionnaire on your data that you fill in. Its most important effect is to make you aware of what to think about when you look at your data, which is a very strong resource for your research. To do this as well as possible, a data management plan helps a lot to make you aware of, and to clarify, basically everything around processing your data. The DFG and other funders are very happy to see that you have data management plans.

A question to my HeFDI colleagues: I am not sure at the moment how mandatory data management plans are for DFG projects. I know it is recommended to have them, but maybe someone can drop a note on how mandatory they are for DFG projects. I know that for EU projects they are mandatory.
There you have to have a data management plan, but with DFG projects I am not 100% sure at the moment, so maybe someone can drop a note on that.

But the DFG, too... With the BMBF it depends on the funding, but with the DFG it is always mandatory, though not right at the beginning with your proposal; later on you have to have a data management plan, right?

Yes, but, well, in your proposal you have to say something about how you want to handle your research data, as has just been noted. And it really supports your proposal when you say that you either want to set up a data management plan, using for example RDMO, or already provide a data management plan.

Okay, we go on with data creation. At the beginning you should think about describing your data. You should elaborate on whether your project generates new data or reuses existing data, and on what data types you use. This means the formats you use, for example image data, text data, or measurement data, and how they are processed. You should also say to what extent data is generated, that is, what volume is to be expected. If it is a really huge volume, you need to say what storage capacities are available for this data.

For this stage of the research data life cycle you also need to elaborate on documentation and data description. That means: What approaches are taken to describe the data in a comprehensible manner? What existing metadata and documentation standards do you use? I will elaborate on that later as well. What measures are taken to increase data quality, for example what kind of quality controls? And what digital methods and tools are required to use the data? This is what you need to write down.

Data documentation always goes alongside the questions: Who can understand my data? Who can open and reuse my data today, in six months, or, most importantly, in 10 years? Because it is possible that you are approached within those 10 years and
someone asks: okay, how did you get to those results? Then you need to provide proof. So that is what you should think about: how can someone open and understand my data in 10 years, even you yourself? That is the most important point.

I already said a few words about metadata, which are very important for describing your data. This is a picture of a cat. The cat is the data, and the metadata is what describes this specimen. Here you see the file name, which is the name of the cat. Other pieces of metadata are the author, where you found the cat, what breed the cat is, and so on. You can imagine a lot of further descriptions of this cat: whether it is nice, the color of its fur, its age, and so on. That is all metadata describing the original data.

Metadata is a love note to the future, because only with metadata can we understand research data. It is a structured description, and it should be machine-readable. With metadata, your data becomes findable in databases, once it is published of course. The problem is that without metadata your research data won't be understandable in a few years. Everyone expects you to provide metadata, because providing correct metadata is a strong contribution to good scientific practice, which the DFG of course expects.

And how do I do this? With the example of the cat I already mentioned which aspects you should write down. Who collected the data, where, when, why, and with which tools? Which parameters were used, for example in an experiment? What are the usage options of this data for others? Think of data licenses, and think of publishing it with a digital object identifier in a repository. And what does the data contain, for example the scientific object?

We have mentioned quite often that you should use a recognized metadata standard to describe your data, and of course you ask yourself which metadata standard you should use. Here are a few hints on which metadata standard suits a
certain discipline. For example, for the natural sciences it is the ICAT schema; cross-disciplinary, as in our repositories, it is Dublin Core; and for the social sciences, for example, it is the Data Documentation Initiative (DDI). There are a lot of standards available. On this slide you have a link to our partner website forschungsdaten.info, which we contribute to. There is a long article on metadata and metadata standards there, which we strongly recommend for understanding the topic. Of course you can contact our local help desk, and you should also contact your NFDI consortium, because an important task of the NFDI consortia is to work towards consolidated metadata standards for their respective disciplines. These exist for some disciplines, not for all of them, but ask your NFDI consortium regularly; it is very important to be up to date on that.

We also get a lot of questions like: how do I annotate my data with metadata? Here I have provided a link to metadata annotation tools collected by the Digital Curation Centre in the UK; they list a lot of tools that you can use for metadata annotation. And of course you want to know whether or not your data is described according to the FAIR principles. There are a few FAIR data assessment tools coming up. They are collected on the website of our colleagues from Thuringia, and here you also have the link to that website for FAIR data assessment tools, to help you with metadata and with the question of how FAIR your data is.

When writing your DFG proposal, you also have to elaborate on the technical storage during the project phase. What we strongly recommend is not to mention Dropbox for storage, because it is not suitable for research data. Use the public infrastructure that is provided, for example, by your university. We talked about the Open Science Framework at the beginning. As I mentioned, if it is not sensitive data, then you can use the Open Science
Framework as a storage place for sharing and processing your data; that is this phase of the research data life cycle. But in any case, in your proposal you have to elaborate on how and where your data is stored, and we strongly recommend contacting your local infrastructure about that. And of course, if there are any aspects of information security or data protection for sensitive data, elaborate on those as well, especially on access rights.

Within the aspect of technical data storage you should look at the 3-2-1 backup rule; Andre has provided me with this nice graphic. You need three copies of your data, you should store them on two different storage devices, and one of the copies should be kept in a separate, decentralized storage location. That is why we strongly recommend using your public infrastructure, because public infrastructure is usually supposed to adhere to exactly this 3-2-1 backup rule. If you only say, well, I use two different hard disk drives and put them in my drawer, that won't do for the DFG. The Hessenbox is for data exchange, not for data storage, and sensitive data can't be shared on the Hessenbox either.

Okay. In your proposal you should also address some legal aspects. For example, what are the legal conditions with regard to your research data? Sometimes there are specific rules, for example for biodiversity data; just think of the Nagoya Protocol and so on. It is good to mention them, but also the general legal conditions concerning research data: for example, are there any restrictions concerning reuse and accessibility? You should also think of data and software licensing, copyright, and rights of use. To help you a bit with this, we have our HeFDI slides on legal aspects of research data, which you find in our Zenodo community. Also contact your local data management service desk or help desk to help you with the legal aspects of data.

And of course, at the end of the research data life cycle we look at
accessibility and reuse of data. There you should drop a few words on which data is particularly suitable for reuse in other contexts or by third parties. Once again, you are not supposed to just publish anything and everything. What the DFG, or rather the reviewers, want to see is that you took a bit of time to think about criteria, for example how to select data that is apt for reuse. In any case it is good to show that you are aware, in this case aware that you should define criteria for selecting the data to archive. So it is really good to show your awareness and that you took time to think about it. At this point of your proposal you should also write down whether or not you plan to archive your data in a suitable infrastructure and, if so, how and where, whether there are embargo periods, and so on.

When you think of long-term archiving, there are certain data formats that are suitable for it. We have made a HeFDI recommendation for those data formats. This table is not fully translated, but just to name a few: for spreadsheets, the recommended format is comma-separated values (CSV) files, and the usual Excel files are not suitable; for audio, WAV or FLAC files are much more suitable than MP3 files. Have a look at our HeFDI recommendation for suitable data formats. It is also good for your DFG proposal to show that you are aware that certain data formats are suitable for long-term archiving and others are not. If you write "I will store my Excel files on my local drive", that is not that good.

Also show your awareness of the National Research Data Infrastructure, which is being built up. We have 26 consortia. The NFDI is supposed to systematically build up structures for sustainable, secure, and accessible research data. It is driven by a combination of scientific communities and infrastructures, and you are expected to
make a link to your NFDI consortium in your DFG proposal. The consortia are discipline-specific or data-specific, and their services are evolving strongly, so it is really good to have a look at their websites or their community events every once in a while. The NFDI is also supposed to connect systematically to the European Open Science Cloud, although we know this is still very much an evolving pipeline. But that is the structure you need to refer to with respect to research data management in your project.

What I also strongly recommend is not to write that you plan a singular development in your research project, such as "our developer will develop new software to address this or that innovative method or topic". The DFG will look at that very, very critically, because experience has shown that singular or in-house developments, if they are not integrated into long-term infrastructures, vanish very quickly, and that is what the DFG does not want. So turn to your local service points, to your local or regional infrastructure, to your NFDI consortium, or to recognized data centers and repositories; this all depends a bit on your discipline. When you talk about your research data or your code management, use existing databases, repositories, and tools to provide a long-term perspective for your data and your code.

You also have to talk about responsibility and resources. You should write down who is responsible for the adequate handling of your research data, for example with a description of the roles and responsibilities within the project, what resources are required to implement this adequate handling, and who is responsible for curating, or taking care of, the data after the funding period.

And this is the most important point: what kind of money, and how much money, you can get for your research data management. Let me first drop a few
words on what you cannot claim from the DFG; that is what we call the "Grundausstattung", the basic equipment. The DFG expects that local data backup and archiving is guaranteed by your institution, for example your university. All costs that serve these purposes, like basic storage, are part of the basic equipment and cannot be approved; you won't get that money.

But you can get some money for, for example, staff costs for processing the data, user fees or membership fees for repositories and data centers, or other costs that arise when you use established infrastructures. You should always state why you need these funds: for example, to gain access to research data, or to process and prepare the research data or software that you generate so that it can be used by others. That is what you need to demonstrate. The same goes for the cost of transferring the data to a public repository. When you refer to existing infrastructures, there is a prerequisite for the DFG: they must have an accessible and transparent catalog of costs and services. So you have to be clear on that, and then you can obtain some money from your DFG funding.

So here I am coming close to the end, and I will drop a few words on what you should not do and what you should do in your proposal. Don't mix up data management with your methods; it is something different. The DFG wants to know how you make sure that your data, which is obtained with public funding, is available to the public for at least 10 years after your project has ended. What you should not do is make a backup on a local hard disk or something and put it in your drawer. Don't use sharing options like Dropbox. If you ignore the possibility of having a backup in the infrastructure of your university or institution, that is not good either; you should explain in a few words what this backup is based on. It is also bad to have no reference to the NFDI, to a federal state initiative, or to existing RDM networks. If you have sensitive data or
personal data, then of course it is necessary to drop a few words on data protection.

And here we go with what you should do. Describe your data properly, and name metadata, schemas or ontologies, and controlled vocabularies. Aim for a data management plan, which is strongly recommended, for your own benefit as well as for your funder. Show your awareness that you do data management in order to be more transparent and more replicable and to offer more reuse of your data, which is collected with public funds. Aim for data publication if possible; name the repository and contact it in advance. Show how you equip yourself and the other people in your project with data literacy and training; we offer a lot of training in HeFDI, by the way. And make reference to the NFDI or to state initiatives.

Just a few words for those who are concerned with collaborative research centers (CRCs). In a CRC it is possible to have a sub-project of its own, which is called an INF project, and very often it can be equipped with one data manager or data steward. This INF project works for the other projects; it is like a Z project. Here again, you have to make sure that the data management serves to prepare data for subsequent use or reuse and for internal interoperability. It is not just some IT thing; it has to be there to make data available internally and externally. The data management has to be project-specific, and what is needed is close cooperation with local infrastructure, for example the computing center, the university library, and so on. What you cannot obtain, as I already said, is hardware; that is basic equipment. And the position is not meant as an IT specialist or method specialist for the CRC. That is what to think about for CRCs; of course we can talk about that later in the discussion as well.

And just a tiny wrap-up and some links to where you can get support. The DFG by now has a very nice website on RDM, much better than in previous years. There you find the guidelines of your
professional associations, the discipline-specific associations. Of course we strongly recommend our own website and our local RDM service centers, the data services provided by the respective local universities. Of course we have some FAQs, and there is very good discipline-specific information on our partner website forschungsdaten.info.

Well, that's about it for my talk, and I'm very happy to take your questions.
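
The metadata questions raised in the talk (who collected the data, when, under which license, with which identifier) can be made concrete with a small machine-readable record. The following is only a minimal sketch and was not part of the talk: the field names borrow loosely from Dublin Core, while the choice of required fields, the helper names `missing_fields` and `to_sidecar_json`, and all example values are assumptions made for illustration.

```python
import json

# Field names loosely follow Dublin Core elements. Which of them count as
# "required" is an assumption of this sketch, not a DFG or HeFDI rule.
REQUIRED_FIELDS = ("title", "creator", "date", "description", "rights", "identifier")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def to_sidecar_json(record: dict) -> str:
    """Serialize the record as JSON so that machines can read it as well."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)


# Invented example values, loosely following the cat picture from the talk.
cat_record = {
    "title": "Photo of a cat",
    "creator": "Doe, Jane",
    "date": "2024-01-26",
    "description": "Colour photograph; fur colour, breed and age are noted.",
    "format": "image/tiff",
    "rights": "CC BY 4.0",
    "identifier": "",  # a DOI would go here once a repository has assigned one
}

print(missing_fields(cat_record))
print(to_sidecar_json(cat_record))
```

A record like this could be stored next to the data file from the first day of data collection and later mapped to whatever metadata schema the chosen repository asks for.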
