Good practices for FAIR data management (3) - an interview with Konstantin M. Wacker and Juan Armando Torres Munguía on ‘A global dataset of pandemic- and epidemic-prone disease outbreaks’
Date: 27 January 2023
Author: Leon ter Schure
Part of open science is that researchers make their data FAIR: Findable, Accessible, Interoperable and Reusable. But how to do this? In this series, we ask researchers to tell us more about their data management choices.
In this edition we highlight the data publication ‘A global dataset of pandemic- and epidemic-prone disease outbreaks’, which was published in Springer Nature’s Scientific Data.
We asked co-author Konstantin M. Wacker [KW] (Assistant Professor at the Department of Global Economics & Management of the UG’s Faculty of Economics and Business) and corresponding author Juan Armando Torres Munguía [JTM] (Research Assistant at the Faculty of Economic Sciences, Georg-August-Universität Göttingen) a few questions.
Can you explain what this data is about?
JTM: It is a dataset of disease outbreaks around the world from 1996 onwards. Since the Covid pandemic, there has been a lot of interest in pandemic diseases and hence a need for high-quality data. The innovativeness of our project is that the information is automatically sourced from a news dashboard of the World Health Organization (WHO). When the WHO publishes news on a new Ebola case in Congo or on avian flu in Singapore, our code will automatically feed that into the dataset.
Why did you choose to publish this dataset together with the full replication code open access?
KW: We clearly wanted the dataset to be available to other researchers. At the Groningen Growth and Development Centre we have a tradition of creating publicly available datasets like the Penn World Table (PWT) or the World Input-Output Database (WIOD). The Covid pandemic is a great example of the social challenges we face, and good data are key to better understanding those challenges and to developing responses. It does not make sense to sit on a rich dataset in the middle of a pandemic when the data could be useful to others and hence help us in dealing with the pandemic.
JTM: We also made the code available because it allows others to update the dataset. Several researchers or institutions create datasets but do not have the capacity to update them, so they may be outdated already once they are publicly available. With the code openly accessible, researchers can scrape the latest data from the web and also tailor the code to their needs if they are interested in particular aspects, such as a specific region or disease.
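Editorial illustration: the kind of tailoring described above, narrowing the records to a specific region or disease, can be sketched in a few lines of Python. This is a minimal sketch and not the authors' replication code; the `Outbreak` record, its field names, the `filter_outbreaks` helper and the sample rows are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Outbreak:
    """Hypothetical record layout; the published dataset may use other fields."""
    country: str
    disease: str
    year: int


def filter_outbreaks(
    records: Iterable[Outbreak],
    country: Optional[str] = None,
    disease: Optional[str] = None,
) -> List[Outbreak]:
    """Keep records matching the given country and/or disease (case-insensitive)."""
    selected = []
    for record in records:
        if country is not None and record.country.lower() != country.lower():
            continue
        if disease is not None and record.disease.lower() != disease.lower():
            continue
        selected.append(record)
    return selected


# Made-up sample rows for illustration only; not taken from the real dataset.
SAMPLE = [
    Outbreak("Country A", "Ebola virus disease", 2018),
    Outbreak("Country B", "Avian influenza", 2004),
    Outbreak("Country A", "Cholera", 2017),
]

if __name__ == "__main__":
    for row in filter_outbreaks(SAMPLE, country="Country A"):
        print(row.year, row.disease)
```

Passing both `country` and `disease` narrows the selection to records that match both criteria; leaving both as `None` returns every record.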
What did you do to make this data FAIR (Findable, Accessible, Interoperable, Reusable)? What support did you receive or would you have liked to get?
KW: This was facilitated by the journal Scientific Data. We purposely picked this journal because it is one of the few outlets that publish data descriptors and it has a high impact factor. The journal is open access and designed to support the FAIR principles, so the journal's submission requirements helped us apply those principles and enforced them. During the submission and review process, we got feedback on where to improve the data description and even on our code. For us, this was enough support for this project, but it also gave me some ideas on how to better implement FAIR principles in my future projects. In that sense “open science” is also an open process.
This data publication resulted from a collaborative project that emerged from ENLIGHT. What is ENLIGHT and how did you work together on this publication?
KW: ENLIGHT is an alliance of nine European universities to foster collaboration in research and teaching. The initiative for our ENLIGHT collaboration came from Inmaculada Martínez-Zarzoso, my former third PhD supervisor in Göttingen.
JTM: Inmaculada was my PhD supervisor and also established the connection with Bordeaux, another ENLIGHT partner. The ENLIGHT funds allowed me to work on the data and coding aspects of the project during the final stage of my PhD. We had online meetings at the start and towards the end of the project where we discussed conceptual questions and the publication strategy. Framing the paper and thoroughly describing the data was then a joint effort towards the end, where all of us took sequential turns.
Did you receive any feedback from fellow researchers?
JTM: Yes, colleagues from Göttingen and from other universities have told me that they are interested in using this dataset to explore correlates of disease outbreaks. A research centre in the United States also told me that they want to use our dataset to inform decision making for projects on pandemic prevention.
KW: A colleague of mine also uses the data in his course. And I met an employee from a UK health agency who thought this was a very useful contribution.
Useful links:
Learn more about the UG’s Open Science Programme
ENLIGHT is a European University formed by nine comprehensive, research-intensive universities
The UG’s Digital Competence Centre (UG DCC) supports researchers in managing their research data, throughout the entire research (data) life cycle, from grant proposal to FAIR data archiving.
About the author
Leon ter Schure is Lead of the UG Digital Competence Centre (DCC).