Good practices for FAIR data management - an interview with Robert Inklaar (Director GGDC) on achieving impact with open data
Date: 21 September 2023
Authors: Ana Alves & Alba Soares Capellas
Part of open science is that researchers make their data FAIR: Findable, Accessible, Interoperable and Reusable. But how to do this? In this series, we ask researchers to tell us more about their data management choices.
One way to increase the impact of scientific research is by sharing research data. The datasets of the Groningen Growth and Development Centre (GGDC) are openly available and some have even been downloaded over 100,000 times.
We asked Director Prof Robert Inklaar to tell us more about the importance of FAIR data, the datasets of the GGDC and why he chose to be an Open Science ambassador for the Faculty of Economics and Business.
Could you tell us more about the Groningen Growth and Development Centre (GGDC) and what your objectives are?
The GGDC brings together researchers interested in economic development and addresses how we can use a data-driven approach to improve our understanding. The Centre was founded by the eminent economic historian Angus Maddison, who is famous for quantifying economic growth and comparative income levels over the very long run. Since then, the scope of work at the Centre has broadened to also include research on more current patterns in productivity and welfare, structural transformation and international economic integration. A common feature of the work is that we aim to develop an improved understanding and analysis through better and more extensive economic measurement and data.
Your World Input-Output Database (WIOD) datasets have been downloaded over 100,000 times. What is the content of this database and for whom is it relevant? What do you think is the reason behind the impressive number of downloads of your datasets?
The World Input-Output Database allows researchers to trace the flow of goods and services within and across countries around the world. This is a key ingredient for those interested in the changing nature of globalization and the role of global value chains, where producers across many industries and countries contribute to a final product, such as a car or cell phone. This data is being used extensively in academic research projects, for policy analysis and in teaching, which leads to a big audience for these datasets. That makes WIOD one great example of GGDC research. By developing and providing new or enhanced data, a host of new questions can be addressed, laying the foundation for broad use.
Compiling these databases seems to involve a lot of international collaboration. How does this work in practice? Can you describe some of the challenges related to obtaining the data?
International research projects, including EU Horizon projects, have often laid the foundation for this type of data development, first with the EU KLEMS database on industry productivity and then with WIOD. This involves extensive cross-border collaboration, deadlines, deliverables and of course much coordination on what is collected and how. In many cases, we rely on official datasets released by statistical agencies. We then develop and implement new types of measures. As academics, we can often move further and do experimental work to push the boundaries of an area, but that does require pioneering effort and, in some cases, meets resistance from statisticians who would not push ahead so far and so fast.
The type of collaboration also varies with the dataset. For example, the Maddison Project Database brings together work by many (academic) economic historians, who develop new series for a specific country. The added value is then in combining their data pieces and providing a framework for international comparison. And in the case of the Penn World Table, only a small number of active researchers, primarily in Groningen, develop new versions of the database. Overall, our strength is not so much in developing new primary data on some phenomenon but rather in creatively using secondary sources to provide a more comprehensive, more interesting perspective and basis for analysis.
Databases and datasets from the GGDC are made accessible through the UG's data repository DataverseNL. Why was this your chosen platform?
For a long time, we used the regular UG website to host our datafiles but this has serious drawbacks, most importantly vulnerability to changes in the website that break URLs and limitations on file size. In addition, there are no easy ways of tracking downloads and usage. DataverseNL covers all these bases, providing a persistent DOI-link, providing access to large data files and letting us track downloads at the level of datasets or even individual files. And you can even link to individual data files from another website. So, our GGDC website is still the main portal for presenting datasets to the world, but for access to the data, we link to DataverseNL.
GGDC datasets are published openly with a Creative Commons (CC) licence. As an Open Science ambassador of the Faculty of Economics and Business, could you tell us your perspective on why you find this important?
Compiling new datasets is often the work of years and traditionally, many researchers wanted to keep their work close at hand, only sharing analysis and writing new papers based on those data for as long as they could. For a long time, our founder, Angus Maddison, resisted publishing spreadsheets of his long-term income series, stating that authors should just copy by hand the numbers from the tables in his book.
But limiting the spread of new tools for researchers is ultimately counterproductive. Yes, we like to develop and publish all the new insights that are to be had from a new dataset, and get credit for them. But is that realistic? Can all the uses be foreseen and all the insights be developed by a single researcher or group of researchers? Probably not.
Making datasets widely available is, in my view, a crucial part of contributing to the profession. The CC license is a very helpful part of this, because it gives providers and users a clear deal, explaining how the data can be used and how they should be credited. The CC license means we do not need to explain the terms of use and users are assured that the terms cannot be changed on a whim. Taken together, this makes sharing data much more effective. We have chosen the least restrictive license, so users can modify what we provide and even build commercial applications, all to stimulate wide use.
Do you think that openly sharing datasets is rewarded enough in academia? In other words: Do you think that sharing datasets should be recognized in a similar way as peer-reviewed publications for career-progression purposes?
I think that with the spreading impact of DORA, we are moving towards a system where the form and outlet of research output matter less than the quality of the research. NWO already makes it possible to put forward a wide range of research output types in their grant applications, including datasets. If you can argue that a piece of research output has been important and of high quality according to relevant indicators, then that should count, even if it is not an article in a reputable journal that has received a certain number of citations.
At the same time, research is largely about convincing people that you are pushing the frontiers of knowledge. No matter how brilliant your new dataset is, you still need to convince your colleagues of its relevance, quality and importance, and the typical vehicle is still a peer-reviewed journal publication. So, at a minimum, I see the development of new datasets and the publication of interesting articles as going hand in hand for us.
Also, the challenges of the DORA development for research assessment should not be understated. We are still typically in a system where we count the number of journal publications, usually weighted by some importance factor. Under such a system, it is fairly simple to objectively rank people or determine when they meet a career-progression milestone. Moving to a multidimensional system of many types of research output with a multitude of quality indicators has the benefit of recognising the true diversity of what can be ‘good research’. But there is the danger that any ranking will be less objective, and that the marking of career milestones will be less transparent and subject to debate.
So, yes, I am in favor of recognizing and rewarding research output in the form of (e.g.) datasets. But career-progression milestones are very important for people in academia and I think we need to start such discussions by building a shared understanding of what constitutes ‘good research’ and continue to strive for milestones that can be objectively and transparently stated. And I do think this is a discussion that should be conducted with some urgency because the world of research evaluation is shifting and we want to point junior scholars to career milestones that will remain relevant throughout their careers.
The world today is facing a multitude of societal and economic problems, such as global inequality, socio-economic differences, food shortages, environmental degradation, and climate change. What are in your opinion the possible contributions of the GGDC to better understand and even alleviate these problems?
“Change the world through better measurement!” No, that might make a fun slogan, but I think our mission as academics should be about understanding and informing rather than the activism of ‘alleviating’. And even within that mission, it is good that we are part of a broader system. There are people who develop some of the conceptual frameworks that we use in our measurements, and there are those who take our data and provide new stylized patterns and facts.
To take one concrete example of where the GGDC fits in this system, consider the Penn World Table. In this dataset, we draw on official statistics on average income levels (GDP per capita) for most countries around the world and stitch together time series going back 50, 60, 70 years. We then combine these with data on comparative price levels developed (in recent years) at the World Bank through the International Comparison Program. This provides information about comparative income levels for most of the post-war period and has been widely used in research but also in magazines such as The Economist (e.g., on Singapore’s impressive growth performance) and newspapers like the New York Times (e.g., on the scope for development in emerging economies). Combine that with the large numbers of students who have been trained in using this type of data and have it readily available in their own jobs, and you can imagine that analysis and policy are informed by GGDC data. Looking ahead, we aim to continue pushing the boundaries of economic measurement and analysis in the area of growth and development to inform students, scholars and policymakers.
About the authors
Ana Alves is a Data Steward at the UG Digital Competence Centre (UG DCC).
Alba Soares Capellas is Communications Officer at the UG Digital Competence Centre (UG DCC).