Big Data is everywhere. The term collectively refers to breakthroughs in computer processing and analytics that allow researchers to efficiently analyze large quantities of complex data. The emerging field of Big Data is bringing transformative insights to many industries, including agriculture. Building the digital platform that can put Big Data to use in agricultural research for development is the challenging task that keeps Medha Devare awake at night.
Dr. Devare is a Senior Research Fellow at the International Food Policy Research Institute (IFPRI). She received her PhD from Cornell in Crop and Soil Sciences and has worked as an agronomist for CGIAR, the global network of research centers for international agricultural development. Previously, Medha led a project in Nepal to improve productivity and profitability in farming systems, working closely with farmers to implement sustainable management practices.
Currently, Medha is a data architect for the CGIAR Platform for Big Data in Agriculture, an initiative to increase the use of Big Data tools across CGIAR’s 15 Research Centers and beyond. The Platform has three modules: Organize, Convene, and Inspire. Medha is the leader of the Platform’s Organize module that aims to mobilize the vast amounts of agricultural research data at CGIAR and elsewhere to produce new insights and increase the impact of agricultural research for development.
Food Tank had the opportunity to speak with Dr. Devare to learn more about her work.
Food Tank (FT): As the leader of the Organize Module for the Big Data Platform, how are you approaching the task of managing data from all of CGIAR’s 15 Research Centers?
Dr. Medha Devare (MD): CGIAR’s Open Access and Open Data (OA-OD) Policy was signed into effect in late 2013 and requires all CGIAR Research Centers and Programs to make their research data and findings publicly accessible. To help operationalize this Policy, the Bill and Melinda Gates Foundation funded the Open Access and Open Data Initiative, which I led in 2015. The Organize module of the Platform for Big Data in Agriculture is an extension of the work begun through this initiative.
Increasingly, we’re not talking just about open data but rather about FAIR, or Findable, Accessible, Interoperable, and Reusable data. We realized that in order to fully reap the benefits of Big Data we needed to provide support to researchers and individual data centers and to clarify such questions as: How are we going to make data findable? What kinds of standards might we use to ensure data interoperability? Each element of FAIR has associated indicators that make the concept easy to implement and less open to interpretation.
Applying FAIR principles renders data more accessible for Big Data analytics to generate new insights. So we’re trying to support data management efforts in different ways at the 15 Centers with the broader goal of enabling FAIR data. In large part the emphasis of these efforts is on the “I”, or data interoperability, which tends to be hardest to achieve, and all of the work requires attention to both technical issues and the need for cultural change.
FT: Can you explain the idea behind GARDIAN, the new digital infrastructure for data at CGIAR?
MD: To accomplish the goal of making agricultural data findable, the Platform created GARDIAN, or the Global Agricultural Research Data Innovation & Acceleration Network. Think of it as a kind of Google search, which currently enables the discovery of information resources across the CGIAR Centers. Typically, each Center has at least two repositories: one for data, and another for publications, and they are on different platforms that generally don’t speak to each other. So we needed a way for people to search across Centers and repositories using single or multiple keywords—soil, water, drought-tolerant maize, you name it—to identify the resources that exist for that topic across CGIAR.
GARDIAN currently enables the discovery of about 100,000 publications and over 2,000 data sets, linking data and related publications to facilitate interpretation. While GARDIAN’s search currently operates across CGIAR Centers, it will soon allow users to find agricultural resources from other platforms as well. It lets users discover agricultural information easily and quickly, without having to know the structure of the institutions the data comes from, where the repository is, or what its URL is, and without having to comb through the kinds of results that a general web search would yield.
FT: Once the data is available through GARDIAN, what is the next step?
MD: Beyond making agricultural resources discoverable, tagged and categorized, and downloadable via GARDIAN, one of our key goals is to enable seamless integration of the discovered data with models and analytical tools. We want to empower users not only to find data via GARDIAN but also to be able to easily visually explore it.
Another important feature we are aiming for is the ability for varied data sets to be combined and aggregated to enable the generation of new insights. This will most likely take us a few years to implement; the data needs to be streamlined across disciplines in a consistent, standardized format. Communicating with data is like using any other language, and how data is structured—its grammar or syntax—and its meaning must be consistent in order for researchers to combine data sets from different sources.
This is not so much about creating new analytics, but about getting data organized so that it is interpretable by humans and machines, and easy to plug into existing tools. Many of the models and decision support systems of relevance are already out there—we need to figure out how to use them in the best possible way.
FT: How has your background as a soil scientist, handling data on a microscopic level, influenced your work on the Platform and handling data across the vast organization of CGIAR?
MD: After I finished my PhD at Cornell, I started working on molecular techniques for studying soil microorganisms and their possible response to genetically modified crops. I was using molecular data and tools from the National Center for Biotechnology Information (NCBI), and it was absolutely mind-blowing to see what they had done. Starting in the 1980s, NCBI recognized that unless the biomedical, genomics, genetics, and allied sectors started to share data, transformative insight and innovation would likely remain elusive, despite the funding being invested in those sectors.
NCBI’s efforts began with an attempt to bring together publications and data from across domains in the biomedical and related sciences, including micro-level genetics and genomics data, employing consistent standards and formats that enable researchers to find data from different sources and very easily plug it into a number of analytical tools. I realized we needed a similar approach to enable transformative research and development in the agricultural sector. I have been wanting to build something akin to the NCBI for agriculture for a long time.
FT: You have also worked in the field as an agronomist, providing guidance for farmers to improve their agricultural production systems. How has that experience informed your work on the Platform?
MD: One of the goals of the Platform is to provide actionable options for farmers. As a field agronomist, I managed a project working with farmers in some of the poorest districts in Nepal. Most of the time, the farmers had simple questions: “What should I do this rice season? Should I direct seed rice or should I set up a nursery to transplant? If I direct seed, what variety should I use and how can I manage the crop for the highest yield?”
Leading that project made me realize how much better we needed to do in terms of managing our data and ensuring that we were fully utilizing previously generated data and information. The first thing I tried to do was standardize how the team I managed was collecting data. Despite my efforts to create templates for data collection, I received poorly described data that required many days to format in order to perform meaningful analyses across the sites. It struck me as a terrible waste of time. Why couldn’t we get the data in shape at the collection point rather than at the very end?
When I had the opportunity to lead the OA-OD initiative, I realized that this was my opportunity to get these standards in place and build NCBI-like functionalities for agriculture. I believe that to transform agricultural research for development, we need to share data that is well-annotated, well-organized, standardized, and interoperable so it can be analyzed either by a human or by a computer.
We should be able to build better location-specific decision support for farmers by utilizing FAIR data on weather, crop varieties, management regimes, and markets. For example, the rice farmer wanting to know whether to direct seed or transplant rice might be helped by a decision support tool that uses short- to mid-term weather predictions and a model based on management, variety, and yield data.
FT: What are the greatest opportunities for the Big Data Platform?
MD: The first opportunity is simply to increase the return on investments in agricultural development. The second opportunity is to use data in a way that will make a difference for decision making on many levels: for farmers, policymakers, extension agents, and others. Addressing global challenges requires that we have well-described, well-annotated, interoperable, reusable, machine-readable data. It takes time and money, but it can save much more time and money in the long run. There is also great opportunity in democratizing both data access and data sharing. We want to give researchers the means not only to find and use others’ data, but also to share their own research outputs.