AgMIP Data Aggregator: Leveraging Globus Online and HUBzero to Make Global Spatial Data Accessible
Abstract
As part of the Agricultural Model Intercomparison and Improvement Project (AgMIP, http://www.agmip.org), the Global Gridded Crop Model Intercomparison group ran a suite of global climate models and crop models to estimate historical and future changes in crop yields under different climate change and irrigation scenarios. While the output of this comprehensive modeling endeavor is freely available, significant challenges remain for anyone who wishes to use the data. First, the archive consists of more than 36,000 global grids at a spatial resolution of 0.5° × 0.5°, organized in complex layers of folders, which makes it difficult to navigate and to locate the data of interest. Second, the archive is accessible only through Globus Online. While Globus Online provides reliable data transfer, a user needs to set up a local Globus endpoint, if one does not already exist, in order to download data from the archive. Finally, after acquiring the data, a user may need programming skills to process it in its native NetCDF format.
Working with the AgMIP archive creators, the GeoShare project developed the AgMIP Aggregator tool to make the AgMIP data archive broadly accessible. Built on the HUBzero tool framework, AgMIP Aggregator integrates Globus Online inside the tool, making data access, processing, and visualization available through its graphical user interface. Any GeoShare hub user can select the model, crop, and other variables of interest, download and aggregate the data to any user-defined level, and visualize the results as thematic maps. In this presentation we will describe our effort in creating the AgMIP tool and enabling Globus Online in HUBzero. Our experience in dealing with the data challenges on the GeoShare hub will be beneficial to researchers and tool developers alike as they tackle the rapid growth of scientific data collections.
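To give a sense of what aggregating a 0.5° grid to a user-defined level can involve, the following is a minimal sketch and not the AgMIP Aggregator's actual implementation, which the abstract does not describe. It assumes the grid has already been read from NetCDF into a plain list of rows, that row 0 is the northernmost row, that missing cells are marked with `None`, and that the "user-defined level" is a simple latitude/longitude box. The function names are hypothetical, and cosine-of-latitude weighting is used here as a common approximation of relative cell area.

```python
import math

CELL = 0.5  # grid spacing in degrees, as stated in the abstract


def cell_center_lat(row):
    """Latitude of a row's cell centers (assumption: row 0 is northernmost)."""
    return 90.0 - CELL * (row + 0.5)


def cell_center_lon(col):
    """Longitude of a column's cell centers (assumption: column 0 starts at 180 W)."""
    return -180.0 + CELL * (col + 0.5)


def aggregate_box(grid, lat_min, lat_max, lon_min, lon_max):
    """Area-weighted mean of the cells whose centers fall inside the box.

    Cells holding None are treated as missing and skipped.
    Returns None when no valid cell lies inside the box.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r, row_values in enumerate(grid):
        lat = cell_center_lat(r)
        if not (lat_min <= lat <= lat_max):
            continue
        # Cells shrink toward the poles; cos(latitude) approximates relative area.
        weight = math.cos(math.radians(lat))
        for c, value in enumerate(row_values):
            if value is None:
                continue
            if lon_min <= cell_center_lon(c) <= lon_max:
                weighted_sum += weight * value
                weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical global grid (360 rows x 720 columns) filled with one value.
    demo_grid = [[2.0] * 720 for _ in range(360)]
    print(aggregate_box(demo_grid, 30.0, 50.0, -10.0, 20.0))
```

A real aggregation to countries or other administrative units would replace the box test with a region mask, but the weighting logic would stay the same.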
Bio
Lan Zhao is a research scientist at the Rosen Center for Advanced Computing (RCAC) at Purdue University. She has worked on the design and development of data-driven cyberinfrastructure systems for multiple cross-disciplinary projects, including GEOSHARE (Geospatial Open Source Hosting of Agriculture, Resource and Environmental Data), WaterHUB, U2U (Useful to Usable), GABBS (Geospatial Modeling and Data Analysis Building Blocks in HUBzero), the XSEDE CESM modeling gateway, DRINET (Drought Research Initiative Network), and IsoMAP (Isoscapes Modeling, Analysis and Prediction). Her interests include infrastructure for scientific data storage, retrieval, provenance, and processing; integration and sharing of heterogeneous data sets and models; and composition of data-driven scientific workflows.
Cite this work
Researchers should cite this work as follows: