Fez: a Hatrack for your Metadata
Abstract
Research data is being generated and modified at an accelerating rate, and iterations and derivations are being crafted almost as quickly. With this increase comes a growing need to track the metadata about the data being generated. This data is typically housed in emails, Excel documents, and even handwritten notes. These storage and dissemination mechanisms, though reasonably effective at maintaining the integrity of the data itself, do not preserve the necessary metadata and provenance over the long term. As time passes and the origin of a dataset fades from active memory, its user may need to recall any of several things. Where did this dataset originate? What exactly do the column headers mean? Who was the original publisher? Do I have the latest version of the data? These are only a few examples. If an original email can be found, one might be able to reconstruct several of these facts. But as data is shared second- or third-hand, or via alternative methods such as physical media or cloud-based storage, the veracity of the implicit metadata becomes circumstantial.
To address this problem, we propose Fez, a distributed metadata repository for science Hubs. With Fez, researchers will be able to deposit trusted, arbitrarily structured metadata on any kind of dataset. Users of these data will then be able to interrogate Fez to rediscover the conditions, parameters, and context of datasets that are held in emails, attached to wiki pages, or resting in file folders from previous projects and collaborations. In this introductory talk, we will present use cases, demonstrate the current state of Fez, and solicit a dialog with the Hub community on its future development.
Bio
Sam is a Software Engineer for HUBzero. He enjoys developing web components for the core HUBzero library, with a specific focus on creating highly interactive user interfaces. He’s also working on ways to improve development and deployment within the HUBzero environment, including database migrations, unit testing, and more. Sam received his BS in Management, with a focus in Management Information Systems, from Purdue University and is working on his MS in Computer and Information Technology.
Cite this work
Researchers should cite this work as follows: