Managing Workflows Within HUBzero: How to Use Pegasus to Execute Computational Pipelines

By Ewa Deelman

Information Sciences Institute

Category

Seminars

Published on

Abstract

This talk will focus on constructing and executing computational pipelines (workflows) within the HUBzero environment using the Pegasus Workflow Management System. Pegasus is available today in NEES.org, DiaGrid.org, and other hubs. Pegasus allows users to develop workflows at a high level of abstraction, without worrying about the details of the execution environment. A workflow description includes information about the workflow steps and the input and output data they consume and produce. Each hub is pre-configured for a particular execution environment, so that users can seamlessly launch their workflows on the available resources. Pegasus provides monitoring interfaces for following the progress of a workflow. When failures occur, it tries to recover from them; if recovery is not possible, it provides detailed failure information. The standalone version of Pegasus has been used in a variety of domains, including astronomy, bioinformatics, earth science, and physics. Pegasus within the hub opens up its capabilities to a broader range of users.

Bio

Ewa Deelman is a Research Associate Professor at the USC Computer Science Department and a Project Leader at the USC Information Sciences Institute. Dr. Deelman's research interests include the design and exploration of collaborative, distributed scientific environments, with particular emphasis on workflow management as well as the management of large amounts of data and metadata. At ISI, Dr. Deelman leads the Pegasus project, which designs and implements workflow mapping techniques for large-scale applications running in distributed environments. Pegasus is used today in a number of scientific disciplines, enabling researchers to formulate complex computations in a declarative way. Over the years, Dr. Deelman has worked with a number of application domains, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics. These collaborations have produced advances both in computer science and in the domain sciences. For example, the data-intensive workflows in LIGO (gravitational-wave physics) motivated new workflow analysis algorithms that minimize a workflow's data footprint during execution. Conversely, improvements in the scalability of workflows enabled SCEC scientists (earthquake science) to develop new physics-based seismic hazard maps of Southern California. In 2007, Dr. Deelman co-edited a book on workflow research, "Workflows for e-Science: Scientific Workflows for Grids", published by Springer. She is also the founder of the annual Workshop on Workflows in Support of Large-Scale Science, which is held in conjunction with the Supercomputing conference. In 1997, Dr. Deelman received her PhD in Computer Science from Rensselaer Polytechnic Institute. Her thesis was in the area of parallel discrete event simulation, where she applied parallel programming techniques to the simulation of the spread of Lyme disease in nature.

Cite this work

Researchers should cite this work as follows:

  • Ewa Deelman (2012), "Managing Workflows Within HUBzero: How to Use Pegasus to Execute Computational Pipelines," https://help.hubzero.org/resources/779.
