Complex Workloads on HUBzero – Pegasus Workflow Management System

By Karan Vahi (presenter)¹, Mats Rynge¹, Steven Clark², Ewa Deelman

¹ USC Information Sciences Institute; ² Purdue University







The HUBzero platform for scientific collaboration enables tool developers to build tools that are easily shared with both researchers and educators. Users can log in and start their analysis without worrying about tool setup and configuration. Once the analysis is done, researchers can analyze the results using various built-in capabilities for plotting and visualization. To facilitate handling of more complex workloads, we have integrated the Pegasus Workflow Management System (WMS) with “submit”, the main tool used by tool developers in HUBzero to submit analyses to local and remote compute resources. Pegasus WMS provides a means of representing the application workflow in an abstract form that is independent of the resources available to run it and of the location of data and executables. It compiles these abstract workflows into an executable form that can be run on local or remote distributed resources. Pegasus also captures the provenance of the whole workflow lifecycle, from the planning stage, through execution, to the final output data. This enables users to easily debug and monitor computations that run on remote resources. The advanced data management capabilities of Pegasus allow tool developers to execute the tightly coupled parts of their workloads on an HPC cluster while farming out the remaining tasks to a distributed HTCondor-based computing infrastructure.
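The distinction between an abstract workflow and its executable form can be illustrated with a small sketch. This is not the Pegasus API: the class `AbstractJob`, the functions `dependencies` and `plan`, the site names, and all file and executable paths are hypothetical and exist only to show the idea that jobs refer to logical names, which a planning step later binds to a concrete site.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class AbstractJob:
    """A job described only by logical names (hypothetical structure)."""
    name: str
    transformation: str   # logical executable name, not a path
    inputs: tuple = ()    # logical input file names
    outputs: tuple = ()   # logical output file names


def dependencies(jobs):
    """Infer dependencies: a job depends on whichever job produces its inputs."""
    producer = {f: j.name for j in jobs for f in j.outputs}
    return {j.name: {producer[f] for f in j.inputs if f in producer} for j in jobs}


def plan(jobs, site, executable_paths, data_locations):
    """Bind an abstract workflow to one site and return jobs in runnable order."""
    by_name = {j.name: j for j in jobs}
    planned = []
    for name in TopologicalSorter(dependencies(jobs)).static_order():
        job = by_name[name]
        planned.append({
            "job": name,
            "site": site,
            # Resolve the logical executable name for this particular site.
            "executable": executable_paths[(site, job.transformation)],
            # Only externally supplied inputs need to be staged in.
            "stage_in": [data_locations[f] for f in job.inputs if f in data_locations],
        })
    return planned


# A small split / process / merge workflow; every name here is illustrative.
jobs = [
    AbstractJob("split", "splitter", ("input.fasta",), ("part.1", "part.2")),
    AbstractJob("blast1", "blast", ("part.1",), ("hits.1",)),
    AbstractJob("blast2", "blast", ("part.2",), ("hits.2",)),
    AbstractJob("merge", "merger", ("hits.1", "hits.2"), ("hits.all",)),
]

executable_paths = {
    ("local", "splitter"): "/usr/local/bin/splitter",
    ("local", "blast"): "/usr/local/bin/blast",
    ("local", "merger"): "/usr/local/bin/merger",
    ("cluster", "splitter"): "/apps/tools/splitter",
    ("cluster", "blast"): "/apps/tools/blast",
    ("cluster", "merger"): "/apps/tools/merger",
}
data_locations = {"input.fasta": "file:///data/input.fasta"}

# The same abstract description is planned twice, once per site.
local_plan = plan(jobs, "local", executable_paths, data_locations)
cluster_plan = plan(jobs, "cluster", executable_paths, data_locations)
```

The list `jobs` never mentions a site or a path, so the same description can be planned for either site without modification. A real planner does far more than this sketch (data transfer jobs, cleanup, provenance capture), none of which is modeled here.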

The talk will give an introduction to scientific workflows with Pegasus and focus on the integration of Pegasus WMS with “submit”, and on how it enables tool developers, whether they use the Rappture toolkit or “submit” directly, to use scientific workflows. Tools currently benefiting from this integration include BLASTer, an online tool to run BLAST on the DiaGrid Hub; CryoEM, a tool to reconstruct the 3-D structure of macromolecular assemblies; OpenSees workflows through NeesHub; and parameter sweep workflows that study ballistic transport in Field Effect Transistors using OSG resources through nanoHUB.



Karan Vahi

Karan Vahi is a Computer Scientist in the Science Automation Technologies Group at USC Information Sciences Institute.

Karan has been associated with the Pegasus Project since its inception, first as a Graduate Research Assistant and then as a full-time programmer. He is currently in charge of development for Pegasus WMS and works closely with the user community to drive its development. He is also involved in two NIH-funded projects, PAGE and CGSMD, in which computational workflows are being developed for quality control analysis and imputation analysis. Before that, he was the technical lead for the STAMPEDE Project, which developed high-performance monitoring infrastructure for workflow systems and has since been integrated into the Pegasus and Triana workflow systems. From 2006 to 2008, he was also the lead developer on an AFRL/IARPA-funded project to develop a framework for running automated and on-demand intelligence analysis on multiple and varied data sources as part of a Terrorism Surveillance System.

Karan received an M.S. in Computer Science from the University of Southern California and a B.E. in Computer Engineering from Thapar University, India. His research interests include scientific workflows and distributed computing systems.

Sponsored by

HUBzero Foundation

Cite this work

Researchers should cite this work as follows:

  • Karan Vahi; Mats Rynge; Steven Clark; Ewa Deelman (2016), "Complex Workloads on HUBzero – Pegasus Workflow Management System,"



Claire Stirm

HUBzero - HUB Liaison