Overview
Duke Physics will be running MPI jobs on RENCI Blueridge. The model is new and expected to grow. It’s been built with cluster-specific libraries, so it’ll be executed at Blueridge for the foreseeable future.
At the same time, we’d like the workflow to be easy to visualize, maintain, extend and debug. So we’ll use Pegasus 4.0 and the Grayson modelling and debugging tools to start the workflow with a single, simple component.
Workflow
For now, we’re keeping this very simple. A script is the only executable. There’s one input file and one output file, both of which are tar archives. The generic workflow module looks like this:
We’ll be launching this from RENCI’s Engage submit host, engage-submit3.renci.org, to the RENCI-Blueridge cluster. To do this, we’ll provide a context file that configures inputs, executables and outputs for this context. Here’s the configuration:
The first column imports blueridge.blueridge-basic from the standard library. This contains objects such as MPI_8, which specifies the correct Globus RSL for an 8-way job on Blueridge.
It also imports cpic3d-flow. This is the model shown above.
In the second column, we make cpic3d.sh an MPI_8 job. This means that when it’s submitted, it will carry the appropriate RSL.
We also tag it as a local executable. This object’s description contains this JSON to tag its origin:
{
"type" : "abstract",
"urlPrefix" : "gsiftp://${FQDN}/${appHome}/bin",
"site" : "${clusterId}"
}
The output DAX will contain an executable tag pointing at the physical file name (PFN) of the executable to use. That PFN will use this URL:
gsiftp://engage-submit3.renci.org//home/scox/dev/cpic3d/bin/cpic3d.sh
and specify the site as RENCI-Blueridge.
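As a rough illustration of how the urlPrefix template resolves to that PFN, the substitution can be mimicked with ordinary shell parameter expansion. This is only a sketch: the variable values are the ones from the example above, and how Grayson performs the substitution internally is not shown here. Note that the doubled slash in the result comes from appHome being an absolute path.

```shell
# Values taken from the example above; in a real run Grayson supplies them.
FQDN="engage-submit3.renci.org"
appHome="/home/scox/dev/cpic3d"
clusterId="RENCI-Blueridge"

# Expand the urlPrefix template from the JSON description.
# appHome starts with "/", which is why the URL contains "//".
urlPrefix="gsiftp://${FQDN}/${appHome}/bin"

# Append the executable's file name to obtain the PFN.
pfn="${urlPrefix}/cpic3d.sh"

echo "pfn:  ${pfn}"
echo "site: ${clusterId}"
```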
The local-input and local objects use very similar syntax to designate the locations of the input and output files.
Using Grayson and Pegasus should make it easier to run, debug, grow and change this workflow. This may include pointing it at different clusters as the needs of the research team change, making it hierarchical, and so on.
Usage
Source Code:
The easiest way to get the workflow is to check it out from SVN:
svn co https://renci-ci.svn.sourceforge.net/svnroot/renci-ci/trunk/duke/cpic3d
Execute:
Set up the environment:
cd duke/cpic3d
source ./setup.sh
As with all OSG job submissions, make sure you have a valid grid certificate with:
voms-proxy-info
And if not, get one with
voms-proxy-init --voms Engage --valid 24:00
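The two certificate steps can be combined so that a new proxy is requested only when needed. The helper below is a minimal sketch, not part of the workflow: ensure_proxy, PROXY_INFO and PROXY_INIT are hypothetical names, and it assumes the installed voms-proxy-info supports the -exists option, which returns a non-zero status when no valid proxy is found.

```shell
# Hypothetical helper: renew the grid proxy only when no valid one exists.
# PROXY_INFO and PROXY_INIT are illustrative override hooks (handy for
# dry runs); by default the real VOMS client commands are used.
ensure_proxy() {
    info_cmd="${PROXY_INFO:-voms-proxy-info}"
    init_cmd="${PROXY_INIT:-voms-proxy-init}"

    # Assumption: "-exists" makes voms-proxy-info exit 0 only if a
    # valid proxy is present.
    if "$info_cmd" -exists >/dev/null 2>&1; then
        echo "proxy ok"
    else
        echo "no valid proxy; requesting a new one"
        "$init_cmd" --voms Engage --valid 24:00
    fi
}

# Usage (run before submitting the workflow):
#   ensure_proxy
```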
Then execute the workflow as follows:
./submit blueridge --execute
This will generate the Pegasus artifacts for the workflow and submit it. It will also launch the Pegasus command line monitoring console to track the workflow.
Debug:
The pegasus-status console will show which jobs have completed and their status.
All workflow outputs are available in
daxen-blueridge/work/$USER/pegasus/cpic3d-flow/<run-directory>
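Because each execution gets its own run directory, it can help to locate the newest one automatically. The function below is a sketch under one assumption: run directories are named by timestamp (for example 20120625T094503-0400), so a lexical sort is also chronological as long as the UTC offset does not change between runs. The name latest_run_dir is hypothetical.

```shell
# Hypothetical helper: print the newest run directory under a base directory.
# Assumes timestamp-named run directories, so lexical order == time order.
latest_run_dir() {
    base="$1"
    for d in "$base"/*/; do
        # Skip the unexpanded pattern when the base has no subdirectories.
        if [ -d "$d" ]; then
            printf '%s\n' "${d%/}"
        fi
    done | sort | tail -n 1
}

# Usage:
#   run_dir=$(latest_run_dir "daxen-blueridge/work/$USER/pegasus/cpic3d-flow")
#   pegasus-status -l "$run_dir"
```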
Executable:
The Duke Physics community will want to edit bin/cpic3d.sh. Currently, it contains:
#!/bin/bash
set -x
app=/home/scox/dev/hp52/cpic3d
echo $app/bin/cpic3dmpi.mvapich2_gnu-1.6
tar cvzf output.tar.gz $app/run/*.dat
exit 0
To point the workflow at the most recent executable, change the app variable in this script.
This file is staged to the cluster with each execution of the workflow so it offers a place to make things a bit more flexible.
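If the application path changes often, the edit to the app line can be scripted. The following is only a sketch: set_app_path is a hypothetical helper, and it assumes the new path contains no "|", "&" or backslash characters, since those are special in the sed replacement. It rewrites the file in place through a temporary copy so that the script keeps its executable permission.

```shell
# Hypothetical helper: rewrite the "app=" line of a script to a new path.
# The original file is only overwritten if sed succeeds.
set_app_path() {
    script="$1"
    new_app="$2"
    tmp_file="${script}.tmp.$$"

    # Use "|" as the sed delimiter because the path itself contains "/".
    # Copy the result back with cat so the file's permissions are preserved.
    sed "s|^app=.*|app=${new_app}|" "$script" > "$tmp_file" &&
        cat "$tmp_file" > "$script"
    status=$?

    rm -f "$tmp_file"
    return "$status"
}

# Usage (the target path here is a placeholder):
#   set_app_path bin/cpic3d.sh /path/to/new/cpic3d
```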
Outputs
See work/outputs for the output file.
Monitoring and Debugging
Launching the job also starts monitoring the workflow with a command like this:
pegasus-status -l /home/scox/dev/cpic3d/daxen-blueridge/work/scox/pegasus/cpic3d-flow/20120625T094503-0400
The directory specified is the unique run directory for this execution.
That directory can also be used with the pegasus-analyzer command as follows:
pegasus-analyzer -d <directory>
to provide a summary of the workflow.
For now, more detailed debugging information is available in the files in the run directory. As the project moves on, we’ll explore additional tools for debugging workflows.

