Duke Physics MPI at RENCI Blueridge with Grayson and Pegasus 4.0

Overview

Duke Physics will be running MPI jobs on RENCI Blueridge. The model is new and expected to grow. It has been built with cluster-specific libraries, so it will run on Blueridge for the foreseeable future.

At the same time, we’d like the workflow to be easy to visualize, maintain, extend and debug. So we’ll use Pegasus 4.0 and the Grayson modelling and debugging tools, starting the workflow with a single, simple component.

Workflow

For now, we’re keeping this very simple. A script is the only executable. There’s one input file and one output file, both of which are tar archives. The generic workflow module looks like this:

We’ll be launching this from RENCI’s Engage submit host, engage-submit3.renci.org, to the RENCI-Blueridge cluster. To do this, we’ll provide a context file that configures the inputs, executables and outputs for this environment. Here’s the configuration:

The first column imports blueridge.blueridge-basic from the standard library. This module contains objects such as MPI_8, which specifies the correct Globus RSL for an 8-way job on Blueridge.

It also imports cpic3d-flow, the workflow model shown above.

In the second column, we make cpic3d.sh an MPI_8 job, so it is submitted with the appropriate RSL.

We also tag it as a local executable. This object’s description contains the following JSON, which records where the executable comes from:

{
 "type"      : "abstract",
 "urlPrefix" : "gsiftp://${FQDN}/${appHome}/bin",
 "site"      : "${clusterId}"
}

The output DAX will contain an executable tag pointing at the physical file name (PFN) of the executable to use. That PFN will use this URL:

gsiftp://engage-submit3.renci.org//home/scox/dev/cpic3d/bin/cpic3d.sh

and specify the site as RENCI-Blueridge.
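To make the placeholder substitution concrete, here is a small illustration of how the urlPrefix template expands into the PFN above. Grayson does the real substitution itself; this sketch just mimics it with shell parameter expansion, and the values for FQDN and appHome are inferred from the example URL rather than read from Grayson’s configuration. executable_pfn is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Illustration only: mimics how the ${FQDN} and ${appHome} placeholders in the
# "urlPrefix" template expand. Grayson performs the actual substitution.
#   $1 = FQDN of the submit host
#   $2 = appHome, an absolute path (hence the double slash after the host)
#   $3 = executable file name
executable_pfn() {
  local fqdn="$1" app_home="$2" exe="$3"
  # Same shape as the template: gsiftp://${FQDN}/${appHome}/bin
  local url_prefix="gsiftp://${fqdn}/${app_home}/bin"
  printf '%s/%s\n' "$url_prefix" "$exe"
}
```

Calling executable_pfn engage-submit3.renci.org /home/scox/dev/cpic3d cpic3d.sh prints gsiftp://engage-submit3.renci.org//home/scox/dev/cpic3d/bin/cpic3d.sh, the same URL shown above. The double slash is expected: it separates the host from an absolute path.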

The local-input and local objects use very similar syntax to designate the locations of the input and output files.

Using Grayson and Pegasus should make it easier to run, debug, grow and change this workflow. This may include pointing it at different clusters as the needs of the research team change, making it hierarchical, and so on.

Usage

Source Code:

The easiest way to get the workflow is to check it out from SVN:

 svn co https://renci-ci.svn.sourceforge.net/svnroot/renci-ci/trunk/duke/cpic3d

Execute:

Set up the environment:

cd cpic3d
source ./setup.sh

As with all OSG job submissions, check that you have a valid grid proxy with:

voms-proxy-info

If not, create one with:

voms-proxy-init --voms Engage --valid 24:00
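If you submit often, the check can be folded into a small helper that only creates a new proxy when the current one is missing or close to expiring. This is a minimal sketch, assuming the installed voms-proxy-info supports the -timeleft option (it prints the remaining lifetime in seconds; check its man page on the submit host). The function name ensure_proxy and the one-hour threshold are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: renew the VOMS proxy only when it is missing or expires soon.
# MIN_PROXY_SECONDS (one hour) is an arbitrary, hypothetical threshold.
MIN_PROXY_SECONDS=3600

ensure_proxy() {
  local left
  # Assumes -timeleft prints remaining seconds; any failure (e.g. no proxy
  # at all) is treated as zero time left.
  left=$(voms-proxy-info -timeleft 2>/dev/null) || left=0
  # Guard against empty or non-numeric output before comparing.
  case "$left" in
    ''|*[!0-9]*) left=0 ;;
  esac
  if [ "$left" -lt "$MIN_PROXY_SECONDS" ]; then
    # Same command as above: a 24-hour proxy for the Engage VO.
    voms-proxy-init --voms Engage --valid 24:00
  fi
}
```

Running ensure_proxy before ./submit leaves a healthy proxy alone and prompts for a new one otherwise.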

Then execute the workflow as follows:

./submit blueridge --execute

This will generate the Pegasus artifacts for the workflow and submit it. It will also launch the Pegasus command-line monitoring console to track the workflow.

Debug:

The pegasus-status console will show which jobs have completed and their status.

All workflow outputs are available in

daxen-blueridge/work/$USER/pegasus/cpic3d-flow/<run-directory>
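Each submission creates a new run directory, so after a few runs it helps to find the newest one quickly. A minimal sketch, assuming run directories are named with sortable timestamps like the 20120625T094503-0400 example later in this post, so that the lexically last name is the most recent run. latest_run_dir is a hypothetical helper, not part of Pegasus or Grayson.

```shell
#!/usr/bin/env bash
# Sketch: print the newest run directory for cpic3d-flow.
# Assumes timestamp-style names, so lexical order matches time order.
latest_run_dir() {
  local base="${1:-daxen-blueridge/work/$USER/pegasus/cpic3d-flow}"
  local dirs=("$base"/*/)
  # Without a match, the glob stays literal and is not a directory.
  if [ ! -d "${dirs[0]}" ]; then
    echo "no run directories under $base" >&2
    return 1
  fi
  # Bash returns glob matches sorted, so the last entry is the newest run.
  local last="${dirs[${#dirs[@]}-1]}"
  printf '%s\n' "${last%/}"
}
```

For example, latest_run_dir with no argument looks under the default location above, and its output can be handed to pegasus-status or pegasus-analyzer.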

Executable:

The Duke Physics community will want to edit bin/cpic3d.sh. Currently, it contains:

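#!/bin/bash

# Trace each command to stderr, which helps when debugging the job.
set -x

# Location of the cpic3d build on Blueridge; update to the most recent build.
app=/home/scox/dev/hp52/cpic3d

# Note: this only prints the path of the MPI executable; it does not run it.
echo "$app/bin/cpic3dmpi.mvapich2_gnu-1.6"

# Package the run's .dat files as the workflow's output archive.
tar cvzf output.tar.gz "$app"/run/*.dat

# Always report success to Pegasus, even if tar fails.
exit 0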

In particular, the $app variable will need to be changed to point to the most recent executable.

Because this file is staged to the cluster each time the workflow runs, it is a convenient place to add flexibility.
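One way to use that flexibility is to let the application directory be overridden instead of editing the hard-coded path each time. A minimal sketch: CPIC3D_HOME is a hypothetical variable name, not something Grayson or Pegasus defines, and whether an environment variable or an argument actually reaches the job depends on how the workflow is configured, which this post doesn’t cover. The fallback is the path currently in bin/cpic3d.sh.

```shell
#!/usr/bin/env bash
# Sketch: choose the cpic3d install directory without editing the script.
# Precedence: first argument, then $CPIC3D_HOME (hypothetical), then default.
DEFAULT_APP=/home/scox/dev/hp52/cpic3d

resolve_app_dir() {
  printf '%s\n' "${1:-${CPIC3D_HOME:-$DEFAULT_APP}}"
}
```

Inside cpic3d.sh, the assignment would then become app=$(resolve_app_dir "$1").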

Outputs

See work/outputs for the output file.
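Since the output is a gzipped tar archive, you can check what came back without unpacking it. A minimal sketch, assuming the file keeps the name output.tar.gz that bin/cpic3d.sh gives it; list_outputs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the contents of the workflow's output archive.
# Default path assumes the output.tar.gz name used in bin/cpic3d.sh.
list_outputs() {
  local archive="${1:-work/outputs/output.tar.gz}"
  if [ ! -f "$archive" ]; then
    echo "missing $archive" >&2
    return 1
  fi
  # t = list, z = gunzip, f = archive file
  tar tzf "$archive"
}
```

Because cpic3d.sh archives the .dat files by absolute path, expect entries such as home/scox/dev/hp52/cpic3d/run/... (tar drops the leading slash).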

Monitoring and Debugging

Launching the workflow also starts monitoring it with a command like this:

pegasus-status -l /home/scox/dev/cpic3d/daxen-blueridge/work/scox/pegasus/cpic3d-flow/20120625T094503-0400

The directory specified is the unique run directory for this execution.

That directory can also be passed to pegasus-analyzer to get a summary of the workflow:

pegasus-analyzer -d <directory>
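Both tools take the same run directory, so a small wrapper can run them together. This sketch only uses the flags shown in this post (pegasus-status -l and pegasus-analyzer -d); inspect_run is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show status and an analysis summary for one run directory.
inspect_run() {
  local run_dir="$1"
  if [ ! -d "$run_dir" ]; then
    echo "not a directory: $run_dir" >&2
    return 1
  fi
  pegasus-status -l "$run_dir"
  pegasus-analyzer -d "$run_dir"
}
```

For example: inspect_run /home/scox/dev/cpic3d/daxen-blueridge/work/scox/pegasus/cpic3d-flow/20120625T094503-0400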

For now, more detailed debugging information is available in the files in the run directory. As the project moves on, we’ll explore additional tools for debugging workflows.
