Cluster Aware Amber PMEMD – Beta

A new component bundles Amber 9 PMEMD for execution under OSG’s emerging High Throughput Parallel Computing (HTPC) model.

It’s a package of binaries and scripts that will eventually hide the details of HTPC job submission.

Install

These steps only need to be done once. Download and unpack the distribution. It contains the statically compiled binaries and scripts to execute them on the OSG.

wget --no-check-certificate https://ci-dev.renci.org/nexus/content/repositories/renci-release/org/renci/ci/cpmemd/cpmemd/1.0/cpmemd-1.0-bin.tar.gz
tar xvzf cpmemd-1.0-bin.tar.gz

Initialize

Perform these steps in each new shell. Change to the installation directory, set RCI_HOME, and source the RCI environment script.

cd <install>/cpmemd-1.0
export RCI_HOME=<install>/cpmemd-1.0/app/bin/rci
source $RCI_HOME/bin/environment.sh

Submit A Job

The job_submit command creates a new run directory and an associated Condor submit file.

If there is no valid grid proxy, the user will be prompted for a username and password to create one.

The Globus RSL for the job will be read from app/resources/globusrsl.txt.

The cluster targeting information will be read from app/resources/grid-glue.txt.

job_submit
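For illustration only: the contents of these two resource files are cluster specific, and the values below are placeholders rather than working settings for any real site. A whole-node HTPC RSL in app/resources/globusrsl.txt might look something like:

```
(jobtype=single)(xcount=8)(host_xcount=1)
```

Here xcount requests eight cores and host_xcount requests a single whole node; always use the RSL the target cluster actually documents rather than anything like this sketch.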

Following a Job’s Progress

The log of the overall directed acyclic graph (DAG) can be followed with:

job_dag

The job’s standard error and standard output can be viewed, typically from another shell, using:

job_init
tail $RUN_DIR/logs/*/*
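As a self-contained sketch of the layout these commands rely on (the directory and file names here are fabricated; a real run directory is created by job_submit), each DAG node writes its logs under $RUN_DIR/logs/&lt;jobid&gt;/:

```shell
# Fabricated stand-in for a run directory, mimicking the logs/<jobid>/
# layout that job_init points $RUN_DIR at.
RUN_DIR=$(mktemp -d)
mkdir -p "$RUN_DIR/logs/1"
printf '%s\n' '--(inf): example log line' > "$RUN_DIR/logs/1/job.out"
# Same glob as above: tail every log file for every job id.
tail "$RUN_DIR"/logs/*/*
```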

The Condor/OSGMM status of the job can be viewed using:

condor_grid_overview | grep $USER

The output of a completed job shows detailed execution steps, including:

  • Input parameters for the job
  • Script output
  • Exit status

Here’s an example:

==> /home/scox/dev/cpmemd/cpmemd-1.0/runs/20110103_1251/logs/1/job.out <==
--(inf):    --processing argument: [--exec=job.sh]
--(inf):    --processing argument: [--runid=20110103_1251]
--(inf):    --processing argument: [--jobid=1]
--(inf):    --processing argument: [--appurl=gsiftp://engage-submit2.renci.org/home/scox/dev/cpmemd/cpmemd-1.0]
--(inf):    --processing argument: [--infile=one]
--(inf): Created workdir in /scratch/condor/execute/dir_7511
--(inf): ===========================================================================================================
--(inf): executable: job.sh
--(inf): run id    : 20110103_1251
--(inf): job id    : 1
--(inf): base url  : gsiftp://engage-submit2.renci.org/home/scox/dev/cpmemd/cpmemd-1.0
--(inf): input file: one
--(inf): work dir  : /scratch/condor/execute/dir_7511/job.QYFpRL7553
--(inf): start dir : /scratch/condor/execute/dir_7511
--(inf): ===========================================================================================================
--(inf): stage data into work dir: /scratch/condor/execute/dir_7511/job.QYFpRL7553
--(inf): getting gsiftp://engage-submit2.renci.org/home/scox/dev/cpmemd/cpmemd-1.0/app/
--(inf): sourcing	/scratch/condor/execute/dir_7511/job.QYFpRL7553/app/bin/job.sh
--(inf): --[set] CPMEMD_HOME
--(inf): --[run] cpmemd-osg.sh...
--[remove] rci-0.1alpha.tar.gz, rci...
--[get] rci-0.1alpha.tar.gz...        
--[unpack] rci-0.1alpha.tar.gz...     
--[set] RCI_HOME...       
--[initialize] rci env... 

--(env): CPMEMD_HOME 	: /scratch/condor/execute/dir_7511/job.QYFpRL7553/app
--(env): CPMEMD_INPUT_DIR 	: /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/in
--(env): CPMEMD_LOG 	: /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/test.out
--(env): CPMEMD_OUT_DIR 	: /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/out
--(env): CPMEMD_RUN_DIR 	: /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run
--(env): OSG_HOSTNAME 	: pf-grid.unl.edu
--(inf): getting http://152.54.9.153:8081/nexus/content/repositories/renci-build-server/org/renci/mpich-static/1.2.7p1/mpich-static-1.2.7p1.tar.gz
--(inf): getting http://152.54.9.153:8081/nexus/content/repositories/renci-build-server/org/renci/mpich2-static/1.1.1p1/mpich2-static-1.1.1p1.tar.gz
--(inf): getting http://152.54.9.153:8081/nexus/content/repositories/renci-build-server/org/renci/amber-pmemd-static/9/amber-pmemd-static-9.tar.gz
--(inf): extracting mpich-static-1.2.7p1.tar.gz ...
--(inf): extracting mpich2-static-1.1.1p1.tar.gz ...
--(inf): extracting amber-pmemd-static-9.tar.gz ...
--(inf): =======================================================================================
--(inf): == CPMEMD
--(inf): =======================================================================================
--(inf):      start: /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/bin/cpmemd-osg.sh-@-b02-@-Mon Jan  3 11:57:16 CST 2011
--(inf): =======================================================================================
--(inf): == input: [i14vnphdhf+_hip]
--(inf): =======================================================================================
--(inf): Minimization &cntrl imin=1, maxcyc=200, ntpr=5, cut=12, &end
--(inf): =======================================================================================
               /usr/bin/time /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/sbin/mpich2/mpiexec.gforker
                 -np 8 /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/sbin/pmemd/pmemd.mpich2
                 -O
                 -i run/i14vnphdhf+_hip.in
                 -c in/i14vnphdhf+_hip.crd
                 -p in/i14vnphdhf+_hip.top
                 -x run/out/i14vnphdhf+_hip.tra
                 -r run/out/i14vnphdhf+_hip.rst
                 -e run/out/i14vnphdhf+_hip.ene
                 -o run/out/i14vnphdhf+_hip.out
134.57user 2.48system 0:18.88elapsed 725%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+219862minor)pagefaults 0swaps
--(inf): --(pmemd-exec): execution time: 0:00:19
--(inf): =======================================================================================
--(inf): == input: [i14vnphdhf+_hipW1]
--(inf): =======================================================================================
--(inf): calentando las aguas de 100K a 150K a volumen constante, periodic boundary condition &cntrl imin=0, nstlim=10000, dt=0.002, ntx=1, irest=0, ntpr=1000, ntwr=1000, ntwx=1000, ntwe=1000, tempi=100.0, temp0=150.0, ntt=1, tautp=2.0,iwrap=1, ntb=1, NTC=2, NTF=2, cut=12, &end
--(inf): =======================================================================================
               /usr/bin/time /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/sbin/mpich2/mpiexec.gforker
                 -np 8 /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/sbin/pmemd/pmemd.mpich2
                 -O
                 -i run/i14vnphdhf+_hipW1.in
                 -c in/i14vnphdhf+_hipW1.crd
                 -p in/i14vnphdhf+_hipW1.top
                 -x run/out/i14vnphdhf+_hipW1.tra
                 -r run/out/i14vnphdhf+_hipW1.rst
                 -e run/out/i14vnphdhf+_hipW1.ene
                 -o run/out/i14vnphdhf+_hipW1.out
7070.18user 63.14system 15:17.93elapsed 777%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9520883minor)pagefaults 0swaps
--(inf): --(pmemd-exec): execution time: 0:15:18
--(inf): =======================================================================================
--(inf): --(end): /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/bin/cpmemd-osg.sh-@-b02-@-Mon Jan  3 12:12:53 CST 2011
--(inf): total test execution time: 0:15:37
--(inf): =======================================================================================
--(inf): --[copy] /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run to /scratch/condor/execute/dir_7511/job.QYFpRL7553/out
--(inf): staging out /scratch/condor/execute/dir_7511/job.QYFpRL7553/app.stdouterr
--(inf): executing job cleanup...
=== RUN SUCCESSFUL ===
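The repeated command blocks in the log above follow one pattern per input basename. Here is a hedged sketch of that pattern; echo stands in for the real mpiexec launch, the full scratch paths are abbreviated, and the -np value and file extensions are copied from the example output:

```shell
# Sketch of how one pmemd command line is assembled per input basename.
# echo replaces the real launch so this sketch runs anywhere.
for input in i14vnphdhf+_hip i14vnphdhf+_hipW1; do
  echo mpiexec.gforker -np 8 pmemd.mpich2 -O \
    -i "run/$input.in"      -c "in/$input.crd"     -p "in/$input.top" \
    -x "run/out/$input.tra" -r "run/out/$input.rst" \
    -e "run/out/$input.ene" -o "run/out/$input.out"
done
```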

Advanced

Targeting: To target a job at a different cluster:

  • Change app/resources/grid-glue.txt to target the subcluster unique ID of the desired cluster.
  • Change app/resources/globusrsl.txt to the correct RSL for the cluster. See the table on this page for details.

Testing: It’s sometimes useful to run the program locally to test changes to the scripts before running on the OSG. A few environment variables must be set for this to work. Here’s an example that runs the program locally, producing output similar to the run above:

export WORK_DIR=~/dev/cpmemd-1.0
export STDOUT=$WORK_DIR/output.txt
export RCI_HOME=$WORK_DIR/app/bin/rci
source $WORK_DIR/app/bin/rci/bin/environment.sh
source $WORK_DIR/app/bin/job.sh
chmod +x $WORK_DIR/app/bin/cpmemd-osg.sh
job_run_model

Next Steps

Work is underway to map the correct RSL to each HTPC cluster automatically using OSGMM. Until that is complete, use the approach above to map RSL to clusters manually.

The CPMEMD build will be changed to target the new RENCI CI system.

The build will be made more modular to support user defined scripts.
