There’s a new component that bundles Amber 9 PMEMD for execution in OSG’s emerging HTPC model.
It’s a package of binaries and scripts that will eventually hide the details of HTPC job submission.
Install
These steps only need to be done once. Download and unpack the distribution. It contains the statically compiled binaries and scripts to execute them on the OSG.
wget --no-check-certificate https://ci-dev.renci.org/nexus/content/repositories/renci-release/org/renci/ci/cpmemd/cpmemd/1.0/cpmemd-1.0-bin.tar.gz
tar xvzf cpmemd-1.0-bin.tar.gz
Initialize
Perform these steps for each shell. Change to the installation directory, set RCI_HOME and source the RCI environment script.
cd <install>/cpmemd-1.0
export RCI_HOME=<install>/cpmemd-1.0/app/bin/rci
source $RCI_HOME/bin/environment.sh
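Since these steps are repeated in every shell, they can be wrapped in a small function. This is only a sketch and not part of the distribution: the name cpmemd_init is hypothetical, and it assumes the directory layout shown above, with environment.sh under app/bin/rci/bin.

```shell
# Hypothetical helper (not shipped with cpmemd): initialize the current shell.
# Usage: cpmemd_init /path/to/cpmemd-1.0
cpmemd_init() {
    _cpmemd_dir="$1"
    _cpmemd_env="$_cpmemd_dir/app/bin/rci/bin/environment.sh"

    # Fail early if the distribution is not where we expect it.
    if [ ! -f "$_cpmemd_env" ]; then
        echo "cpmemd_init: cannot find $_cpmemd_env" >&2
        return 1
    fi

    # Same three steps as above: change directory, set RCI_HOME, source the script.
    cd "$_cpmemd_dir" || return 1
    export RCI_HOME="$_cpmemd_dir/app/bin/rci"
    . "$RCI_HOME/bin/environment.sh"
}
```

Placing the function in a shell startup file would make initialization a single command, though it still has to be run once per shell.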
Submit A Job
Running job_submit creates a new run directory and an associated Condor submit file.
If there is not a valid grid proxy, the user will be prompted for a username and password to create one.
The Globus RSL for the job will be read from app/resources/globusrsl.txt.
The cluster targeting information will be read from app/resources/grid-glue.txt.
job_submit
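Because job_submit reads both resource files, it can help to confirm they exist before submitting. The check below is a sketch under that assumption; cpmemd_preflight is a hypothetical name, and it must be run from the installation directory.

```shell
# Hypothetical pre-flight check (not shipped with cpmemd).
# Verifies that the two files job_submit reads are present and non-empty.
cpmemd_preflight() {
    _pf_status=0
    for _pf_file in app/resources/globusrsl.txt app/resources/grid-glue.txt; do
        if [ ! -s "$_pf_file" ]; then
            echo "cpmemd_preflight: missing or empty: $_pf_file" >&2
            _pf_status=1
        fi
    done
    return "$_pf_status"
}

# Submit only when both files are in place:
#   cpmemd_preflight && job_submit
```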
Following a Job’s Progress
The log of the overall directed acyclic graph (DAG) can be followed with:
job_dag
The job’s standard error and standard output can be viewed, typically from another shell, using:
job_init
tail $RUN_DIR/logs/*/*
The Condor/OSGMM status of the job can be viewed using:
condor_grid_overview | grep $USER
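When several runs exist, it is handy to locate the most recent one. The sketch below assumes run directories live under runs/ in the installation directory and are named with a timestamp such as 20110103_1251, as in the example log path in this post; latest_run_dir is a hypothetical helper.

```shell
# Hypothetical helper (not shipped with cpmemd): print the newest run directory.
# Assumes run ids are timestamps (e.g. 20110103_1251), so the name that sorts
# last is also the most recent run. Returns non-zero if no run exists.
latest_run_dir() {
    _lr_root="$1/runs"
    _lr_newest=""
    # Globs expand in sorted order, so the last directory seen is the newest.
    for _lr_dir in "$_lr_root"/*/; do
        [ -d "$_lr_dir" ] || continue
        _lr_newest="${_lr_dir%/}"
    done
    [ -n "$_lr_newest" ] && echo "$_lr_newest"
}

# Example: follow all logs of the newest run
#   tail -f "$(latest_run_dir <install>/cpmemd-1.0)"/logs/*/*
```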
The output of the completed job shows detailed execution steps including
- Input parameters for the job
- Script output
- Exit status
Here’s an example:
==> /home/scox/dev/cpmemd/cpmemd-1.0/runs/20110103_1251/logs/1/job.out <==
--(inf): --processing argument: [--exec=job.sh]
--(inf): --processing argument: [--runid=20110103_1251]
--(inf): --processing argument: [--jobid=1]
--(inf): --processing argument: [--appurl=gsiftp://engage-submit2.renci.org/home/scox/dev/cpmemd/cpmemd-1.0]
--(inf): --processing argument: [--infile=one]
--(inf): Created workdir in /scratch/condor/execute/dir_7511
--(inf): ===========================================================================================================
--(inf): executable: job.sh
--(inf): run id : 20110103_1251
--(inf): job id : 1
--(inf): base url : gsiftp://engage-submit2.renci.org/home/scox/dev/cpmemd/cpmemd-1.0
--(inf): input file: one
--(inf): work dir : /scratch/condor/execute/dir_7511/job.QYFpRL7553
--(inf): start dir : /scratch/condor/execute/dir_7511
--(inf): ===========================================================================================================
--(inf): stage data into work dir: /scratch/condor/execute/dir_7511/job.QYFpRL7553
--(inf): getting gsiftp://engage-submit2.renci.org/home/scox/dev/cpmemd/cpmemd-1.0/app/
--(inf): sourcing /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/bin/job.sh
--(inf): --[set] CPMEMD_HOME
--(inf): --[run] cpmemd-osg.sh...
--[remove] rci-0.1alpha.tar.gz, rci...
--[get] rci-0.1alpha.tar.gz...
--[unpack] rci-0.1alpha.tar.gz...
--[set] RCI_HOME...
--[initialize] rci env...
--(env): CPMEMD_HOME : /scratch/condor/execute/dir_7511/job.QYFpRL7553/app
--(env): CPMEMD_INPUT_DIR : /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/in
--(env): CPMEMD_LOG : /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/test.out
--(env): CPMEMD_OUT_DIR : /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/out
--(env): CPMEMD_RUN_DIR : /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run
--(env): OSG_HOSTNAME : pf-grid.unl.edu
--(inf): getting http://152.54.9.153:8081/nexus/content/repositories/renci-build-server/org/renci/mpich-static/1.2.7p1/mpich-static-1.2.7p1.tar.gz
--(inf): getting http://152.54.9.153:8081/nexus/content/repositories/renci-build-server/org/renci/mpich2-static/1.1.1p1/mpich2-static-1.1.1p1.tar.gz
--(inf): getting http://152.54.9.153:8081/nexus/content/repositories/renci-build-server/org/renci/amber-pmemd-static/9/amber-pmemd-static-9.tar.gz
--(inf): extracting mpich-static-1.2.7p1.tar.gz ...
--(inf): extracting mpich2-static-1.1.1p1.tar.gz ...
--(inf): extracting amber-pmemd-static-9.tar.gz ...
--(inf): =======================================================================================
--(inf): == CPMEMD
--(inf): =======================================================================================
--(inf): start: /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/bin/cpmemd-osg.sh-@-b02-@-Mon Jan 3 11:57:16 CST 2011
--(inf): =======================================================================================
--(inf): == input: [i14vnphdhf+_hip]
--(inf): =======================================================================================
--(inf): Minimization &cntrl imin=1, maxcyc=200, ntpr=5, cut=12, &end
--(inf): =======================================================================================
/usr/bin/time /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/sbin/mpich2/mpiexec.gforker -np 8 /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/sbin/pmemd/pmemd.mpich2 -O -i run/i14vnphdhf+_hip.in -c in/i14vnphdhf+_hip.crd -p in/i14vnphdhf+_hip.top -x run/out/i14vnphdhf+_hip.tra -r run/out/i14vnphdhf+_hip.rst -e run/out/i14vnphdhf+_hip.ene -o run/out/i14vnphdhf+_hip.out
134.57user 2.48system 0:18.88elapsed 725%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+219862minor)pagefaults 0swaps
--(inf): --(pmemd-exec): execution time: 0:00:19
--(inf): =======================================================================================
--(inf): == input: [i14vnphdhf+_hipW1]
--(inf): =======================================================================================
--(inf): calentando las aguas de 100K a 150K a volumen constante, periodic boundary condition &cntrl imin=0, nstlim=10000, dt=0.002, ntx=1, irest=0, ntpr=1000, ntwr=1000, ntwx=1000, ntwe=1000, tempi=100.0, temp0=150.0, ntt=1, tautp=2.0,iwrap=1, ntb=1, NTC=2, NTF=2, cut=12, &end
--(inf): =======================================================================================
/usr/bin/time /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/sbin/mpich2/mpiexec.gforker -np 8 /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run/sbin/pmemd/pmemd.mpich2 -O -i run/i14vnphdhf+_hipW1.in -c in/i14vnphdhf+_hipW1.crd -p in/i14vnphdhf+_hipW1.top -x run/out/i14vnphdhf+_hipW1.tra -r run/out/i14vnphdhf+_hipW1.rst -e run/out/i14vnphdhf+_hipW1.ene -o run/out/i14vnphdhf+_hipW1.out
7070.18user 63.14system 15:17.93elapsed 777%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9520883minor)pagefaults 0swaps
--(inf): --(pmemd-exec): execution time: 0:15:18
--(inf): =======================================================================================
--(inf): --(end): /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/bin/cpmemd-osg.sh-@-b02-@-Mon Jan 3 12:12:53 CST 2011
--(inf): total test execution time: 0:15:37
--(inf): =======================================================================================
--(inf): --[copy] /scratch/condor/execute/dir_7511/job.QYFpRL7553/app/run to /scratch/condor/execute/dir_7511/job.QYFpRL7553/out
--(inf): staging out /scratch/condor/execute/dir_7511/job.QYFpRL7553/app.stdouterr
--(inf): executing job cleanup...
=== RUN SUCCESSFUL ===
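A log this long is easier to scan with a little filtering. The sketch below pulls out the timing lines and checks for the success marker that ends the example run. job_summary is a hypothetical helper, and treating a missing marker as a failed or unfinished run is an assumption based on this single example.

```shell
# Hypothetical helper (not shipped with cpmemd): summarize a job.out file.
# Usage: job_summary /path/to/job.out
job_summary() {
    _js_log="$1"

    # Print every timing the log reports, including the total.
    grep -o 'execution time: [0-9:]*' "$_js_log" || true

    # Assumption: a run that finished cleanly contains this marker.
    if grep -q '=== RUN SUCCESSFUL ===' "$_js_log"; then
        echo "status: ok"
    else
        echo "status: failed or incomplete"
        return 1
    fi
}
```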
Advanced
Targeting: To change the targeting of the job to a different cluster:
- Change app/resources/grid-glue.txt to target the subcluster unique id of the desired cluster.
- Change app/resources/globusrsl.txt to the correct RSL for the cluster. See the table on this page for details.
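If you switch between clusters often, one option is to keep a saved pair of these files per cluster and copy them into place. The sketch below is an illustration only: the targets/<name>/ directory and the use_target function are hypothetical and not part of the distribution. Run it from the installation directory.

```shell
# Hypothetical helper (not shipped with cpmemd): switch cluster targeting.
# Assumes you maintain targets/<name>/grid-glue.txt and targets/<name>/globusrsl.txt
# yourself; this layout is an illustration, not something the package provides.
use_target() {
    _ut_src="targets/$1"

    # Refuse to change anything unless both files for the target exist.
    for _ut_file in grid-glue.txt globusrsl.txt; do
        if [ ! -f "$_ut_src/$_ut_file" ]; then
            echo "use_target: $_ut_src/$_ut_file not found" >&2
            return 1
        fi
    done

    for _ut_file in grid-glue.txt globusrsl.txt; do
        # Keep the previous settings so the change is easy to undo.
        if [ -f "app/resources/$_ut_file" ]; then
            cp "app/resources/$_ut_file" "app/resources/$_ut_file.bak" || return 1
        fi
        cp "$_ut_src/$_ut_file" "app/resources/$_ut_file" || return 1
    done
}
```

Copying both files together keeps the subcluster id and its RSL from drifting out of step.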
Testing: It’s sometimes useful to run the program locally to test changes to the scripts before running it on the OSG. It’s necessary to set up a few environment variables for this to work. Here’s an example that should run the program locally producing output similar to the run above:
export WORK_DIR=~/dev/cpmemd-1.0
export STDOUT=$WORK_DIR/output.txt
export RCI_HOME=$WORK_DIR/app/bin/rci
source $WORK_DIR/app/bin/rci/bin/environment.sh
source $WORK_DIR/app/bin/job.sh
chmod +x $WORK_DIR/app/bin/cpmemd-osg.sh
job_run_model
Next Steps
Work is under way to map the correct RSL to each HTPC cluster automatically using OSGMM. Until this is complete, the approach above will need to be used to manually map RSL to clusters.
The CPMEMD build will be changed to target the new RENCI CI system.
The build will be made more modular to support user defined scripts.