Whole Genome Sequencing and Pegasus

There’s a group at RENCI working on next-generation sequencing (NGS) technologies, and whole genome sequencing (WGS) in particular. I’ve been helping them get a new workflow running on our Blueridge cluster.

As a first step, they joined the OSG Engage VO. This gives them access to the new Engage submit node, its GlideinWMS submission engine, and the Pegasus Workflow Management System it hosts.

The whole genome sequencing workflow is a multi-step process involving about a half dozen different executables. It runs on Blueridge and Kure, both local clusters, but there’s strong interest in making the workflow portable for execution on other systems, including the Open Science Grid.

The new users were provisioned on the submit node, and DOE certificates were obtained according to the usual Engage process.

Then I worked with the team to create a Pegasus DAX representing the workflow. Our main challenges involved:

  • Debugging: Getting used to searching the Pegasus logs for output status. Pegasus creates a directory for each workflow submission, and each job in the workflow produces a series of separate files there. One of these is an XML file containing details of the execution, including fully qualified paths to executables and files, the job’s standard output and standard error, and its exit status code. These files are indispensable for debugging.
  • Files: Some of the files used in this workflow were pre-staged on the local cluster so we could get the workflow up and running. The file elements in the DAX needed site='local' attributes to tell Pegasus not to try to stage those files.
  • Environment: We initially had the user executing the workflow mapped to the generic engage user. Because we were adapting the workflow to use a legacy data setup, this ran into trouble with Unix file permissions. I reconfigured the mapping so the user’s DN maps to their local cluster account, and we could proceed.
  • Pegasus and Scripts: One of the most difficult problems to figure out had a very simple cause. One of the scripts in the workflow had the appropriate ownership and execute permissions but failed on every execution. It turned out that it did not have #!/bin/bash at the top of the script. So look out for that if you’ve exhausted other debugging avenues.
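
The missing-shebang problem from the last item can be checked for ahead of time. A likely explanation is that a launcher which calls exec directly gets an error for a script with no interpreter line, while an interactive shell quietly falls back to running it as a shell script. The function below is a minimal sketch, not part of Pegasus: the name find_missing_shebang is hypothetical, and it assumes all of the workflow's scripts sit in one directory.

```shell
#!/bin/bash
# Print executable text files in a directory whose first two bytes are not "#!".
# Hypothetical helper for illustration; not part of Pegasus.
find_missing_shebang() {
    local dir="$1" f first
    for f in "$dir"/*; do
        # Only consider executable regular files.
        [ -f "$f" ] && [ -x "$f" ] || continue
        # Skip compiled binaries and empty files: grep -I treats binary files as non-matching.
        grep -Iq . "$f" || continue
        # Compare the first two bytes against the shebang marker.
        first=$(head -c 2 "$f")
        if [ "$first" != '#!' ]; then
            echo "$f"
        fi
    done
}
```

Running find_missing_shebang on the directory that holds the workflow scripts prints one path per offending script and prints nothing when every script has an interpreter line.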

Next steps are to prepare the workflow – or components of it – for execution on OSG. This will involve:

  • Collecting the required input files into archives that are entirely portable – i.e. free of symbolic links and user privilege issues.
  • An assessment of the size of the data to determine the best way to provision it to compute nodes.
  • Assessment of the executables to see if any would benefit from a high throughput parallel computing (HTPC) treatment.
  • Selection of OSG resources appropriate for the task.
  • Altering the Pegasus site configuration and process generally to work with GlideinWMS and target OSG.
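
The first two items could be approached with standard tools. The sketch below assumes the inputs live under a single directory. The function names are hypothetical, and it covers only symbolic links and size, not file ownership.

```shell
#!/bin/bash
# Pack a directory into a gzipped tarball, storing the files that symlinks
# point to rather than the links themselves (-h means dereference).
make_portable_archive() {
    local src_dir="$1" archive="$2"
    tar -czhf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Print any symlink entries still present in an archive.
# In a verbose listing, symlink entries start with "l"; empty output means none.
list_archive_symlinks() {
    tar -tvzf "$1" | grep '^l' || true
}

# Report the on-disk size of a file or directory in kilobytes.
size_kb() {
    du -sk "$1" | cut -f1
}
```

A link whose target no longer exists cannot be dereferenced, so tar reports an error for it. That is a useful early warning that the archive would not have been self-contained.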

More on this soon.

HTPC Enablement

Here’s an outline that may be useful for new user communities with parallel applications. It introduces the Open Science Grid, the concepts behind High Throughput Parallel Computing and discusses the process of adapting an application to that environment. It also introduces GlideinWMS as an important part of solving the resource selection problem.

Would this approach be helpful for the communities you interact with?

Feedback welcome.

DHFR @ OSG

Our first researcher using Amber PMEMD on the OSG reports that molecular dynamics runs are four to eight times faster on the OSG than on the infrastructure she had access to previously.

That’s for the CPU-only version, i.e. without the NVIDIA GPGPU support in Amber11.

Here’s a machine’s rendering of the section of chromosome 5 she’s studying: Dihydrofolate Reductase (DHFR):



Posted in Amber11, Amber9, Engage VO, GPGPU, High Throughput Computing (HTC), High Throughput Parallel Computing (HTPC), multicore, OSG, pmemd, RENCI, Uncategorized

Amber11 – PMEMD for NVIDIA GPGPU

Molecular Dynamics

Proteins are important and their structure is complex. And then they move.

The way they move determines how organisms work … or fail. The shape of a protein determines its function, so motion means its shape is in flux and its function is dynamic. Seeing these complex molecules move means solving molecular dynamics (MD) equations for thousands of atoms.

And that’s what Amber11 PMEMD does. It solves MD equations to depict the motion of molecules, taking advantage of high performance clusters, MPI, and multi-core architectures. Using these techniques, it’s able to give researchers insight into the behavior of proteins and other molecules such as Rho GTPases, which initiate complex signaling pathways governing critical cellular functions.

Understanding how proteins and other molecules move is critical to understanding how organisms develop and what causes diseases. But the computational complexity of even the smallest simulations forces researchers to:

  • Eliminate sections of molecules (the domain) from simulations entirely
  • Use coarse grained abstractions of sections of molecules
  • Simulate brief time periods where longer simulations might be more desirable
  • Build programs around time slices instead of thinking about abstract workflows

Faster processing of in-silico MD simulations can show us more molecular motion sooner.

NVIDIA GPGPU

Blueridge and NCSA Lincoln both have NVIDIA Tesla S1070 General Purpose Graphics Processing Units (GPGPUs). GPUs were originally designed for rendering graphics, but they’ve been repurposed to address general computing problems.

CUDA

The CUDA environment provides compilers, linkers, and debuggers for developing GPGPU applications. The version of Amber11 PMEMD for CUDA dramatically outperforms CPU-based computation methods.

Parallel Amber11 PMEMD for CUDA

I followed the instructions on the Amber site for building parallel Amber11 PMEMD for CUDA. In my environment, that boiled down to the following script.

 export AMBERHOME=$APP_BASE/amber11
 export CUDA_HOME=/opt/cuda
 export MPI_HOME=/opt/mpi/intel/mvapich2-1.4
 export MPIPATH=$MPI_HOME/bin
 export PATH=$MPIPATH:$PATH
 export MPI_LIBDIR2=/lib64
 export MLK_HOME=/opt/intel/Compiler/11.1/current/mkl/lib/em64t
 . /opt/intel/Compiler/11.1/current/bin/iccvars.sh intel64
 . /opt/intel/Compiler/11.1/current/bin/ifortvars.sh intel64

 cd $AMBERHOME/AmberTools/src
 make clean
 ./configure -cuda -mpi intel
 cd ../../src
 make clean
 make cuda_parallel

Amber11 PMEMD for CUDA requires an MPI-2 implementation. Once the executable is built, run the tests. The instructions say to execute the test script, but this won’t work on our cluster without an additional step.

This is because the MPI libraries are integrated with the cluster’s PBS/Torque scheduler, so MPI programs only work within a PBS job. The Torque qsub command provides the -I flag to start an interactive PBS session. I used it, specifying the GPGPU work queue, one node, and 8 processors per node.

 [scox@br0:~/app]$ qsub -I -q gpgpu -l nodes=1:ppn=8 

Then I set the DO_PARALLEL environment variable:

 [scox@gpgpu-0-0:~/app/amber11/test]$ export DO_PARALLEL='/opt/mpi/intel/mpiexec-0.84/bin/mpiexec -n 2 -comm=pmi '

and executed the tests:

 [scox@gpgpu-0-1:~/app/amber11/test]$ ./test_amber_cuda_parallel.sh 
 Using default GPU_ID = -1
 Using default PREC_MODEL = SPDP
 cd cuda && make -k test.pmemd.cuda.MPI GPU_ID=-1 PREC_MODEL=SPDP 
 make[1]: Entering directory `/home/scox/app/amber11/test/cuda'
 ------------------------------------
 Running CUDA Implicit solvent tests.
  Precision Model = SPDP
  GPU_ID = -1
 ------------------------------------
 cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
 diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
 PASSED
 ==============================================================
 cd myoglobin/ && ./Run_md_myoglobin -1 SPDP netcdf.mod
 diffing myoglobin_md.out.GPU_SPDP with myoglobin_md.out
 PASSED
 ==============================================================
 cd chamber/dhfr/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
 diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
 PASSED
 ============================================================== .........
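
The same test run could probably be submitted as a batch job instead of an interactive session. The sketch below writes a Torque job script that reuses the queue, resource request and mpiexec command line from the session above. The job name, the -V and -j oe directives, and the reliance on AMBERHOME being set in the submitting shell are assumptions, and the function name write_amber_test_job is hypothetical.

```shell
#!/bin/bash
# Write a Torque/PBS batch script that runs the Amber CUDA parallel tests.
# Queue, resources and DO_PARALLEL are copied from the interactive session;
# everything else is an assumption made for illustration.
write_amber_test_job() {
    local job_file="$1"
    # Quoted heredoc delimiter: nothing is expanded while writing the file.
    cat > "$job_file" <<'EOF'
#!/bin/bash
#PBS -N amber_cuda_test
#PBS -q gpgpu
#PBS -l nodes=1:ppn=8
#PBS -j oe
#PBS -V

# -V passes the submitting shell's environment (including AMBERHOME) to the job.
export DO_PARALLEL='/opt/mpi/intel/mpiexec-0.84/bin/mpiexec -n 2 -comm=pmi '
cd "$AMBERHOME/test"
./test_amber_cuda_parallel.sh
EOF
}
```

Usage would be along the lines of calling write_amber_test_job amber_test.pbs and then running qsub amber_test.pbs from a shell where the build environment is already set up.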

Next Steps

Now it’s time to figure out the best way to expose a GPGPU PBS queue through the OSG CE so that OSG jobs can target it.

Posted in Amber11, Compute Grids, Engage VO, GPGPU, High Throughput Computing (HTC), High Throughput Parallel Computing (HTPC), multicore, OSG, pmemd

CampusFactory at NCSA Lincoln

I recently set up CampusFactory on NCSA Lincoln to flock jobs from the new Engage submit node.

The CampusFactory is a Condor job submitted to a personal Condor instance. The job executes the Factory, which is implemented as a Python script.

  • The factory queries remote Condor schedds for idle jobs.
  • It uses glite to launch glideins on Lincoln compute nodes.
  • The glideins join the schedd pool of the campus factory Condor instance.
  • Jobs flock from the submit node to the CampusFactory glideins and execute.
  • Results are transferred back to the submit node.

Jobs must be submitted in the vanilla universe, so this is not a viable way to offload grid universe jobs that have already been submitted.
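
For reference, a vanilla universe submit description might look like the sketch below, written out by a small shell function. The function name, the executable argument and the output file names are hypothetical placeholders. Flocking itself depends on the submit node's Condor configuration, which is not shown here.

```shell
#!/bin/bash
# Write a minimal Condor submit description for a vanilla-universe job.
# Executable and file names are placeholders chosen for illustration.
write_vanilla_submit() {
    local submit_file="$1" executable="$2"
    # \$ keeps $(Cluster) and $(Process) literal so Condor expands them, not the shell.
    cat > "$submit_file" <<EOF
universe                = vanilla
executable              = $executable
output                  = job.\$(Cluster).\$(Process).out
error                   = job.\$(Cluster).\$(Process).err
log                     = job.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue
EOF
}
```

The resulting file would be handed to condor_submit on the submit node. The file transfer settings matter because a glidein on a remote cluster generally cannot see the submit node's filesystem.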

PMEMD for OSG Stats

PMEMD for OSG is live. Gratia statistics for January:

All runs are 8-way parallel MPI jobs, so we get eight hours of CPU time per hour of wall time.
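
The arithmetic behind that ratio is a simple product. The sketch below assumes every core stays busy for the whole run and uses whole hours only. The function name cpu_hours is hypothetical.

```shell
#!/bin/bash
# CPU hours consumed by a job = wall-clock hours x cores in use.
# Assumes all cores are busy for the full wall time; integer hours only.
cpu_hours() {
    local wall_hours="$1" cores="$2"
    echo $(( wall_hours * cores ))
}

cpu_hours 1 8    # prints 8
```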

Posted in Amber9, Compute Grids, condor, Engage VO, High Throughput Computing (HTC), High Throughput Parallel Computing (HTPC), multicore, OSG, pmemd, RENCI, Uncategorized