There’s a group at RENCI working on next-generation genome sequencing (NGS) technologies, and whole genome sequencing (WGS) in particular. I’ve been helping them get a new workflow running on our Blueridge cluster.
The whole genome sequencing workflow is a multi-step process involving about a half dozen different executables. It runs on Blueridge and Kure, two local clusters, but there’s strong interest in making the workflow portable for execution on other systems, including the Open Science Grid.
The new users were provisioned on the submit node and DOE certificates obtained according to the usual Engage process.
Then I worked with the team to create a Pegasus DAX representing the workflow. Our main challenges involved:
- Debugging: Getting used to searching the Pegasus logs for output status. Pegasus creates a directory for each workflow submission. Each job in the workflow will produce a series of separate files, one of which is an XML file containing details of the execution including fully qualified paths to executables and files. It also contains standard output and error and exit status codes of each job. These are indispensable for debugging.
- Files: Some of the files used in this workflow were pre-staged on the local cluster to allow us to get the workflow up and running. The DAX file elements needed a site='local' attribute to tell Pegasus not to try to stage the files.
- Environment: We initially had the user executing the workflow mapped to the generic engage user. Because we were adapting the workflow to use a legacy data setup, this ran into trouble with Unix file permissions. I reconfigured the user’s DN to map to their cluster local user so we could proceed.
- Pegasus and Scripts: One of the most difficult items to figure out was caused by something very simple. One of the scripts in the workflow had appropriate ownership and execute privileges configured but failed on every execution. It turned out that the script was missing the #!/bin/bash line at the top. So look out for that if you’ve exhausted other debugging avenues.
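A missing interpreter line is easy to check for mechanically. Here is a minimal sketch, assuming the workflow's scripts live under a single directory; the function name `missing_shebang` is made up for illustration and is not part of Pegasus.

```shell
#!/bin/bash
# missing_shebang DIR
# Print every executable file under DIR whose first two bytes are not "#!".
# Compiled binaries are reported too, since they also lack a shebang;
# `file` can tell the two cases apart.
missing_shebang() {
    local dir="$1"
    local f
    find "$dir" -type f -perm -u+x | while IFS= read -r f; do
        if [ "$(head -c 2 "$f")" != '#!' ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling `missing_shebang /path/to/workflow/bin` (the path is a placeholder) lists the files worth a second look before submitting the workflow.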
The next steps are to prepare the workflow, or components of it, for execution on OSG. This will involve:
- Collecting the required input files into archives that are entirely portable – i.e. free of symbolic links and user privilege issues.
- An assessment of the size of the data to determine the best way to provision it to compute nodes.
- Assessment of the executables to see if any would benefit from a high throughput parallel computing (HTPC) treatment.
- Selection of OSG resources appropriate for the task.
- Altering the Pegasus site configuration and process generally to work with GlideinWMS and target OSG.
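For the first item, one possible approach is to let tar follow symbolic links when the archive is built, then confirm that none survived. This is a sketch only: the function names are hypothetical, and it addresses links but not file permissions, which would still need their own check.

```shell
#!/bin/bash
# make_portable_archive SRC_DIR ARCHIVE
# Pack SRC_DIR into a gzipped tarball. The -h flag makes tar follow symbolic
# links, so the archive holds file contents rather than links that point
# into a local filesystem.
make_portable_archive() {
    local src="$1" archive="$2"
    tar -chzf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

# count_symlinks ARCHIVE
# Print the number of symlink entries in ARCHIVE (0 means link-free).
# In a verbose tar listing, symlink entries start with the letter "l".
count_symlinks() {
    tar -tvzf "$1" | grep -c '^l' || true
}
```

A typical use would be `make_portable_archive /data/wgs-inputs inputs.tar.gz` followed by `count_symlinks inputs.tar.gz`, where both paths are placeholders.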
Here’s an outline that may be useful for new user communities with parallel applications. It introduces the Open Science Grid, the concepts behind High Throughput Parallel Computing and discusses the process of adapting an application to that environment. It also introduces GlideinWMS as an important part of solving the resource selection problem.
Would this approach be helpful for the communities you interact with?
Our first researcher using Amber PMEMD on the OSG reports that molecular dynamics simulations run four to eight times faster on the OSG than on the infrastructure she had access to previously.
That’s for the all-CPU version, i.e. without the NVIDIA GPGPU support in Amber11.
Proteins are important and their structure complex. And then they move.
The way they move determines how organisms work … or fail. The shape of the protein determines its function so motion means its shape is in flux and its function dynamic. Seeing these complex molecules move means solving molecular dynamics (MD) equations for thousands of atoms.
And that’s what Amber11 PMEMD does. It solves MD equations to depict the motion of molecules taking advantage of high performance clusters, MPI, and multi-core architectures. Using these techniques, it’s able to give researchers insight into the behavior of proteins and other molecules such as Rho GTPases which initiate complex signaling pathways governing critical cellular functions.
Understanding how proteins and other molecules move is critical to understanding how organisms develop and the causes of diseases. But the computational complexity of even the smallest simulations forces researchers to
- Eliminate sections of molecules (the domain) from simulations entirely
- Use coarse grained abstractions of sections of molecules
- Simulate brief time periods where longer simulations might be more desirable
- Build programs around time slices instead of thinking about abstract workflows
Faster processing of in-silico MD simulations can show us more molecular motion sooner.
Blueridge and NCSA Lincoln both have NVIDIA Tesla S1070 General Purpose Graphics Processing Units (GPGPU). GPUs were originally for rendering graphics. But they’ve been re-purposed to address general computing problems.
The CUDA environment provides compilers, linkers, and debuggers for developing GPGPU applications. The version of Amber11 PMEMD for CUDA dramatically outperforms CPU-based computation methods.
Parallel Amber11 PMEMD for CUDA
I followed the instructions on the Amber site for building parallel Amber11 PMEMD for CUDA. In my environment that boiled down to the following script.
# Install locations (site-specific)
export AMBERHOME=$APP_BASE/amber11
export CUDA_HOME=/opt/cuda
export MPI_HOME=/opt/mpi/intel/mvapich2-1.4
export MPIPATH=$MPI_HOME/bin
export PATH=$MPIPATH:$PATH
export MPI_LIBDIR2=/lib64
# Intel MKL location; Amber's configure looks for MKL_HOME
export MKL_HOME=/opt/intel/Compiler/11.1/current/mkl/lib/em64t
# Load the Intel C and Fortran compiler environments
. /opt/intel/Compiler/11.1/current/bin/iccvars.sh intel64
. /opt/intel/Compiler/11.1/current/bin/ifortvars.sh intel64
# Configure for CUDA + MPI with the Intel compilers, then build
cd $AMBERHOME/AmberTools/src
make clean
./configure -cuda -mpi intel
cd ../../src
make clean
make cuda_parallel
Amber11 PMEMD for CUDA requires an MPI-2 implementation. Once the executable is built, run the tests. The instructions say to execute the test script directly, but this won’t work on our cluster without an additional step.
This is because the MPI libraries are integrated with the cluster’s PBS/Torque scheduler so that MPI programs only work within a PBS job. The Torque qsub command provides a flag to start an interactive PBS session. I used this and also specified the GPGPU work queue, one node and 8 processes.
[scox@br0:~/app]$ qsub -I -q gpgpu -l nodes=1:ppn=8
Then I set the DO_PARALLEL environment variable:
[scox@gpgpu-0-0:~/app/amber11/test]$ export DO_PARALLEL='/opt/mpi/intel/mpiexec-0.84/bin/mpiexec -n 2 -comm=pmi '
and executed the tests:
[scox@gpgpu-0-1:~/app/amber11/test]$ ./test_amber_cuda_parallel.sh
Using default GPU_ID = -1
Using default PREC_MODEL = SPDP
cd cuda && make -k test.pmemd.cuda.MPI GPU_ID=-1 PREC_MODEL=SPDP
make: Entering directory `/home/scox/app/amber11/test/cuda'
------------------------------------
Running CUDA Implicit solvent tests.
Precision Model = SPDP
GPU_ID = -1
------------------------------------
cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
PASSED
==============================================================
cd myoglobin/ && ./Run_md_myoglobin -1 SPDP netcdf.mod
diffing myoglobin_md.out.GPU_SPDP with myoglobin_md.out
PASSED
==============================================================
cd chamber/dhfr/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
PASSED
==============================================================
.........
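The test run prints a lot of output, so it can help to save it to a file and tally the results. The sketch below counts the PASSED lines seen in the output above. The FAILURE pattern is an assumption about how a failed comparison is reported and may need adjusting, and `summarize_tests` is a made-up helper name.

```shell
#!/bin/bash
# summarize_tests LOGFILE
# Count result lines in a saved copy of the test output and return
# non-zero if any line matches the (assumed) failure pattern.
summarize_tests() {
    local log="$1" passed failed
    passed=$(grep -c 'PASSED' "$log" || true)
    failed=$(grep -c 'FAILURE' "$log" || true)   # assumed failure marker
    printf 'passed=%s failed=%s\n' "$passed" "$failed"
    [ "$failed" -eq 0 ]
}
```

For example, run `./test_amber_cuda_parallel.sh 2>&1 | tee test.log` and then `summarize_tests test.log`.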
Now it’s time to figure out the best way to expose a GPGPU PBS queue through the OSG CE so that OSG jobs can target it.
The CampusFactory is a Condor job submitted to a personal Condor instance. The job runs the factory, which is implemented as a Python script.
- The factory queries remote Condor schedds for idle jobs.
- It uses gLite to launch glideins on Lincoln compute nodes.
- The glideins join the schedd pool of the campus factory Condor instance.
- Jobs flock from the submit node to the CampusFactory glideins and execute.
- Results are transferred back to the submit node.
Jobs must be submitted in the vanilla universe, so this is not a viable way to offload grid universe jobs that have already been submitted.
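For reference, a vanilla universe job that relies on Condor's file transfer to bring results back might be described as in the sketch below. The helper name, the output file names, and the choice of transfer settings are illustrative assumptions rather than the CampusFactory's required configuration.

```shell
#!/bin/bash
# write_vanilla_submit EXECUTABLE SUBMIT_FILE
# Write a minimal Condor submit description for a vanilla universe job.
# File transfer is switched on so output returns to the submit node.
write_vanilla_submit() {
    local exe="$1" submit="$2"
    cat > "$submit" <<EOF
universe                = vanilla
executable              = $exe
output                  = job.\$(Cluster).\$(Process).out
error                   = job.\$(Cluster).\$(Process).err
log                     = job.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue
EOF
}
```

Something like `write_vanilla_submit ./run_job.sh job.submit` followed by `condor_submit job.submit` would queue it, where `run_job.sh` stands in for the real executable.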