Amber11 – PMEMD for NVIDIA GPGPU

Molecular Dynamics

Proteins are important and their structure complex.  And then they move.

The way they move determines how organisms work … or fail. A protein's shape determines its function, so when its shape is in flux, its function is dynamic too. Seeing these complex molecules move means solving molecular dynamics (MD) equations for thousands of atoms.

And that’s what Amber11 PMEMD does. It solves MD equations to depict the motion of molecules, taking advantage of high-performance clusters, MPI, and multi-core architectures. This gives researchers insight into the behavior of proteins and other molecules, such as the Rho GTPases, which initiate complex signaling pathways governing critical cellular functions.

Understanding how proteins and other molecules move is critical to understanding how organisms develop and how diseases arise. But the computational complexity of even the smallest simulations forces researchers to:

  • Eliminate sections of molecules (the domain) from simulations entirely
  • Use coarse grained abstractions of sections of molecules
  • Simulate brief time periods where longer simulations might be more desirable
  • Build programs around time slices instead of thinking about abstract workflows

Faster processing of in-silico MD simulations can show us more molecular motion sooner.

NVIDIA GPGPU

Blueridge and NCSA Lincoln both have NVIDIA Tesla S1070 General Purpose Graphics Processing Units (GPGPUs). GPUs were originally designed for rendering graphics, but they have been repurposed to address general computing problems.

CUDA

The CUDA environment provides the compilers, linkers, and debuggers needed to develop GPGPU applications. The CUDA build of Amber11 PMEMD dramatically outperforms CPU-based computation.
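
Before starting the build, it helps to confirm that the CUDA toolkit and the Tesla cards are actually visible on the node. This is just a sanity check of my own, assuming CUDA is installed under /opt/cuda as in the build script below:

 /opt/cuda/bin/nvcc --version   # CUDA compiler release
 nvidia-smi                     # Tesla devices visible to the driver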

Parallel Amber11 PMEMD for CUDA

I followed the instructions on the Amber site for building parallel Amber11 PMEMD for CUDA. In my environment, that boiled down to the following script.

 # Environment for the build: Amber, CUDA, MVAPICH2, and the Intel compilers/MKL.
 export AMBERHOME=$APP_BASE/amber11
 export CUDA_HOME=/opt/cuda
 export MPI_HOME=/opt/mpi/intel/mvapich2-1.4
 export MPIPATH=$MPI_HOME/bin
 export PATH=$MPIPATH:$PATH
 export MPI_LIBDIR2=/lib64
 export MKL_HOME=/opt/intel/Compiler/11.1/current/mkl/lib/em64t
 . /opt/intel/Compiler/11.1/current/bin/iccvars.sh intel64
 . /opt/intel/Compiler/11.1/current/bin/ifortvars.sh intel64

 # Configure AmberTools for CUDA + MPI with the Intel compilers,
 # then build the parallel CUDA executable.
 cd $AMBERHOME/AmberTools/src
 make clean
 ./configure -cuda -mpi intel
 cd ../../src
 make clean
 make cuda_parallel

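A quick sanity check (my own step, not part of the published build instructions) is to confirm that the parallel CUDA executable landed in $AMBERHOME/bin:

 ls -l $AMBERHOME/bin/pmemd.cuda.MPI
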
Amber11 PMEMD for CUDA requires an MPI-2 implementation. Once the executable is built, run the tests. The instructions say to execute the test script, but this won't work on our cluster without an additional step.

This is because the MPI libraries are integrated with the cluster's PBS/Torque scheduler, so MPI programs only work within a PBS job. The Torque qsub command provides a flag (-I) to start an interactive PBS session. I used it and also specified the gpgpu work queue, one node, and eight processors per node.

 [scox@br0:~/app]$ qsub -I -q gpgpu -l nodes=1:ppn=8 

Then I set the DO_PARALLEL environment variable:

 [scox@gpgpu-0-0:~/app/amber11/test]$ export DO_PARALLEL='/opt/mpi/intel/mpiexec-0.84/bin/mpiexec -n 2 -comm=pmi '

and executed the tests…

 [scox@gpgpu-0-1:~/app/amber11/test]$ ./test_amber_cuda_parallel.sh 
 Using default GPU_ID = -1
 Using default PREC_MODEL = SPDP
 cd cuda && make -k test.pmemd.cuda.MPI GPU_ID=-1 PREC_MODEL=SPDP 
 make[1]: Entering directory `/home/scox/app/amber11/test/cuda'
 ------------------------------------
 Running CUDA Implicit solvent tests.
  Precision Model = SPDP
  GPU_ID = -1
 ------------------------------------
 cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
 diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
 PASSED
 ==============================================================
 cd myoglobin/ && ./Run_md_myoglobin -1 SPDP netcdf.mod
 diffing myoglobin_md.out.GPU_SPDP with myoglobin_md.out
 PASSED
 ==============================================================
 cd chamber/dhfr/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
 diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
 PASSED
 ============================================================== .........

Next Steps

Now it’s time to figure out the best way to expose a GPGPU PBS queue through the OSG CE so that OSG jobs can target it.
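
As a rough sketch of what an OSG job targeting that queue might eventually look like, here is a minimal, hypothetical PBS batch script that submits the parallel CUDA executable to the gpgpu queue. The mpiexec path and flags come from the interactive session above; the input file names (md.in, prmtop, inpcrd) are placeholders.

 #!/bin/bash
 #PBS -q gpgpu
 #PBS -l nodes=1:ppn=8
 #PBS -N pmemd-cuda

 cd $PBS_O_WORKDIR
 export AMBERHOME=$HOME/app/amber11

 # Two MPI ranks of the CUDA build of PMEMD; input names are placeholders.
 /opt/mpi/intel/mpiexec-0.84/bin/mpiexec -n 2 -comm=pmi \
     $AMBERHOME/bin/pmemd.cuda.MPI -O -i md.in -o md.out -p prmtop -c inpcrd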

