Proteins are important, and their structure is complex. And then they move.
The way they move determines how organisms work … or fail. The shape of a protein determines its function, so motion means its shape is in flux and its function is dynamic. Seeing these complex molecules move means solving molecular dynamics (MD) equations for thousands of atoms.
And that’s what Amber11 PMEMD does. It solves MD equations to depict the motion of molecules, taking advantage of high-performance clusters, MPI, and multi-core architectures. Using these techniques, it gives researchers insight into the behavior of proteins and other molecules, such as the Rho GTPases, which initiate complex signaling pathways governing critical cellular functions.
Understanding how proteins and other molecules move is critical to understanding how organisms develop and the causes of diseases. But the computational complexity of even the smallest simulations forces researchers to
- Eliminate sections of molecules (the domain) from simulations entirely
- Use coarse grained abstractions of sections of molecules
- Simulate brief time periods where longer simulations might be more desirable
- Build programs around time slices instead of thinking about abstract workflows
Faster processing of in-silico MD simulations can show us more molecular motion sooner.
Blueridge and NCSA Lincoln both have NVIDIA Tesla S1070 graphics processing units (GPUs) for general-purpose GPU computing (GPGPU). GPUs were originally designed for rendering graphics, but they’ve been repurposed to address general computing problems.
The CUDA environment provides compilers, linkers, and debuggers for developing GPGPU applications. The CUDA version of Amber11 PMEMD dramatically outperforms CPU-based computation.
Parallel Amber11 PMEMD for CUDA
I followed the instructions on the Amber site for building parallel Amber11 PMEMD for CUDA. In my environment that boiled down to the following script.
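# Where Amber11 is unpacked, and where CUDA and MVAPICH2 live on this cluster
export AMBERHOME=$APP_BASE/amber11
export CUDA_HOME=/opt/cuda
export MPI_HOME=/opt/mpi/intel/mvapich2-1.4
export MPIPATH=$MPI_HOME/bin
# Put the MPI compiler wrappers first on PATH
export PATH=$MPIPATH:$PATH
export MPI_LIBDIR2=/lib64
# Intel MKL location (configure looks for MKL_HOME)
export MKL_HOME=/opt/intel/Compiler/11.1/current/mkl/lib/em64t
# Load the Intel C and Fortran compiler environments
. /opt/intel/Compiler/11.1/current/bin/iccvars.sh intel64
. /opt/intel/Compiler/11.1/current/bin/ifortvars.sh intel64
# Configure for CUDA + MPI with the Intel compilers
cd $AMBERHOME/AmberTools/src
make clean
./configure -cuda -mpi intel
# Build the parallel CUDA PMEMD
cd ../../src
make clean
make cuda_parallel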
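Before running configure, it can help to confirm that every variable the script exports points at something real; a misspelled variable name, for instance, silently leaves the one configure expects unset. The function below is a hypothetical helper of my own, not part of Amber. It assumes nvcc lives in $CUDA_HOME/bin and that the MPI compiler wrappers (mpicc) are on PATH once $MPIPATH is prepended.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of Amber): verify that the build
# variables are set and point at directories, and that the CUDA compiler and
# MPI wrapper are reachable, before running ./configure -cuda -mpi intel.
check_build_env() {
  local status=0 var dir
  for var in AMBERHOME CUDA_HOME MPI_HOME MKL_HOME; do
    dir="${!var}"   # bash indirect expansion: the value of the variable named $var
    if [ -z "$dir" ]; then
      echo "missing: $var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "missing: $var=$dir is not a directory" >&2
      status=1
    fi
  done
  # Assumed layout: nvcc under $CUDA_HOME/bin.
  if [ -n "$CUDA_HOME" ] && [ ! -x "$CUDA_HOME/bin/nvcc" ]; then
    echo "missing: $CUDA_HOME/bin/nvcc" >&2
    status=1
  fi
  # The -mpi build needs the MPI compiler wrappers on PATH.
  if ! command -v mpicc >/dev/null 2>&1; then
    echo "missing: mpicc not found on PATH" >&2
    status=1
  fi
  return "$status"
}
```

Dropping `check_build_env || exit 1` into the build script right after the two `ifortvars.sh`/`iccvars.sh` lines stops the build early with a list of what’s missing, rather than failing partway through configure.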
Amber11 PMEMD for CUDA requires an MPI-2 implementation. Once the executable is built, run the tests. The instructions say to run the test script, test_amber_cuda_parallel.sh, but on our cluster this won’t work without an additional step.
This is because the MPI libraries are integrated with the cluster’s PBS/Torque scheduler, so MPI programs only work within a PBS job. Torque’s qsub command provides a flag (-I) to start an interactive PBS session. I used it and also requested the GPGPU queue, one node, and eight processors on that node.
[scox@br0:~/app]$ qsub -I -q gpgpu -l nodes=1:ppn=8
Then I set the DO_PARALLEL environment variable
[scox@gpgpu-0-0:~/app/amber11/test]$ export DO_PARALLEL='/opt/mpi/intel/mpiexec-0.84/bin/mpiexec -n 2 -comm=pmi '
and ran the tests…
[scox@gpgpu-0-1:~/app/amber11/test]$ ./test_amber_cuda_parallel.sh
Using default GPU_ID = -1
Using default PREC_MODEL = SPDP
cd cuda && make -k test.pmemd.cuda.MPI GPU_ID=-1 PREC_MODEL=SPDP
make: Entering directory `/home/scox/app/amber11/test/cuda'
------------------------------------
Running CUDA Implicit solvent tests.
Precision Model = SPDP
GPU_ID = -1
------------------------------------
cd trpcage/ && ./Run_md_trpcage -1 SPDP netcdf.mod
diffing trpcage_md.out.GPU_SPDP with trpcage_md.out
PASSED
==============================================================
cd myoglobin/ && ./Run_md_myoglobin -1 SPDP netcdf.mod
diffing myoglobin_md.out.GPU_SPDP with myoglobin_md.out
PASSED
==============================================================
cd chamber/dhfr/ && ./Run.dhfr_charmm.md -1 SPDP netcdf.mod
diffing mdout.dhfr_charmm_md.GPU_SPDP with mdout.dhfr_charmm_md
PASSED
==============================================================
.........
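Why does DO_PARALLEL hold a whole command line? As I understand it, the test scripts read it from the environment and place it in front of the command that starts each executable. The sketch below assumes that mechanism (it is not copied from Amber’s scripts): left unquoted, the variable splits into words, so an empty value runs the program serially while a launcher string runs it under mpiexec. `run_with_launcher` is a hypothetical name, and the pmemd.cuda.MPI arguments are only illustrative.

```bash
#!/bin/bash
# Hypothetical wrapper illustrating the assumed DO_PARALLEL pattern:
# the launcher prefix is expanded unquoted so it splits into separate words.
run_with_launcher() {
  # Word splitting of DO_PARALLEL is intended here.
  # shellcheck disable=SC2086
  $DO_PARALLEL "$@"
}

# Dry run: prefixing the launcher with "echo" prints the command line
# instead of starting MPI processes.
DO_PARALLEL='echo /opt/mpi/intel/mpiexec-0.84/bin/mpiexec -n 2 -comm=pmi '
run_with_launcher pmemd.cuda.MPI -O -i mdin -o mdout
```

The trailing space in the value is harmless either way, since word splitting discards it.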
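An interactive session isn’t strictly required. What follows is a sketch of the same run as an unattended Torque batch job, under several assumptions. The `#PBS` directives mirror the qsub -I flags used above. `#PBS -V` exports the submitting shell’s environment (including AMBERHOME) into the job. `in_pbs_job` and `summarize_tests` are hypothetical helpers. `summarize_tests` also assumes the harness prints PASSED on its own line for each passing comparison and a line containing FAILURE for each mismatch, which fits the passing output above but isn’t documented behavior I’ve checked.

```bash
#!/bin/bash
#PBS -N amber11_cuda_tests
#PBS -q gpgpu
#PBS -l nodes=1:ppn=8
#PBS -V
#PBS -j oe
# Hypothetical batch-mode version of the interactive test run.

# Torque sets PBS_JOBID inside a job; on this cluster MPI programs
# only launch under the scheduler.
in_pbs_job() {
  [ -n "${PBS_JOBID:-}" ]
}

# Count "PASSED" lines and lines mentioning "FAILURE" in a saved test log;
# returns non-zero if any failure was seen (assumed output format).
summarize_tests() {
  local log="$1" passed failed
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
  passed=$(grep -c '^PASSED' "$log" || true)
  failed=$(grep -c 'FAILURE' "$log" || true)
  echo "passed=$passed failed=$failed"
  [ "$failed" -eq 0 ]
}

main() {
  cd "$AMBERHOME/test" || return 1
  export DO_PARALLEL='/opt/mpi/intel/mpiexec-0.84/bin/mpiexec -n 2 -comm=pmi '
  ./test_amber_cuda_parallel.sh 2>&1 | tee cuda_tests.log
  summarize_tests cuda_tests.log
}

if in_pbs_job; then
  main
else
  echo "not inside a PBS job; submit this script with qsub" >&2
fi
```

Submitted with qsub from a login node, the job’s joined output file would hold the full test log followed by a single passed/failed line.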
Now it’s time to figure out the best way to expose a GPGPU PBS queue through the OSG CE so that OSG jobs can target it.