Grayson
In a recent post about running NAMD on NERSC’s Dirac GPU cluster, I mentioned that a discussion of how we’re managing hierarchical workflows would follow. To do that, we need to introduce Grayson.
Grayson is a workflow design and execution environment for the OSG and XSEDE. It applies principles of model-driven architecture to high throughput computing in response to challenges we’ve seen researchers encounter in designing, executing, troubleshooting and reusing workflows. From a design perspective, Grayson borrows ideas and approaches from pioneering tools like Kepler, Taverna and Triana while aggressively reusing HTC infrastructure including Pegasus, GlideinWMS, Condor and Globus to create an environment suited to the user community the OSG Engage Virtual Organization serves and the challenges it faces.
Workflow
For our purposes, a workflow is the coordination plan for a set of computational jobs in a high-throughput computing environment used to answer a scientific question. Typically, workflows choreograph the interactions of a large number of programs and their input and output files. The programs are often written in different languages and may require substantial infrastructure support to facilitate their execution at a remote host. For example, we stage in MPI support executables and libraries to run many programs. Workflows in a high-throughput computing setting transfer inputs and executables to the compute host, execute the job and transfer outputs back to the submit host.
Challenges
In practice, workflows become enormously complex to define, extend, comprehend and target to emerging architectures. The Engage Virtual Organization has been working to broaden the reach of non-High Energy Physics researchers to the Open Science Grid for over four years. In that time we’ve seen a series of challenges researchers face in using the infrastructure to its full potential:
- Hardware Architecture Heterogeneity: A single workflow may have components requiring widely divergent hardware architectures. This is increasingly true as emerging parallel architectures like GPUs and multicore CPUs become pervasive in HPC and HTPC. Some jobs will require single core CPUs, others will require multicore CPUs and others will require GPUs. It is desirable, from a performance perspective, to have highly granular control over the selection of hardware for the execution of a job. But from configuration management and reuse perspectives it is difficult and undesirable to have what is conceptually a single, highly abstract process cluttered with and tightly coupled to this level of detail. Further, when the same workflow can usefully be targeted at multiple heterogeneous hardware configurations, the problem is no longer practical to manage manually and requires tools and approaches that facilitate this increasingly vital usage pattern.
- Conceptual Complexity: In other sub-fields of software development, design patterns have played a key role in allowing groups to work on large, complex software systems by creating and integrating separately developed and tested components. These components, or modules, enable teams to decompose large problems. Design patterns also contribute key semantic models which serve as common conceptual tools as well as a framework for others to understand complex systems. Examples include the Chain of Responsibility pattern, Inversion of Control and Aspect Oriented Programming. Tools for the conceptual composition of large workflows are sorely needed. Currently, teams of scientists have few tools that support a practical model of component-oriented development or reuse. The tools that do support these capabilities are rarely used in the context of high throughput computing on the largest infrastructures such as the Open Science Grid (OSG).
- Environmental Complexity: In general, the software stack required to use and debug high throughput computing workflows is enormously complex. The skills, patterns and habits required to effectively troubleshoot workflows are also very different from those of most researchers. Tools are needed that open large scale computational infrastructures to practical usage by non-expert research communities. Researchers shouldn’t need to develop expertise in Condor, GlideinWMS, Pegasus and Globus to design, run and debug a workflow.
- Reusability: Few existing workflow systems focus on a practical pattern of component reusability. Software development in general has thrived by making component orientation the basis of all development. Separate compilation is taken for granted in all major programming languages. That is, it must be possible to compose a program from separate files. The files must be able to reference other files and to reference symbols and constructs in those separately compiled units. It’s not hard to see how the absence of this capability would cripple productivity. Existing approaches to developing high-throughput computing workflows emphasize the creation of large scripts with little thought to modularity or reuse. They are often tightly coupled to a cluster, batch scheduling system or other infrastructure component. Approaches are needed that encourage researchers to isolate the smallest reusable modules of their workflows and define these in a way that makes reuse within and beyond the originating team the most natural approach.
- Execution Control and Analysis: Once a workflow is launched and executing it can spawn thousands of jobs. These jobs, especially during development, will fail. The process of discovering the specific cause of a job’s failure within a large workflow is tedious and error prone in proportion to the complexity of the workflow. Large directory hierarchies are manually navigated to find the appropriate log file from which error information must be extracted. As workflows become more complex – for example hierarchical workflows containing serial and parallel steps on multiple hardware architectures – it becomes prohibitively expensive to debug and analyze these systems. Tools are needed to visualize execution, mine data to present answers about problems directly to the researcher, provide visual analytic tools, and give the researcher rapid navigation to the artifacts needed for in-depth investigation.
Submit Host
The submit host is a machine managed by an OSG virtual organization (VO) which lets VO members submit computational workloads to OSG resources. In general a submit host – and this is true of the Engage submit host – will run:
- Condor: The scheduler coordinates submission of jobs via Globus to compute sites. Condor also provides the DAGMan workflow planner, which lets users model job precedence and retry semantics.
- Globus: Globus is used as the job submission protocol between Condor and OSG compute elements (CEs).
- GlideinWMS: GlideinWMS is the OSG standard pilot-based job submission engine. It submits glideins (pilot jobs) to compute nodes, which then efficiently consume user job workloads.
- Pegasus WMS: Pegasus provides a layer of abstraction above grid services allowing researchers to model workflows as abstract processes independent of underlying computational infrastructure.
Usage Models
GlideinWMS
In conventional use, researchers log in to the submit host via SSH and execute shell scripts to submit jobs. These scripts
- Ensure the user has a valid grid proxy, prompting the user to create one if not
- Generate a DAGMAN submit file specifying the workflow. This is shell code for doing variable replacement in static text to generate a file.
- Generate Condor submit files for each job in the DAG. This is more shell script to generate text.
- Coordinate the locations of input, output and log files to known locations
- Specify wrapper scripts to be run at the compute node which, in turn, will
  - Stage in executables
  - Stage in input files
  - Manually manage and track program execution status
  - Stage output files back to the submit host
- Submit the DAG to Condor
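To give a concrete flavor of the text-generation steps above, here is a minimal sketch of producing a DAGMan file and per-job Condor submit files by template substitution. The original scripts do this in shell; this is shown in Python for clarity, and the job names and arguments are hypothetical.

```python
# Sketch of generating DAGMan and Condor submit files by variable
# replacement in static text (hypothetical job names and arguments).
DAG_TEMPLATE = """JOB {name} {name}.sub
RETRY {name} {retries}
"""

SUBMIT_TEMPLATE = """universe   = vanilla
executable = wrapper.sh
arguments  = {args}
output     = logs/{name}.out
error      = logs/{name}.err
log        = logs/{name}.log
queue
"""

def generate(jobs, retries=3):
    """Return (dagman file text, {submit file name: submit file text})."""
    dag_parts = []
    submits = {}
    for name, args in jobs.items():
        dag_parts.append(DAG_TEMPLATE.format(name=name, retries=retries))
        submits[name + ".sub"] = SUBMIT_TEMPLATE.format(name=name, args=args)
    return "".join(dag_parts), submits

dag, subs = generate({"scan": "-n 100", "merge": "-o results.dat"})
```

The `JOB` and `RETRY` directives are standard DAGMan syntax; everything else here is plumbing around string formatting, which is exactly why this approach becomes hard to maintain as workflows grow.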
Workflow progress can be followed by tailing the log file for the Condor DAG.
If an error occurs during execution, the user searches the log files for causes.
Pegasus
Pegasus reads an XML specification of the workflow and generates artifacts that perform the majority of the steps outlined above. Specifically, it generates a DAGMan workflow, submits it to Condor, stages inputs and executables to compute nodes and stages outputs back to the submit host. It also provides tools for monitoring the workflow as well as some additional command line support for determining the reasons for a workflow’s failures. For complex hierarchical workflows, generating Pegasus workflows involves writing a substantial amount of code in Python, Java or Perl to create the DAX and the site, replica and transformation catalogs. As such, the user must have substantial knowledge of the programming language and each of these formats.
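To illustrate what that generated code ultimately produces, here is a minimal sketch of a two-job DAX built with Python’s standard library. The element and attribute names follow the Pegasus 3.x DAX schema, but the job names are hypothetical and in practice one would use the Pegasus-supplied DAX API rather than emitting raw XML.

```python
import xml.etree.ElementTree as ET

# Sketch of the XML a DAX generator emits (element names follow the
# Pegasus 3.x DAX schema; job names are hypothetical).
adag = ET.Element("adag", name="example-flow")

j1 = ET.SubElement(adag, "job", id="ID0000001", name="preprocess")
ET.SubElement(j1, "uses", name="raw.dat", link="input")
ET.SubElement(j1, "uses", name="clean.dat", link="output")

j2 = ET.SubElement(adag, "job", id="ID0000002", name="analyze")
ET.SubElement(j2, "uses", name="clean.dat", link="input")

# Declare that analyze runs after preprocess.
dep = ET.SubElement(adag, "child", ref="ID0000002")
ET.SubElement(dep, "parent", ref="ID0000001")

dax = ET.tostring(adag, encoding="unicode")
```

Even for two jobs the bookkeeping (job ids, file linkage, dependency references) is considerable, which is the burden Grayson’s compiler takes on.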
Grayson Architecture
At the highest level, Grayson consists of three components:
- Editor: The yEd graph editor from yWorks is used for drawing and annotating workflow graphs.
- Compiler: The Grayson compiler enables a range of features to facilitate the component oriented development of complex hierarchical workflows, including:
  - Separate compilation and reference
  - Functional Programming (FP)
  - Aspect Oriented Programming (AOP)
  - Generation of all code required for a Pegasus workflow (DAX & catalogs)
- Execution Environment: The execution environment is a web based graphical interface using HTML5 capabilities including asynchronous event notification and the Scalable Vector Graphics (SVG) protocol to render a visual model of an executing workflow enhanced with live or historical execution events.
Architecturally, Grayson is a distributed system using AMQP and Node.js for asynchronous event distribution, in addition to Apache with mod_wsgi and Django for conventional web tier services.
It interfaces to Pegasus by generating Pegasus DAX workflows as well as site, replica and transformation catalogs. It submits these to Pegasus for execution. It then monitors execution and reports events to the user interface via the AMQP/Node.js event pipeline.
On the client, Grayson uses jQuery and HTML5 features through RaphaelJS and Socket.IO to provide a rich and highly interactive user experience.
![grayson-architecture](https://osglog.wordpress.com/wp-content/uploads/2011/08/grayson-architecture.png?w=640&h=456)
Figure 1 : Grayson Architecture
Usage
Users define workflow components using yEd. Each component is drawn as a graph and annotated with JSON (JavaScript Object Notation). Notations tell Grayson the type of object each graph node represents. External models can be imported and their elements referenced from other graphs.
![yEd-scan-uber](https://osglog.wordpress.com/wp-content/uploads/2011/08/yed-scan-uber.png?w=640&h=522)
Figure 2 : Grayson Editor – yEd
Models are packaged using the Grayson compiler. The compiler detects the locations of all referenced models and archives everything needed by the execution environment into a compressed archive. The user uploads the archive to the execution environment which compiles the model into artifacts appropriate for Pegasus and submits the workflow. From there, execution is controlled by Condor and DAGMAN as in a normal Pegasus workflow execution.
Events resulting from the execution are detected by the Celery asynchronous task queue and transmitted to subscribed clients first as AMQP messages then, by the Node.js Socket.IO pipeline to the WebSocket client. Event notifications are mapped to graphical elements to create a dynamic, visual execution monitoring and debugging experience. Job execution as well as file transfer status are all visually represented.
![grayson-lambda-uber-execution](https://osglog.wordpress.com/wp-content/uploads/2011/08/grayson-lambda-uber-execution1.png?w=614&h=884)
Figure 3 : Grayson Execution Environment – Lambda Uber
Much of the power and significance of Grayson stems from its use of functional and aspect oriented programming to allow researchers to create and effectively execute complex hierarchical workflows. In the example below, the lambda-flow sub-workflow has been targeted by a chain of map operators. The map operators recursively create instances of their target workflow according to the operators’ annotations. As a result, in the executing instance below, there are 21 total instances of lambda-flow.
The expression of this multiplicity in the model is succinct, manageable and informative. Similarly, the visual representation in the execution environment condenses the multiplicity into a manageable display. Users can see the grid of workflow instances by hovering over a sub-workflow, and the selected workflow is highlighted. As a result, display performance and visual complexity are not on the order of the actual size of the workflow but on the order of its generalized structure, which, in general, will be orders of magnitude smaller.
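The effect of a chain of map operators can be sketched as recursive expansion: each operator multiplies the instances of its target according to a count carried in its annotation. The structure below is an illustration, not Grayson’s actual model, and the counts are hypothetical values chosen to total 21 as in the example above.

```python
def expand(target, operators):
    """Recursively instantiate `target` once per combination of map indices."""
    if not operators:
        return [target]
    count = operators[0]["count"]
    instances = []
    for i in range(count):
        # Each index of the outer map expands the rest of the chain.
        instances.extend(expand(f"{target}[{i}]", operators[1:]))
    return instances

# A chain of two hypothetical map operators: 3 x 7 = 21 instances.
instances = expand("lambda-flow", [{"count": 3}, {"count": 7}])
```

The model stays the size of the operator chain while the executing workflow grows multiplicatively, which is exactly the gap the execution environment’s condensed display bridges.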
![grayson-lambda-uber-execution-2](https://osglog.wordpress.com/wp-content/uploads/2011/08/grayson-lambda-uber-execution-21.png?w=614&h=884)
Figure 4 : Grayson Execution Environment – Lambda Uber – Hierarchical
Once a sub-workflow has been selected, the user may select the workflow node to navigate to the tab containing the graph of that sub-workflow. All events pertaining to the selected instance are rendered onto the graph, so the view is always contextually driven by the user’s selection.
Selecting a job in the workflow renders a dialog containing detailed information about the job’s execution including:
- General information like the job’s execution host, duration and exit status
- Standard error and output
- The Pegasus workflow log
- The Condor DAGMAN log the job was a part of
A detail view provides extensive information about the workflow’s execution.
![grayson-lambda-uber-execution-3](https://osglog.wordpress.com/wp-content/uploads/2011/08/grayson-lambda-uber-execution-31.png?w=640&h=536)
Figure 5 : Grayson Execution Environment – Lambda Uber – Hierarchical Drill-down
The detail view renders a tabbed view of a variety of additional statistics of interest. One tab contains a tabular view of execution events such as job submission, execution and exit status. The transfer view renders a tabular view of each transfer event with assorted statistics. The monitor view shows the output of pegasus-status for the workflow. Finally, the files tab provides a hierarchical file browser targeted at the root of the workflow execution directory hierarchy.
Next Steps
Grayson is in alpha. Many things will likely change soon. To date, it has been used to compose a few moderately complex sample applications and one workflow for an Engage researcher. That workflow addresses interesting challenges of High Throughput Parallel Computing (HTPC) and hierarchical workflow in general which we’ll discuss more later.
Some major next steps involve migrating Grayson from Pegasus 3.0.2 to 3.1.0. The latter adds improvements to Pegasus STAMPEDE which promise to greatly improve what is currently the weakest link in Grayson: the ability to rigorously and programmatically scan a workflow for events. STAMPEDE presents a Python interface (among others) to a relational database schema of workflow events. Migrating to it will eliminate the ad hoc methods improvised to work against 3.0.2.
Significant progress has been made on the Continuous Integration environment for Grayson. A battery of automated tests ensures that the compiler generates the expected outputs for a given set of inputs. The environment also uses a headless WebKit browser to run a variety of automated user interface tests. In addition to testing, documentation, packaging and code coverage are all automated. Still, this area is key to quality and development scalability and is a place we will invest on an ongoing basis.
Another need is a product management system to track features, existing and future, serving as a guide for development, quality assurance, documentation and support. Coming soon.
There’s much more to say about Grayson but this will serve as an introduction.