I am looking at some basic workload management control and monitoring issues on a DB2 pureScale setup. Is there anything unique or different about WLM on pureScale? How does it work?
First off, I assume you are on at least DB2 10.1, which is when the full set of DB2 workload management features became available in the pureScale environment. All aspects of workload management are available and functional there.
The one nuanced difference that you need to be aware of is in the definition and semantics of the activity concurrency threshold, CONCURRENTDBCOORDACTIVITIES. Unlike in other environments, in a pureScale world this threshold controls the overall concurrency level of application activities at each individual member, independently of the other members, although each member still uses the same limit value. So, instead of controlling how many concurrent activities can run on the database system as a whole, in pureScale this threshold controls how many can run concurrently at each member.
While this may sound like a dramatic difference in behaviour, the practical effect is the same. In a pureScale environment, a query runs only at the member where its connection exists; compare this to a non-pureScale environment such as DB2 DPF, where the same query can have its work running across all the data members, as well as the administration member, at the same time. In both cases, the concurrency threshold imposes a shared concurrency limit on application activities executing against the common resource "pool" shared by the competing applications. In DB2 pureScale, the resource pool that is being fought over is at each member.
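To make the difference in scope concrete, here is a minimal sketch in Python. It is a toy model and not DB2 code: the `Member` class, the `admit_*` functions, and `simulate` are all made-up names for illustration, and activities never complete in this model, so it only shows how many get admitted under each interpretation of the same limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    """One member and its count of running coordinator activities (hypothetical model)."""
    member_id: int
    running: int = 0


def admit_per_member(target: Member, limit: int) -> bool:
    """Per-member check: only the connected member's own count matters."""
    if target.running < limit:
        target.running += 1
        return True
    return False  # a real threshold would queue or stop the activity here


def admit_database_wide(members: List[Member], target: Member, limit: int) -> bool:
    """Database-wide check: the total across all members counts against one limit."""
    if sum(m.running for m in members) < limit:
        target.running += 1
        return True
    return False


def simulate(per_member: bool, n_members: int, limit: int, submitted_each: int) -> List[int]:
    """Submit `submitted_each` activities at every member; return the running counts."""
    members = [Member(i) for i in range(n_members)]
    for target in members:
        for _ in range(submitted_each):
            if per_member:
                admit_per_member(target, limit)
            else:
                admit_database_wide(members, target, limit)
    return [m.running for m in members]


per_member_counts = simulate(True, n_members=3, limit=2, submitted_each=3)
database_wide_counts = simulate(False, n_members=3, limit=2, submitted_each=3)
print(per_member_counts)     # [2, 2, 2]
print(database_wide_counts)  # [2, 0, 0]
```

The takeaway from the toy model is that with a limit of N and M members, up to N × M activities can be running across the cluster at once, so the value you pick should be sized for what a single member can handle.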
From an overall WLM configuration perspective, the pureScale environment is effectively a series of "independent" members operating against shared data. The workload management paradigm for pureScale is therefore that each member uses an identical copy of the DB2 workload management configuration, but the members operate independently of each other.
In addition to the basic workload management configuration options, another important tool to keep in mind when approaching workload management on DB2 pureScale is workload balancing (WLB), which helps control and guide incoming connections across the available members. The member subset capability can be used to prevent individual members from being swamped by a spike in connections, or to act as a natural bottleneck for lower priority work by restricting it to a subset of members.
The act of configuring WLM remains the same as always and you continue to focus on the key domains of workload management:
Getting insight into who is connecting, what they are submitting, what is running, what the queries are touching, etc.
Monitoring who is using resource and controlling who is given resource
Determining normal and abnormal behaviours for different sets of work and, if needed, controlling this behaviour
To keep things simple, you can start by choosing one member to act as the representative member for the initial monitoring and configuration activities when setting up an initial workload management configuration. Once this is done, you then expand monitoring to all members to ensure that the chosen configuration is suitable across the database.
If you have subsets of workload being processed on different members (e.g. workload A runs on members 1-3 and workload B runs on members 4-6), I would recommend keeping them in separate workloads and service classes such that the different subsets are effectively two separate configurations to allow you to manipulate them independently. Of course, this also means that you should perform the initial configuration and tuning exercise on a representative member from each set so that the overall work is properly represented in the final result.
Once the configuration is finalized, ongoing monitoring of the overall WLM configuration is done at the database level across all members. For ideas on what this might entail, see the chapter entitled "Monitoring: Maintaining a stable stage 2 configuration" in the DB2 workload management best practices document for warehouses (Implementing DB2® Workload Management in a Data Warehouse).
The final thing I would say on this topic is that since the workloads on pureScale are largely homogeneous compared to warehouse workloads, the typical workload management configuration might follow one of these two paradigms:
A paradigm where work gets lowered in priority the longer it runs and lower priority work gets less resource. All work is treated equally from the start but the longer something runs, the less resource it gets.
A paradigm where a specific amount of resource is put aside for each different set of work based on its inherent (initial) priority. The priority of work is known when it enters the system and, upon entry, it is guided to a set of pre-defined resources allocated for that priority where it will execute until completed. This has also been referred to as a "pipeline" or "highway" approach where the size of the "pipeline" or the number of lanes in the "highway" (both of which mean how much resource is given) are determined based on the business priority of the work going through it.
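As a rough illustration of how the two paradigms differ, here is a small Python sketch. It is only a model of the decision logic, not a DB2 configuration: the runtime boundaries, the class names, the priority names, and the percentage shares are all hypothetical values chosen for the example.

```python
from bisect import bisect_right

# Paradigm 1: all work starts in the same class and is demoted as it runs.
# Boundaries (seconds of runtime) and class names are hypothetical.
AGING_BOUNDARIES = [10, 60]
AGING_CLASSES = ["HIGH", "MEDIUM", "LOW"]

# Paradigm 2: each business priority has a fixed, pre-allocated share.
# Priority names and percentages are hypothetical.
PIPELINE_SHARES = {"gold": 60, "silver": 30, "bronze": 10}


def aged_class(elapsed_seconds: float) -> str:
    """Return the class a piece of work occupies after running this long."""
    return AGING_CLASSES[bisect_right(AGING_BOUNDARIES, elapsed_seconds)]


def pipeline_share(priority: str, elapsed_seconds: float) -> int:
    """Return the resource share for a priority; runtime does not change it."""
    return PIPELINE_SHARES[priority]


# Compare how the same piece of work is treated as it keeps running.
for seconds in (1, 30, 300):
    print(seconds, aged_class(seconds), pipeline_share("silver", seconds))
```

In the first paradigm the treatment depends only on how long the work has been running, while in the second it depends only on the priority the work had when it entered the system.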
I do have some thoughts on best practices for homogeneous workloads that go into a bit more detail on these paradigms, and I will try to write them up in a future blog entry for exposure and feedback... but that's for another day, so I will leave you in suspense for now :)