How Much is Enough REAL Storage

Over the years, as Db2 has traded real storage exploitation for CPU and zIIP MIPS reductions, customers have become more aggressive in using these techniques. Db2 buffer pool expansion, the EDM and authorization cache pools, and persisting threads through KEEPDYNAMIC and RELEASE(DEALLOCATE) all add to Db2’s storage footprint. Db2 12 extends this storage use further through Fast Traverse Blocks, contiguous buffer pools, and EDM pool storage allocation techniques. As the z/OS architecture evolves to allow larger storage specifications, so does Db2’s ability to consume it. The architectural limit for Db2 buffer pools went from 1TB to 16TB, to stay ahead of the z/OS LPAR storage limit, which moved from 1TB to 4TB with z/OS 2.2 and the z13. However, Db2 for z/OS support still sees cases where customers cause inadvertent outages by over-allocating real storage. Often the issues arise from the combination of page-fixed buffer pools and large LFAREA configurations for 1MB and 2GB frames. The shortage can bring down the entire LPAR, so the scope goes well beyond Db2.

Let us walk through a customer scenario, with some example storage numbers, to understand how such a storage constraint can come about. All of the following references are to REAL CENTRAL storage allocated to the LPAR, not virtual storage. The customer had just added 100GB to the LFAREA (large frame area) parameter in their IEASYSxx Parmlib member. For simplicity’s sake we will assume the LPAR was originally 100GB and the customer added another 100GB. The customer set LFAREA to 100GB because they wanted to use all of that ‘net-new’ storage for Db2 buffer pools. To appease the change control organization, the changes to the LPAR and the Db2 buffer pools would be made piecemeal, which in the end was a mistake. The change to LFAREA was made during the IPL window over the weekend, but the Db2 buffer pools remained page-fixed with a 4k frame size.

On Monday, when the LPAR came up, everything seemed fine, but as the peak processing time neared the System Resource Manager issued IRA400E messages. This occurs when over 80% of the LPAR frames are non-pageable. Remember that the Db2 buffer pools were all page-fixed. The system administrators assumed, incorrectly, that the LPAR would begin paging to auxiliary storage to free up some frames from the batch work that was still running. When the peak online transactions hit, the Db2 buffer pools were fully allocated and the IRA401E (critical shortage of pageable frames) message was issued. Now over 90% of the LPAR’s frames could not be paged, and the message listed the largest consumers, with the DBM1 address space #1 on the list due to its buffer pool allocation. Shortly after this the LPAR abended because a GETMAIN for storage below the line, in 24-bit storage, failed. The customer brought the LPAR back up very quickly and survived a few more days before the situation recurred.

By then they had opened a case with IBM and been told that the quickest solution was to remove the page-fix attribute from the buffer pools or drastically lower LFAREA. Changing the LFAREA parameter requires an IPL, so that was done the following weekend. The customer was, not surprisingly, very aggravated that z/OS Real Storage Manager could not auto-correct, or allow for, inappropriate LFAREA settings.

So, let us look into the reason for the failure. Below is a chart showing the ‘reserved’ storage areas on the LPAR in our example, starting with the original 100GB LPAR before it received another 100GB of storage that was immediately allocated to LFAREA.

 

Online memory (GB)     100     What is allocated
LFAREA (GB)            0       User defined
QUAD (GB)              12.5    = 12.5% of LPAR
1MB PAGEABLE (GB)      12.5    = 12.5% of LPAR
RSM mapping            1.5     = ~1.5% of LPAR
4KB FRAMES (GB)        73.5    = Balance left for 4K preferred

 

Notice that something called the Quad area gets 12.5% off the bat, as does the PLAREA, or pageable LFAREA (1MB PAGEABLE). These two areas of storage are for dynamic address translation and for backing non-page-fixed 1MB frames, respectively; the latter backs things like buffer pool control blocks. In the end the customer had 73.5GB left for 4k preferred frames. The term preferred refers to the type of GETMAIN request used to obtain the frame of real storage. Page-fixed Db2 buffer pools use preferred requests, while non-preferred requests would come from swappable address spaces. Now watch the math as the customer doubles the storage on the LPAR to 200GB.

 

Online memory (GB)     200     What is allocated
LFAREA (GB)            100     User defined
QUAD (GB)              25      = 12.5% of LPAR
1MB PAGEABLE (GB)      25      = 12.5% of LPAR
RSM mapping            3       = ~1.5% of LPAR
4KB FRAMES (GB)        47      = Balance left for 4K preferred

 

Because these storage areas are fixed percentages (up to z/OS 2.2), they grow in proportion to the LPAR size. Now the customer only has 47GB left for preferred 4k frame requests. So, despite doubling the storage on the LPAR, when the new storage went to LFAREA there were actually about 36% fewer 4k frames available for Db2 and other address spaces (73.5GB down to 47GB). Since the Db2 buffer pools remained at a 4k frame size, the page-fixed buffer pools all ate into this 47GB and ended up consuming all but about 30MB, which was quickly used by stored procedure address spaces and other applications on the LPAR. The critical shortage of frames that brought the LPAR down occurred in the 24-bit addressing range, not because Db2 requested storage there, but because those were the only frames z/OS Real Storage Manager had left to use. The $64,000 question was: why didn’t z/OS break down the LFAREA (which was barely being used), the Quad area, or even the 1MB pageable area (PLAREA) to satisfy the Db2 buffer pool requirement? The reality is that z/OS DOES break down the LFAREA when there is a shortage of frames, but once broken down these 4k frames can only be used for non-preferred requests. Hence, they could not be used to back the page-fixed buffer pools. If the customer had moved the buffer pools to a 1MB frame size (FRAMESIZE(1MB)) just after the IPL for LFAREA, the buffer pools would have been allocated in that storage area and no shortage would have been seen.
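The arithmetic behind the two charts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and constants are hypothetical, and the percentages are the pre-z/OS 2.3 values taken from the charts above, with the RSM mapping figure being approximate.

```python
# Reserved-area percentages as shown in the charts (behaviour up to z/OS 2.2).
# These are the article's example figures, not values read from a live system.
QUAD_PCT = 12.5          # Quad area
PAGEABLE_1MB_PCT = 12.5  # 1MB pageable area (PLAREA)
RSM_MAPPING_PCT = 1.5    # RSM mapping, approximate


def preferred_4k_gb(lpar_gb, lfarea_gb):
    """Estimate the GB left on the LPAR for 4K preferred frames."""
    reserved_pct = QUAD_PCT + PAGEABLE_1MB_PCT + RSM_MAPPING_PCT
    reserved_gb = lpar_gb * reserved_pct / 100.0
    return lpar_gb - lfarea_gb - reserved_gb


before = preferred_4k_gb(100, 0)    # original 100GB LPAR, no LFAREA
after = preferred_4k_gb(200, 100)   # 200GB LPAR with LFAREA set to 100GB
drop_pct = (before - after) / before * 100

print(f"before: {before:.1f} GB, after: {after:.1f} GB, drop: {drop_pct:.0f}%")
```

With the example numbers, the 4k preferred area shrinks from 73.5GB to 47GB even though the LPAR doubled in size.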

Other safety measures have been put in place in z/OS 2.3 to assist here. For instance, Real Storage Manager is more lenient: it allocates the 12.5% pageable area (PLAREA) on demand, and it treats the user-defined 1MB LFAREA as a high-water mark, also allocating it as needed. However, the 2GB LFAREA is unchanged, and can only back page-fixed Db2 buffer pools with the attribute FRAMESIZE(2G). Thus, allocating and using LFAREA 2GB frames should be coordinated and done at the same time by the z/OS sysprog and the Db2 team. Another caveat to using FRAMESIZE(2G) is that in Db2 12, if you specify it on a buffer pool that has PGSTEAL set to NONE, the buffer pool will actually be allocated in 4k frames, NOT 2GB frames, so this can also add to a shortage of 4k frames.

Even though z/OS 2.2 goes out of support in September of 2020, many customers are still exposed to the exact scenario described above. For customers with LPARs larger than 320GB, the pageable storage shortage message (IRA400E) will not be issued until there is less than 64GB of pageable frames left, so it is no longer a fixed 80%. It is, however, critical for capacity planning teams to realize that these other areas of storage on the LPAR must be taken into account when ensuring there are enough 4k frames available for the majority of the work running on the LPAR.
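To make the two thresholds concrete, here is a small Python sketch. It is one reading of the rules described in this article, not a documented z/OS formula: it assumes IRA400E is expected once pageable storage falls below 20% of the LPAR (the 80% non-pageable rule), capped at 64GB for larger LPARs. The two rules meet at 320GB, since 20% of 320GB is 64GB. The function names are hypothetical.

```python
def ira400e_pageable_threshold_gb(lpar_gb):
    """GB of pageable storage below which IRA400E would be expected.

    Assumption: 20% of the LPAR (i.e. 80% non-pageable), but never more
    than 64GB, which is what applies to LPARs larger than 320GB.
    """
    return min(lpar_gb * 20 / 100, 64.0)


def ira400e_expected(lpar_gb, pageable_gb):
    """True if the pageable storage left is under the assumed threshold."""
    return pageable_gb < ira400e_pageable_threshold_gb(lpar_gb)


for size_gb in (200, 320, 1024):
    print(size_gb, "GB LPAR ->", ira400e_pageable_threshold_gb(size_gb), "GB pageable")
```

On the 200GB LPAR from the example, the warning would be expected once fewer than 40GB of frames remain pageable; on a 1TB LPAR it would not appear until fewer than 64GB remain, which is a much smaller share of the LPAR.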

 

Addendum:

If you were to try to calculate your available storage based on the scenario above, the CENTRAL STORAGE FRAMES section of the RMF Paging report shows TOTAL and AVAILABLE frames. If you have z/OS metrics enabled, then field QWOSLRST shows the REAL storage on the LPAR as well. However, this number includes 4k, 1MB, and 2GB frames, all expressed in 4k increments. The AVAILABLE field will include unused 1MB and 2GB frames as well, even though these cannot be used for 4k preferred requests. So you would need to subtract out the LFAREA defined in 1MB frames (256 4k frames each) and 2GB frames (524,288 4k frames each). The simplest strategy would be to take the TOTAL frames from the RMF Paging report, or ask your capacity planning friend how much central storage is on the LPAR. Then subtract out the LFAREA definition, as well as 26.5% of TOTAL frames for the reserved areas. This gives you the number of 4k preferred frames available on the LPAR. If the LPAR is not paging and the output from DISPLAY VIRTSTOR shows MAX LFAREA ALLOCATED (4K) = 0M, then you are likely running safely. There is no field or direct subtraction method, other than what I have shown above, to determine the FREE 4k preferred frames on an LPAR; hence the convoluted route. If you intend to add 4k buffers, or allow more storage for sorts, then this is the storage you will be taking it out of.
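The subtraction described above can be sketched in Python as follows. This is a rough estimate under the assumptions of this article (the 26.5% reserved figure applies up to z/OS 2.2), and the function and parameter names are hypothetical; the inputs would come from the RMF Paging report and your LFAREA definition.

```python
FRAMES_4K_PER_1MB = 256        # one 1MB frame = 256 4K frames
FRAMES_4K_PER_2GB = 524_288    # one 2GB frame = 524,288 4K frames
FRAMES_4K_PER_GB = 262_144     # 1GB / 4KB


def estimate_preferred_4k_frames(total_frames_4k, lfarea_1mb_frames, lfarea_2gb_frames):
    """Estimate the 4K preferred frames on the LPAR.

    total_frames_4k   -- TOTAL central storage frames, in 4K units
    lfarea_1mb_frames -- number of 1MB frames defined in LFAREA
    lfarea_2gb_frames -- number of 2GB frames defined in LFAREA
    """
    lfarea_4k = (lfarea_1mb_frames * FRAMES_4K_PER_1MB
                 + lfarea_2gb_frames * FRAMES_4K_PER_2GB)
    # 26.5% reserved for the Quad area, 1MB pageable area and RSM mapping;
    # integer arithmetic avoids floating-point rounding on large frame counts.
    reserved_4k = total_frames_4k * 265 // 1000
    return total_frames_4k - lfarea_4k - reserved_4k


# Example: 200GB LPAR with LFAREA defined as 100GB of 1MB frames.
total = 200 * FRAMES_4K_PER_GB
frames = estimate_preferred_4k_frames(total, lfarea_1mb_frames=100 * 1024,
                                      lfarea_2gb_frames=0)
print(frames, "frames =", frames / FRAMES_4K_PER_GB, "GB")
```

For the 200GB example this comes out to the same 47GB shown in the second chart, which is a useful sanity check before plugging in your own numbers.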

 
