Overhead of Data Sharing

Tom Glaser

Overhead of Data Sharing

We have a two-way data sharing environment, each member running on a different LPAR. We had an incident that brought one member down. The application noticed one of their batch jobs running much faster than normal on the surviving LPAR and would like to understand why. This application accesses the same partitions from either LPAR. Sure, there's overhead for data sharing, but twice the amount of DB2 CPU?

Normal Processing:

TIMES/EVENTS  APPL (CLASS 1)  DB2 (CLASS 2)
------------  --------------  --------------
ELAPSED TIME  0:19:51.250406  0:13:10.562664
 NONNESTED    0:19:51.250406  0:13:10.562664
 STORPROC           0.000000        0.000000
 UDF                0.000000        0.000000
 TRIGGER            0.000000        0.000000

CPU TIME      0:11:00.382992  0:07:11.255863
 AGENT            660.382992      431.255863
  NONNESTED       660.382992      431.255863
  STOR PROC         0.000000        0.000000
  UDF               0.000000        0.000000
  TRIGGER           0.000000        0.000000
 PARAL TSKS         0.000000        0.000000
 IIP CPU      0:00:00.000000  0:00:00.000000
 IIP CP CPU         0.000000

DATA SHARING   TOTAL
--------------------
LOCK   REQ        26
UNLOCK REQ      1371
CHANGE REQ         0
LOCK   -XES     1768
UNLOCK -XES     2663
CHANGE -XES        0
SUSP  -IRLM        0
FALSE CONT         0

Processing with one member down:

TIMES/EVENTS  APPL (CLASS 1)  DB2 (CLASS 2)
------------  --------------  --------------
ELAPSED TIME  0:11:44.073236  0:06:07.613364
 NONNESTED    0:11:44.073236  0:06:07.613364
 STORPROC           0.000000        0.000000
 UDF                0.000000        0.000000
 TRIGGER            0.000000        0.000000

CPU TIME      0:06:35.986777  0:03:00.002419
 AGENT            395.986777      180.002418
  NONNESTED       395.986777      180.002418
  STOR PROC         0.000000        0.000000
  UDF               0.000000        0.000000
  TRIGGER           0.000000        0.000000
 PARAL TSKS         0.000000        0.000000
 IIP CPU      0:00:00.000000  0:00:00.000000

DATA SHARING   TOTAL
--------------------
LOCK   REQ         5
UNLOCK REQ       545
CHANGE REQ         0
LOCK   -XES      713
UNLOCK -XES      950
CHANGE -XES        0
SUSP  -IRLM        0
FALSE CONT         0

What can we look at to determine the overhead associated with data sharing? The accounting reports are almost identical, but there are a few different fields under "data sharing." What kind of report can I show the application to demonstrate that data sharing overhead is the cost of doing business? If you want high availability, you do have to pay the cost in additional CPU. Of course, if there is something we can "tweak," that would help all of us.

Thanks, Tom   

 

Jørn Thyssen

RE: Overhead of Data Sharing
(in response to Tom Glaser)

Hi Tom,

Can you post the numbers from the DML section as well, just to make sure that the two runs are comparable?

Best regards,

Jørn Thyssen

Works for IBM Denmark. Views are personal. 

Lockwood Lyon

Overhead of Data Sharing
(in response to Tom Glaser)
Tom,

I would gather some additional "LPAR-differences" and application information first. Data sharing overhead has more to do with inter-DB2 R/W interest and elongated elapsed times due to P-locks than it does with CPU time.

First thing to look at is the availability of zIIP engines for the LPARs. Any chance some of your workload was offloaded to zIIP(s) on one LPAR but not on the other? Are both LPARs on the same physical machine? Do they have different CPU caps?

Next is access paths. You don't say what your batch job is doing, but can we assume static SQL, and that the application wasn't ReBound between executions? Different access paths would account for the CPU difference.

Application-wise, are the two executions "exactly" the same? Same data accessed, in exactly the same order? Is the data used frequently by this or other applications, and hence resident in the VPool(s) for your "short" execution?

HTH. Good luck!

- Lock Lyon



Venkat Srinivasan

RE: Overhead of Data Sharing
(in response to Tom Glaser)

In addition to the comments in this thread: when CPU time varies due to data sharing (not merely enabling data sharing; the page sets have to be GBP-dependent), the CF synchronous service time presents itself as CPU time on the requesting LPAR. What is your average sync service time?

Is there a possibility you are duplexing the lock structure?

Doubled CPU time is unusual.

You should look at both the accounting and statistics reports (for the intervals when the batch job was in flight) and the CF activity report.

Venkat


Nguyen Duc Tuan

Overhead of Data Sharing
(in response to Venkat Srinivasan)
Hi Tom,
We see this when running a batch job on one LPAR rather than the other; in our case it is due to distance.
Have a look at the asynchronous XES-converted locks (also called "heuristic conversions"): "This conversion occurs when XES determines that it is more efficient to send the request asynchronously to the coupling facility."

https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/dshare/src/tpc/db2z_usingdb2statstrace.html


Tom Glaser

RE: Overhead of Data Sharing
(in response to Nguyen Duc Tuan)

Jorn, Lockwood, Venkat, Nguyen,

The DML is as follows:

Shorter run, when other member was down:

SQL DML      TOTAL   SQL DCL         TOTAL
------------------   ---------------------
SELECT     2355857   LOCK TABLE          0
INSERT           0   GRANT               0
UPDATE           0   REVOKE              0
DELETE           0   SET SQLID           0
DESCRIBE         0   SET HOST VAR   254388
DESC TBL         0   SET DEGREE          0
PREPARE          0   SET RULES           0
OPEN      10840904   SET CUR PATH        0
FETCH     16257446   SET CUR PREC        0

 

Normal long run with both members up:

SQL DML      TOTAL   SQL DCL         TOTAL
------------------   ---------------------
SELECT     2357648   LOCK TABLE          0
INSERT           0   GRANT               0
UPDATE           0   REVOKE              0
DELETE           0   SET SQLID           0
DESCRIBE         0   SET HOST VAR   254868
DESC TBL         0   SET DEGREE          0
PREPARE          0   SET RULES           0
OPEN      10873690   SET CUR PATH        0
FETCH     16296935   SET CUR PREC        0

This is static SQL. As I mentioned earlier, this job was running fine. When the other member came down, the job's CPU was cut in half. When the member on the other LPAR came back up, the job went back to its normal time.
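Doing the arithmetic on the counts above: with both members up, the job used about 431 DB2 CPU seconds for roughly 29.8 million SQL calls, about 14.5 microseconds per call; with one member down, about 180 CPU seconds for essentially the same number of calls, about 6 microseconds per call. Same work, very different unit cost.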

Access paths: Since the job is running on the same LPAR as before, I don't think the access paths would change. Since this is a true 24x7 environment, I can't run an EXPLAIN with the other member down. Executions are very close every time in the amount of data being used.

LPARs: The LPARs are on two different machines. But the job never moved from one member to the other; it runs on the same LPAR every time. When the other member came down accidentally, the job ran twice as fast.

Nguyen,

Yes, we are duplexing the lock structure because we have internal CFs and no external CF; we've decided we need it duplexed to reduce recovery time. I'm rerunning the statistics report for the subsystem for that time period.
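(For anyone wanting to check their own setup: the duplexing status of the lock structure shows up in the output of the z/OS command D XCF,STR,STRNAME=groupname_LOCK1, substituting your actual group name; groupname_LOCK1 is just the usual naming convention.)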

Thanks, Tom

Venkat Srinivasan

RE: Overhead of Data Sharing
(in response to Tom Glaser)

If you really meant "duplexing the lock structure," then you should focus on the CF activity report. The pertinent sections in accounting and statistics are the locking, buffer pool, and group buffer pool sections.

Find the group buffer pool with the highest number of requests for GBP-dependent pages (from accounting) and check in the CF activity report, for that GBP structure, what the service time shows for the SMF intervals when the job was in flight. The same report also has counters such as CHNGD (synchronous requests changed to asynchronous) and path busy.
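For example, assuming the busiest structure turns out to be GBP1 (a stand-in here; substitute your own), the standard DB2 display command gives the member-side view of that structure:

   -DISPLAY GROUPBUFFERPOOL(GBP1) GDETAIL(*)

Then cross-check that structure's sync/async service times in the RMF CF activity report for the same interval.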

Venkat



Jørn Thyssen

RE: Overhead of Data Sharing
(in response to Tom Glaser)

Hi Tom,

The two workloads certainly look comparable based on the DML section, and I agree that 100% overhead is extremely large given that it is a SELECT-only workload.

Hopefully the other suggestions in the thread can help you; otherwise I suggest opening a PMR with IBM Support.

 

Best regards,

Jørn Thyssen

Works for IBM Denmark. Views are personal. 

Nguyen Duc Tuan

Overhead of Data Sharing
(in response to Jørn Thyssen)
Hi Tom,
I agree with Venkat: the CF activity report should show the differences. The answer should also be somewhere in the accounting counters.
Unfortunately, your report is not the detailed one (not all fields are shown). Try to get one. Otherwise, if you can send me the SMF records of these executions, I can do the study.

Regards

Duc




Lockwood Lyon

Overhead of Data Sharing
(in response to Tom Glaser)
Tom,

I believe that you are looking too closely at the SQL counts and not at things that would drive the CPU time.

[As an aside, what is this package doing?! A wild guess: (1) It reads a record from an input file of about size 2.36 M records; then, (2) Does a SELECT on some table based on a unique value from the input record; then, (3) Based on whether there is a "match" from the SELECT in (2) it OPENs a cursor on some full-select expression and FETCHes zero or one rows. But what is the SET statement doing? Hopefully it's not something stupid like SET :Host-Var = CURRENT DATE .... ]

What is CPU used for? Mostly: (1) invoking DB2, (2) executing calculations, (3) sorting, and (4) driving I/Os. You have a lot of SQL executions (5M+ statements), so there's lots of CPU being used simply passing control to DB2 and back. Based on my assumptions about what the package is doing, I expect that at least 1M rows have been retrieved from one or more tables, so CPU got spent driving table and/or index I/Os into the VPools. Last are calculations: have you got any built-in functions (SUBSTR, MIN, CAST, etc.), arithmetic expressions (e.g. SELECT SALARY+BONUS+COMMISSION), or the like? Does your cursor contain an ORDER BY, DISTINCT, or GROUP BY? Are there table JOINs where the access path results in a sort?

Now we come to your symptom: Why do the "same" sequence of SQL statements utilize significantly different CPU resources in the two environments? There *are* certain environmental differences that could cause this. A few guesses:

* Virtual Pool sizes in the two environments (for the DB2 objects accessed) are significantly different. If VPools are sized differently then dynamic prefetch may be affected as well as page residencies in the pool. CPU is spent driving I/Os to (re-)get pages having qualified rows.
* SortPool sizes may be different, leading to one environment executing in-memory Sorts while the other may use workfile I/O.
* During your "fast" execution one or more LPARs (and DB2s) in the SysPlex were not available, meaning that there was no Inter-DB2 Read Write Interest, leading to few (or no) calls to the CF for P-Locks etc. and therefore no CPU spent on avoiding contention.
* Different z-platform hardware has different CPUs with different speed ratings. Are your two hardware environments "identical"?
* The availability of zIIP engines in the "fast" environment may be skewing your CPU numbers. Was any work offloaded to zIIPs? (doesn't look like it from the DML, but just asking.)

The places to start looking for differences in CPU usage are the Accounting Report and/or Statistics Report. First check the buffer pool numbers for the two executions to confirm whether or not the number of GETPAGEs was comparable. This will indirectly relate to how much CPU was used to drive I/Os. Then check the workfile BP numbers in each to verify whether Sorts were done with workfiles, and if so how much.

Last, check the ZPARM summaries to see if there are any differences in the DB2 subsystem global settings.

Last last, as a favor to me, take a quick look at this application logic. Is it really doing a singleton SELECT followed by an OPEN/FETCH on other tables? It would seem to me that these two statements could be implemented with a single JOIN cursor.
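To make that concrete, a minimal sketch; every table, column, and host-variable name here is invented for illustration:

   -- Invented names; one join cursor instead of SELECT then OPEN/FETCH
   DECLARE CSR1 CURSOR FOR
     SELECT D.DETAIL_COL
     FROM   MATCH_TAB M
     INNER JOIN DETAIL_TAB D
             ON D.KEY_COL = M.KEY_COL
     WHERE  M.KEY_COL = :INPUT-KEY;

That's one OPEN/FETCH loop per input record, and DB2 gets to pick the join method, instead of the application paying for a separate SELECT plus an OPEN on every record.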

Hope some of this helps.

- Lock Lyon




alain pary

RE: Overhead of Data Sharing
(in response to Tom Glaser)

What is the distance between your new LPAR and the data? This can influence the elapsed time needed for a batch job. We saw this kind of behaviour in the past; our two data centers are 35 km apart.

Isaac Yassin

Overhead of Data Sharing
(in response to Venkat Srinivasan)
I've encountered a 20%-30% difference with the same workload between an active two-way data sharing group and running on one member only.

Isaac Yassin


Chris Kittell

RE: Overhead of Data Sharing
(in response to Isaac Yassin)

You probably have GBPCACHE = blank (CHANGED) in SYSTABLEPART, so only changed pages go to the group buffer pool; but if you have it set to 'A', all pages read are copied to the group buffer pool, which would add overhead.
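A quick way to check, straight from the catalog:

   SELECT DBNAME, TSNAME, PARTITION, GBPCACHE
   FROM   SYSIBM.SYSTABLEPART
   WHERE  GBPCACHE = 'A';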

 

Chris Kittell

Avram Friedman

RE: Overhead of Data Sharing
(in response to Tom Glaser)

When IMS introduced locking with IMS/VS V1.0.1, which allowed concurrent access to a database by several concurrently running requesters, the overhead was a doubling of the cost of DL/I call processing.

Now it should be noted that DL/I call processing is far simpler than SQL call processing. A DL/I call processes a single database record/segment, usually located directly via hashing, indexes, and other pointers.

So we have the cost of locking $L = the cost of a single segment select without locking (in exclusive control).
The cost of a data sharing lock = $L * (systems with interest).

However, DB2's developers learned from their past experience with IMS and the IMS Region Lock Manager (IRLM) and increased the scope of a cross-system lock, thus reducing the number of locks and their cost.

Just for grins, let's say it costs 0.5 of $L to get a single cross-system lock.
So a better calculation would be $L * 0.5 * (systems with interest).

This method of reasoning is a classic, known as paper-and-pencil estimating.
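Worked through for Tom's two-way group: $L * 0.5 * 2 = $L of cross-system cost per lock on top of the base $L, i.e., the locking cost roughly doubles. Back of the envelope, that is at least in the neighbourhood of the doubled CPU he reported.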

 

Avram Friedman
DB2-L hall of fame contributor
DB2-L acting administrator


Peter Backlund

Overhead of Data Sharing
(in response to Avram Friedman)
If I am not mistaken, IRLM stands (stood?) for IMS Resource Lock Manager

Best regards,

Peter - now retired in Spain


Peter Vanroose

RE: Overhead of Data Sharing
(in response to Peter Backlund)

And now it's called the "internal resource lock manager"  ;-)

In Reply to Peter Backlund:

If I am not mistaken, IRLM stands (stood?) for IMS Resource Lock Manager 

--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        http://www.abis.be/

Avram Friedman

RE: Overhead of Data Sharing
(in response to Peter Vanroose)

The reference to the many official meanings of IRLM was meant to suggest that it is reasonable to discuss the overhead of sharing in IMS (the original "I", but not the original ego) as an indicator of sharing overhead in DB2.

Avram Friedman
DB2-L hall of fame contributor
DB2-L acting administrator


Tom Glaser

RE: Overhead of Data Sharing
(in response to Avram Friedman)

Just FYI: we run internal coupling facilities, and in a few weeks we will turn off duplexing of the SCA and lock structures. I'll let everyone know how it goes.

Thanks, Tom

Venkat Srinivasan

RE: Overhead of Data Sharing
(in response to Tom Glaser)

Once you go to V12, you may want to explore whether asynchronous duplexing for the lock structure helps you.
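If I remember correctly, that is enabled through the CFRM policy (a DUPLEX(ASYNCONLY) specification on the lock structure) rather than anything on the DB2 side; please verify against the current z/OS documentation rather than my memory.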
