[q replication] how to reveal buffered TXs by QApply program browser thread and the dependency graph

Rui Chen

Happy New Year in advance, everyone! I have a question to add some excitement to this holiday season: how can we reveal the TXs buffered on the QRepl apply side, and the dependency graph across those buffered TXs?

A little bit of background:

  1. We have a batch-style application that runs once in a while over the weekend.
  2. On capture side:
    *.ibmqrep_capmon reports:
        sub-second capture latency;
        max CURRENT_LIMIT under 25% of MEMORY_LIMIT (1 GB)
    no abnormal trace reported in *.ibmqrep_captrace;
  3. On the apply side, we had:
    a huge increase in KEY/UNIQ_DEPENDENCIES;
    increasing QLATENCY, with relatively small APPLY_LATENCY/RDMS_TIME;
    the browser thread buffering TXs instead of dispatching them to idle apply agents. Our 10 apply agents sat idle almost the whole time during the high-latency period. We inferred this because APPLY_SLEEP_TIME = num_apply_agents * monitor_interval;
    0 monster TXs reported during the high-latency period. We use the max MEMORY_LIMIT (2 GB), and CURRENT_LIMIT peaked at 700 MB while buffering dependent TXs;
    nothing helpful reported in the QReplTrace table, the exception table, db2diag, or the QApply program log.
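
To make the idle-agent estimate above concrete, here is a minimal Python sketch of the arithmetic. It assumes APPLY_SLEEP_TIME and the monitor interval are both in milliseconds and that the sleep time is summed across all agents for one monitor interval; the function name and the sample numbers are made up for illustration.

```python
def idle_agent_fraction(apply_sleep_time_ms, num_apply_agents, monitor_interval_ms):
    """Return the share of total agent time spent sleeping in one monitor interval.

    Assumption: apply_sleep_time_ms is the sleep time summed over all agents,
    so the maximum possible value is num_apply_agents * monitor_interval_ms.
    """
    if num_apply_agents <= 0 or monitor_interval_ms <= 0:
        raise ValueError("agent count and monitor interval must be positive")
    capacity_ms = num_apply_agents * monitor_interval_ms
    # Clamp at 1.0 in case the reported sleep time slightly exceeds the interval.
    return min(apply_sleep_time_ms / capacity_ms, 1.0)


# Hypothetical sample: 10 agents, 30-second monitor interval.
print(idle_agent_fraction(300_000, 10, 30_000))  # all agents idle the whole interval
print(idle_agent_fraction(150_000, 10, 30_000))  # agents idle half of the time
```

A value close to 1.0 across consecutive intervals is what led us to believe the agents were idle while the browser thread held the TXs.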

Based on the above observations, our current suspicion is that we have a chain of dependent TXs, and we are trying to prove or disprove it. We could try to reproduce this high-latency period and use utilities like qload to reveal what's in the recvQ, which is quite an expensive operation. Alternatively, we are trying to see if it's possible to take a snapshot of what's buffered by the browser thread and to reveal the overlapping rows that different TXs depend on.
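
To illustrate what we mean by a dependency graph, here is a rough Python sketch. It is not how QApply does it internally; it only assumes that for each buffered TX we could somehow obtain the set of (table, key) pairs it touches, in commit order. The function names and sample data are hypothetical.

```python
def build_dependency_graph(txs):
    """Map each TX id to the set of earlier TX ids it depends on.

    txs: list of (tx_id, set of (table, key) tuples) in commit order.
    A TX depends on the most recent earlier TX that touched any of its keys.
    """
    last_writer = {}  # (table, key) -> id of the latest TX that touched it
    deps = {}
    for tx_id, keys in txs:
        deps[tx_id] = {last_writer[k] for k in keys if k in last_writer}
        for k in keys:
            last_writer[k] = tx_id
    return deps


def longest_chain(txs, deps):
    """Length of the longest chain of dependent TXs (1 means no dependencies)."""
    depth = {}
    for tx_id, _ in txs:  # commit order, so every dependency already has a depth
        depth[tx_id] = 1 + max((depth[d] for d in deps[tx_id]), default=0)
    return max(depth.values(), default=0)


# Hypothetical sample: T1 -> T2 -> T3 form a chain, T4 is independent.
txs = [
    ("T1", {("ORDERS", 1)}),
    ("T2", {("ORDERS", 1), ("ORDERS", 2)}),
    ("T3", {("ORDERS", 2)}),
    ("T4", {("ITEMS", 9)}),
]
deps = build_dependency_graph(txs)
print(deps)
print(longest_chain(txs, deps))
```

If the longest chain is close to the number of buffered TXs, the workload is effectively serial and extra apply agents cannot help.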

 

Also, we found this patent, which seems to explain how the QApply program works, especially the dependency part. Can anyone confirm whether this is the patent behind the QApply program? Also curious: what's the patent behind the QCapture program? We're trying to understand exactly how a long-running TX could introduce high capture latency.

 

btw, we use Db2 LUW v10.5 FP8 and ARCH_LEVEL 1021 on both the capture and apply sides.

Thanks for your help!

Edited By:
Rui Chen[Organization Members] @ Dec 26, 2017 - 02:31 PM (America/Eastern)
Rui Chen[Organization Members] @ Dec 26, 2017 - 05:03 PM (America/Eastern)

Jørn Thyssen

RE: [q replication] how to reveal buffered TXs by QApply program browser thread and the dependency graph
(in response to Rui Chen)

Hi,

Do you really need 10 apply agents? You might only need one, in which case all the dependency logic goes away.

See: https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.swg.im.iis.repl.qtune.doc/topics/iiyrqtunrepap1.html 

Otherwise I suggest opening a PMR with IBM. 

Best regards,

Jørn Thyssen

Rocket Software
77 Fourth Avenue • Waltham, MA • 02451 • USA
W: www.rocketsoftware.com

Views are personal. 

Rui Chen

RE: [q replication] how to reveal buffered TXs by QApply program browser thread and the dependency graph
(in response to Jørn Thyssen)

Hi Jørn,

Thanks a ton for the suggestion! Looking at our APPLY_SLEEP_TIME history, one apply agent would actually have been able to handle the workload most of the time.

Will report back on whether it actually helped, if we decide to lower this number.

 

Bests,

Rui

Rui Chen

RE: [q replication] how to reveal buffered TXs by QApply program browser thread and the dependency graph
(in response to Jørn Thyssen)

Hi Jørn,

Thanks for your suggestion. By reducing the number of apply agents to 1, our QRepl apply side was able to move forward by serializing all TXs. Luckily for us, the TX volume wasn't too high, so a single apply agent was enough during this high-latency period.

It's probably no surprise to anyone, but I just realized that the browser thread checks not only the initial and final values of a TX involving a unique constraint, but also all the transient updates inside the TX.
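
A small Python sketch of why that matters, based only on my observation above and not on the actual QApply code: two TXs whose initial and final unique values don't overlap can still conflict through a transient value. The function names and values are made up, and the model is simplified to a single unique column.

```python
def endpoints_only(updates):
    """Unique values seen if only the first old value and last new value count.

    updates: ordered list of (old_value, new_value) for one unique column
    inside a single TX.
    """
    return {updates[0][0], updates[-1][1]} if updates else set()


def all_values_touched(updates):
    """Unique values seen if every transient update inside the TX counts."""
    touched = set()
    for old, new in updates:
        touched.add(old)
        touched.add(new)
    return touched


# Hypothetical sample: TX A moves a unique value 1 -> 99 -> 2, TX B moves 99 -> 100.
tx_a = [(1, 99), (99, 2)]
tx_b = [(99, 100)]

# Looking at endpoints only, the two TXs appear independent.
print(endpoints_only(tx_a) & endpoints_only(tx_b))
# Including transient values, both TXs touch 99, so they are dependent.
print(all_values_touched(tx_a) & all_values_touched(tx_b))
```

So a batch job that shuffles unique values through temporary placeholders can create far more UNIQ_DEPENDENCIES than the before/after images would suggest.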