db2diag "started / completed retrieve log file S*.LOG to ${NEWLOGPATH}"

Rui Chen

db2diag "started / completed retrieve log file S*.LOG to ${NEWLOGPATH}"

Hi DB2-experts,

Got another puzzle...... We sometimes (usually once a week per DB, rarely twice a week) see a burst of messages like "started / completed retrieve log file S*.LOG to ${NEWLOGPATH}" in db2diag log file, and we always see high capture latency during the same timeframe. Our questions are:

  1. What would trigger such log retrieval messages? What's the first/last log that would be retrieved? any suggestions on how to reproduce it in dev env with relatively low traffic? how to avoid unnecessary log retrieval, which seems could increase capture side latency?

  2. impact of extended period of log retrieval activity on QRepl, specifically capture, latency. Is it by design capture program would halt during log retrieval?

A little bit more background:

Initially we thought the message is related to "online backup include logs". But on a second thought, why would logs requested by backup image take a detour going into ${NEWLOGPATH}? 

Would those retrieved log files be counted in LOGPRIMARY and LOGSECOND (active logs)? Conceptually the retrieved logs shouldn't be counted as active logs, since they are already archived, but just a little concerned if retrieved logs would exhaust disk space of ${NEWLOGPATH}, since i don't know what's the trigger events, nor the first/last retrieved logs.....

What's more interesting is, we always see high QRepl capture latency. Initially we thought Capture program initiated log retrieval, but that may not be 100% the case. We tried in testing env to warm start Capture program using lsn/maxcmtseq within log files that are already in ${LOGARCHMETH1} and no longer in ${NEWLOGPATH}, but the same log retrieval message never show up in db2diag either...... 

On the high capture latency end, IBMQREP_CAPMON reports LAST_EOL_TIME doesn't change, or changes really slowly during the same log retrieval period of time, and ROWS/TRANS_PROCESSED/PUBLISHED decreases significantly, sometimes even reaching 0. CAPTURE_IDLE almost the same as monitor_interval, suggesting capture seems totally idle. Heartbeats seem still going through (IBMQREP_CAPQMON reports MQ_BYTES 468 and MQ_MESSAGES 1 during the same period).

 

btw, we use LUW v10.5FP8 on RedHat/CentOs.

Really appreciate your input.

 

Bests,

Rui

Edited By:
Rui Chen[Organization Members] @ Feb 09, 2018 - 01:00 PM (America/Eastern)
Rui Chen[Organization Members] @ Feb 09, 2018 - 01:01 PM (America/Eastern)
Rui Chen[Organization Members] @ Feb 09, 2018 - 01:02 PM (America/Eastern)

Philip Gunning

db2diag "started / completed retrieve log file S*.LOG to ${NEWLOGPATH}"
(in response to Rui Chen)
Transaction rollback.....


> IBM Champion for Analytics
>
> Certified Information Systems Security Professional(CISSP)
>
> Certification Number 539059
>
> Certified Advanced DB2 DBA v10.5
>
> Certified Database Adminstrator, DB2 11.1
>
> IBM DB2 LUW Support Page -- https://www.ibm.com/analytics/us/en/technology/db2/db2-linux-unix-windows.html
>
> Skype: DB2LUW
>
> Twitter: DB2LUW
>
> Direct +1.610.451.5801
>
> IDUG DB2-L Hall of Fame
>
> www.philipkgunning.com
>
> IBM Business Partner
>
>
>
>
Sent from my iPhone

> On Feb 9, 2018, at 12:50 PM, Rui Chen <[login to unmask email]> wrote:
>
> Hi DB2-experts,
>
> Got another puzzle...... We sometimes (usually once a week per DB, rarely twice a week) see a burst of messages like "started / completed retrieve log file S*.LOG to ${NEWLOGPATH}" in db2diag log file, and we always see high capture latency during the same timeframe. Our questions are:
>
> What would trigger such log retrieval messages? What's the first/last log that would be retrieved? any suggestions on how to reproduce it in dev env with relatively low traffic?
>
> impact of extended period of log retrieval activity on QRepl, specifically capture, latency.
> A little bit more background:
>
> Initially we thought the message is related to "online backup include logs". But on a second thought, why would logs requested by backup image take a detour going into ${NEWLOGPATH}?
>
> Would those retrieved log files be counted in LOGPRIMARY and LOGSECOND (active logs)? Conceptually the retrieved logs shouldn't be counted as active logs, since they are already archived, but just a little concerned if retrieved logs would exhaust disk space of ${NEWLOGPATH}, since i don't know what's the trigger events, nor the first/last retrieved logs.....
>
> What's more interesting is, we always see high QRepl capture latency. Initially we thought Capture program initiated log retrieval, but that may not be 100% the case. We tried in testing env to warm start Capture program using lsn/maxcmtseq within log files that are already in ${LOGARCHMETH1} and no longer in ${NEWLOGPATH}, but the same log retrieval message never show up in db2diag either......
>
> btw, we use LUW v10.5FP8 on RedHat/CentOs.
>
> Really appreciate your input.
>
>
>
> Bests,
>
> Rui
>
>
> Site Links: View post online View mailing list online Start new thread via email Unsubscribe from this mailing list Manage your subscription
>
> This email has been sent to: [login to unmask email]
> ** ** ** IDUG DB2 Data and Analytics Technical Summit in Bengaluru, India 2018 ** ** **
> ---> Bengaluru, India, March 27, 2018 <---
> http://ibm.biz/IDUGBengaluru2018
>
> Use of this email content is governed by the terms of service at:
> http://www.idug.org/p/cm/ld/fid=2
>

Rui Chen

RE: db2diag "started / completed retrieve log file S*.LOG to ${NEWLOGPATH}"
(in response to Philip Gunning)

Thanks for the helpful hint Phillip!

I understand rollbak could trigger log retrieval (haven't been able to reproduce it, only read from different places....) into active log dir, but still feel like i'm missing something here. If a TX is not committed/rollback (aka. still active), shouldn't the log files be kept in active dir in the first place?  i tried many combinations of LOGPRIMARY/LOGSECOND in testing env, and saw log files generated during a long running TX did get archived, but the same log files always remain in active log dir, until db2 hit (LOGPRIMARY+LOGSECOND) limit, or i lose patience and manually rollback/commit..... Really appreciate it if you could point me to some reference explaining when archived logs in active log dir would be renamed/purged. 

As for the high QCapture latency potentially caused by long-running-TX rollback, i could wild-guess it's because QCapture program was also waiting for the log retrieval, but would be nice if there are some references that can confirm it / explain bit more details .....  

One last thought (for now), assuming long-running-TX was the culprit here, would NUM_LOG_SPAN/MAX_LOG help to prevent the same from happening? 

Thanks again for your help!

 

Bests,

Rui