In-flight DBATs

Bharath Nunepalli

In-flight DBATs

We had a production issue caused by hung DBATs.

The application team killed a job after it had run for over 2 hours. However, the DBAT remained active even after the job was killed.

I tried to kill the thread, but that didn’t work. Finally, the Systems DBA restarted the DB2 instance, and that cleared the thread (it was shown as in-flight in the MSTR log).

We are trying to understand the reason for these hung threads. I attached the DBAT zparms values we are using (if that helps).

 

Can someone please provide some insight into this? Also, is there any Red Book or IBM online resource that explains the DB2 restart process?

 

Thanks.

Attachments

  • DBAT zparms.txt (2.1k)

Avram Friedman

RE: In-flight DBATs
(in response to Bharath Nunepalli)

The word KILL raises alarms.
What do you mean by that?

KILL usually suggests get rid of at all costs
A willingness to IPL if things don't work.

The authority to KILL is highly restricted in most shops.
Even in James Bond movies the license to kill is restricted to no more than 10.

Avram Friedman
DB2-L hall of fame contributor
DB2-L acting administrator

Bharath Nunepalli

RE: In-flight DBATs
(in response to Avram Friedman)

By "kill" I mean issuing the -CAN THREAD(thread number) command.

Venkat Srinivasan

RE: In-flight DBATs
(in response to Bharath Nunepalli)

The Administration Guide documents what happens during restart after termination. It is probably too late to do more than guess now, but it is likely that the thread was rolling back.

During restart, threads identified as in-flight must be backed out to their last commit point. What you describe as 'hung' may have been a delay in backout processing, for example waiting on tape mounts or the recall of migrated data sets. Occasionally a code defect can also cause a hang or wait.

If you had taken a dump before forcing the system, it would have helped support get to the root cause.

The relevant zparms that control backout processing are LBACKOUT and BACKODUR. Both are explained well in the Installation Guide.
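
For illustration only, a sketch of how these two settings and the related commands fit together. The values below are examples and an assumption, not a recommendation and not taken from the attached zparms file; check the DSN6SYSP macro in your own DSNTIJUZ job. BACKODUR is a multiplier that bounds how much log is processed during restart backout before the remaining work is postponed. With LBACKOUT=YES the postponed work waits for -RECOVER POSTPONED; with LBACKOUT=AUTO, DB2 starts it on its own after restart.

```
LBACKOUT=AUTO
BACKODUR=5

-DISPLAY THREAD(*) TYPE(POSTPONED)
-RECOVER POSTPONED
```

The -DISPLAY THREAD command lists any units of recovery whose backout was postponed at restart, and -RECOVER POSTPONED completes that backout when it has not been started automatically.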

Venkat
 