Continuous diagnostic data collection and archival

By Joe Geller posted 8 days ago

  

Continuous Diagnostic Data Collection and Archival – Introducing db2histmon Historical Monitoring


- David Sciaraffa, Software Engineering Manager – IBM Db2

When a problem arises in your Db2 stack, the availability of diagnostic information before, during, and after the time-frame of the problematic event is often paramount to successful diagnosis. When the diagnostic messages in the db2diag.log or notify log are insufficient, it is not uncommon for IBM support to provide instructions to enable a specific set of diagnostic data collection, and wait for a re-occurrence of the problem, in order to begin narrowing the root cause. 

The new Db2 Historical Monitoring (db2histmon) scripts (available here) are a successor to the older Db2 Persistent Diagnostic Data Collection scripts, and similarly collect a broad range of Db2 and Operating System diagnostic data, and retain this data for some period of time, allowing for the initial triage of many types of issues.

Diagnostic data is collected at various intervals (ranging from 1min to 1hour, depending on the kind of data), and can be easily customized and added to. The scripts will collect data only while the database is activated, which can help to reduce the generation of unnecessary data.

By enabling the collection of Db2 Historical Monitoring (db2histmon), you can improve the odds that helpful diagnostic information is available after the first occurrence of a problem.

This blog provides an overview of how to enable and customize the new Db2 Historical Monitoring (db2histmon) data collection, and also provides an overview of a basic report generation script.

 
What Data is Collected?

Later in this blog, I’ll describe how to customize the data collection by adding or removing whatever diagnostic data your prefer.
By default the following data (as defined in the task_details.json file, described later) is collected:


Downloading the Db2 Historical Monitoring (db2histmon) scripts:

The scripts are available and can be downloaded from the open-source Github repo: https://github.com/IBM/db2histmon

Pre-requisites:

  • The scripts can be deployed in any Unix or Windows environment where Db2 is supported.
  • Python2or higher, as well as the python ibm_db module are required to execute the scripts.
  • The scripts can be deployed on any relatively modern version of Db2 (Version 10.1, 10.5, 11.1, 11.5). 
  • Db2 DBADM authority is required on the database where the historical monitoring will be deployed.
  • A C++ compiler (for example g++ on Unix or Visual Studio on Windows) is required (in order for the scripts to use bldrtn to compile the external C++ UDFs which are used by the historical monitoring framework).


Overview of the Db2 Historical Monitoring framework:

The historical monitoring scripts will utilize a set of control tables, procedures and external UDFs to operate, with the DB2 Admin Task Scheduler executing the data collection tasks at the desired frequency (minimum 1min), and will generate diagnostic/monitoring output into the active data collection folder first, and then archived into the data archival folder every hour.

Consider the following high-level process flow diagram:



Setup Folder:

After downloading the scripts to your local database server system, you'll notice three folders, 1_setup2_loader, 3_quickparser, and 4_report.

     (db2inst1@hotellnx112) /home/db2inst1/db2histmon-master/

     $ ls

     1_setup/  
     2_loader/   
     3_quickparser/   
     4_report/


Let’s focus on the 1_setup/ folder (which is used to setup and deploy the historical monitoring framework and begin data collection). In a subsequent blog I will provide an overview of the 2_loader script (which is used to easily load the data collections into a database so it can be queried), and the 3_quickparser script (which is used to display the data collections to a terminal window in a columnar format for easier viewing).  Later in this blog I’ll provide an overview of the 4_report script (which is used to generate a summary report of key metrics within a data collection set).

Within the 1_setup/ folder, we find three files, and sub-folder named sql/

     (db2inst1@hotellnx112) /home/db2inst1/db2histmon-master/1_setup/
     $ ls
     README.md  setup.py  sql/  task_details.json
    • The mdfile contains a technical description of the setup scripts, input options, as well as a brief architectural overview of how the scripts work. The purpose of this blog is to expand on this information with some visual examples.
    • The pyfile is the main setup script, described in detail below.
    • The jsonfile, described in detail below, contains the definition of each data collection task, and is used by the setup.py script to deploy each task.


Within the 1_setup/sql/ folder, we find the definition of the control tables, procedures, and user-defined functions which the setup.py script will create in the specified database. The details of these files would mostly be useful to developers or administrators who wish to explore and expand on the framework. I do not explore them in detail in this blog.

     (db2inst1@hotellnx112) /home/db2inst1/db2histmon-master/1_setup/sql

     $ ls

external.C
external.exp 
external.sql 
proc_collect.sql
tab_dirs.sql
tab_tasks.sql 
external.def 
external.h 
proc_archive.sql 
tab_config.sql 
tab_errs.sql

 

Overview of the task_details.json file:

The task_details.json file defines each data collection task, and is used by the setup.py script to deploy each data collection task into the Db2 Admin Task Scheduler.

The default task_details.json file that is included with the scripts looks like so:

(db2inst1@hotellnx112) /home/db2inst1/db2histmon-master/1_setup/
  $ less task_details.json

 
[
    {
      "collection_name": "ENV_GET_SYSTEM_RESOURCES",
      "collection_class": "SQL",
      "collection_command": "SELECT CURRENT TIMESTAMP AS COLLECTION_TIME, T.* FROM TABLE(SYSPROC.ENV_GET_SYSTEM_RESOURCES()) T",FROM TABLE(SYSPROC.ENV_GET_SYSTEM_RESOURCES()) T",
      "collection_freq": "* * * * *",
      "collection_level": "1",
      "collection_condition": "",
      "loader_join_columns": "",
      "loader_diff_exempt_columns": "ALL",
      "quickparse_summary_columns": "ALL"
    },
 
    ...etc...
         ...etc...
 
    {
      "collection_name": "VMSTAT",
      "collection_class": "SYS",
      "collection_command": "vmstat",
      "collection_freq": "* * * * *",
      "collection_level": "1",
      "collection_condition": "UNIX",
      "loader_join_columns": "",
      "loader_diff_exempt_columns": "",
      "quickparse_summary_columns": ""
    }
]

    • The "collection_name" defines the name of the collection task (this name will be used later to generate the filename into which this data collection will be output, so avoid using spaces or special characters).
    • The "collection_class"can be either "SYS" or "SQL".  If this task is defined as "SYS" then the collection_command is invoked as a system call to the operating system. If this task is defined as "SQL", then the collection_command is invoked as an sql query against the database.
    • The "collection_command"is the operating system or sql query which generates the desired monitoring or diagnostic information.
    • The "collection_freq" defines the frequency of data collection, and this is specified in cron format. The maximum frequency is every 1 minute. 
    • The "collection_level"can be either "1" or "2".  When the setup.py script is executed to deploy these data collection tasks to the database, the administrator has the option of deploying only a small set of basic data collection (which are those tasks defined with a collection_level value of 1), or an expanded set of data collection (which are those tasks defined with a collection_level value of 2).
    • The "collection_condition"is used to define the environmental condition under which this data collection task should run. As of the time of this writing, the possible values are "HADR", "PURESCALE", "UNIX", "WINDOWS", or nothing. For example, if a task is defined with a collection_condition value of "HADR", then the task is only executed if an db2 HADR environment is detected. If the collection_condition is nothing, then the task is always executed.
    • The "loader_join_columns""loader_diff_exempt_columns", and "quickparse_summary_columns"will be the focus of a subsequent blog.

setup.py usage and arguments:

Use the setup.py --help option to display usage:


    • The [database] name parameter is mandatory and specifies the local database against which the historical monitoring setup will be deployed.  If you have multiple databases in your database instance, and wish to deploy historical monitoring on these other databases, you must perform the setup.py script on each one separately.
    • The--username and --password arguments are used to establish the database connection. These are required if your database is configured to disallow local user IPC connections without a password. (If unspecified, a connection to the database using the current user is used).
    • The --cleanup option will perform a full removal of all previously created IBMHIST control tables, procedures, and external UDFs, and data collection tasks. Any previous data collection files or archived sub-folders will not be removed.
    • The --update_config_onlyoption will perform only the creation (or re-creation) of the IBMHIST control tables, procedures, and external UDFs, and configure (or re-configure) any options/arguments that are specified. Note that default values are used for any options/arguments which are not specified, even if these arguments were specified during a previous invocation of the script.
    • The --update_tasks_onlyoption will perform only the removal of any previously deployed data collection tasks, and will parse of the task_details.json file and (re)deploy all data collections tasks.
      (Note, when neither the --update_config_only or --update_tasks_only options are specified, then both operations are always performed).
    • The --bldrtn_pathoptional argument specifies the fully qualified path to the sqllib/samples/cpp/bldrtn This is required to compile the external C++ UDFs which perform the collection of operating system 'SYS' data collection tasks. (If unspecified, the default path /home/<current_user>/sqllib/samples/cpp/bldrtn is used).
    • The --coll_pathoptional argument specifies the fully qualified path where a data collection sub-folder will be created (every hour) and the monitoring/diagnostic data will be placed into. (If unspecified, the default path /home/<current_user>/sqllib/db2dump/ is used).
    • The --arch_path optional argument specifies the fully qualified path to the folder where the data collection folder (every hour) will be moved after it is removed from the active coll_path. (If unspecified, the default path /home/<current_user>/sqllib/db2dump/is used).
    • The --max_sizeoptional argument specifies the maximum size (in bytes) of archival data to retain within the arch_path. When this limit is breached, hourly data collection folders (starting with the oldest) will be deleted. (If unspecified, the default value 1073741824 bytes (1GB) is used).
    • The --arch_cmdoptional argument specifies the operating system command that is used (every hour) to tar/zip/compress the active data collection folder into the archive folder. A filename (in the format '<dbname>_<yyyy><mm><dd><hh>') is automatically generated and should not be specified. Tokens '_src_' and '_dest_' are used as placeholders in place of the coll_path and arch_path  (If unspecified, the default command 'tar -caf _dest_ _src_' is assumed). For example, if one wanted to use the star unix command, instead of the default tar command, one might specify 'star -c -f=_dest_ _src_'.
    • The --arch_extoptional argument specifies the extension to tack onto the archived folder's file after the arch_cmd is completed. (If unspecified, the default extension ".tar.gz" is used).
    • The --coll_leveloptional argument specifies the level of data collections. If "1" is specified, only the data collection tasks in the task_details.json file with a "collection_level" value of 1 are deployed.  If "2" is specified, any data collection tasks with a "collection_level" value of 1 or 2 are deployed.

Example of running the setup.py script:

Here, we execute the setup.py script against database TESTDB1, using all default arguments:
<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup/
$ python3 setup.py testdb1

 First, we see the creation of the SYSTOOLSPACE table space (if not already existing), and the determination of environment conditions (UNIX/WINDOWS, HADR, and/or PURESCALE):
 
Connecting to database: TESTDB1
Creating tablespace SYSTOOLSPACE ...
Determining environment ...
    Determining if Windows or Unix ...
        Unix detected ...
    Determining if pureScale ...
        pureScale not detected ...
    Determining if HADR ...
        HADR not detected ...
 
Next we see the unscheduling of any previously scheduled tasks:

Unscheduling tasks ...

Next we see the creation (or recreation) of the IBMHIST schema, the creation of the IBMHIST control tables, procedures, and compilation of the external UDFs.

Setting up IBMHIST schema and its objects ...
    Creating IBMHIST schema ...
    Setting schema to IBMHIST ...
    Registering external functions ...
        Copying bldrtn script from '/home/db2inst1/sqllib/samples/cpp/bldrtn' to sql/ ...
        Building external functions ...
        Executing external.sql ...
    Registering tables ...
        Executing tab_config.sql ...
        Executing tab_errs.sql ...
        Executing tab_tasks.sql ...
        Executing tab_dirs.sql ...
    Registering procedures ...
        Executing proc_collect.sql ...
        Executing proc_archive.sql ...
    Removing additional files created during setup ...


Next we see the configuration of settings based on setup.py input arguments (or default values as in this example):
 
Configuring IBMHIST settings ...
    Deleting configurations from IBMHIST.TAB_CONFIG ...
    Setting COLL_PATH to '/home/db2inst1/sqllib/db2dump' ...
    Setting ARCH_PATH to '/home/db2inst1/sqllib/db2dump' ...
    Setting MAX_SIZE to '1073741824' ...
    Setting ARCH_CMD to 'tar -caf _dest_ _src_' and ARCH_EXT to '.tar.gz' ...
    Setting TASK_DETAILS_PATH to '/home/hotellnx112/db2inst1/db2histmon-master/1_setup/task_details.json' ...

 Next we see some validation tests and sanity checks:
 
Testing IBMHIST functionality ...
    Testing IBMHIST.PROC_COLLECT with a dummy SQL task ...
    Testing IBMHIST.PROC_COLLECT with a dummy SYS task ...
    Testing IBMHIST.PROC_ARCHIVE to archive collection directories ...
    Testing IBMHIST.PROC_ARCHIVE to delete archival directories once max size is reached ...

Next we see the parsing of the task_details.json file, and the registration of data collection tasks to the Db2 Admin Task Scheduler:
 
Scheduling tasks ...
    Setting Admin Task Scheduler registry variable ...
    Reading from task_details.json file ...
    Scheduling task: ENV_GET_SYSTEM_RESOURCES
    Scheduling task: MON_GET_AGENT
    Scheduling task: MON_GET_CONNECTION
    Scheduling task: MON_GET_ACTIVITY
    Scheduling task: MON_GET_UNIT_OF_WORK
    Scheduling task: MON_CURRENT_SQL
    Scheduling task: MON_GET_APPL_LOCKWAIT
    Scheduling task: MON_GET_EXTENDED_LATCH_WAIT
    Scheduling task: MON_GET_MEMORY_POOL
    Scheduling task: MON_GET_MEMORY_SET
    Scheduling task: MON_GET_TRANSACTION_LOG
    Scheduling task: DB_GET_CFG
    Scheduling task: DBMCFG
    Scheduling task: ENV_INST_INFO
    Scheduling task: ENV_GET_REG_VARIABLES
    Scheduling task: MON_GET_EXTENT_MOVEMENT_STATUS
    Scheduling task: MON_GET_GROUP_BUFFERPOOL
    Scheduling task: MON_GET_INDEX
    Scheduling task: MON_GET_LATCH
    Scheduling task: MON_GET_BUFFERPOOL
    Scheduling task: MON_GET_PAGE_ACCESS_INFO
    Scheduling task: MON_GET_PKG_CACHE_STMT
    Scheduling task: MON_GET_REBALANCE_STATUS
    Scheduling task: MON_GET_SERVERLIST
    Scheduling task: MON_GET_TABLE
    Scheduling task: MON_GET_TABLESPACE
    Scheduling task: MON_GET_UTILITY
    Scheduling task: MON_GET_WORKLOAD
    Scheduling task: VMSTAT
    Scheduling task: IOSTAT
    Scheduling task: NETSTAT
    Scheduling task: ARCHIVE
    Collection will begin in approximately 5 minutes ... 

Finally, the setup script is complete. Within about 5mins the Db2 Admin Task Scheduler should begin executing the data collection tasks, at their defined frequency.
 
Closing connection ...
Done


The Data Collection and Data Archive folders:

Approximately 5mins after executing the setup script, the Db2 Admin Task Scheduler should begin executing the data collection tasks (while the database remains activated) at their defined collection frequency.

Because I did not explicitly specify a --coll_path for the data collection, the default path within the database's DIAGPATH (usually ~/sqllib/db2dump/) is used, and a folder "IBMHIST_<dbname>" is created. Likewise for the --arch_path, a folder named "IBMHIST_<dbname>_archive" is created:

<db2inst1@hotellnx112> /home/db2inst1/sqllib/db2dump/
$ ls
 
db2ats_sps_bnd.log  
db2diag.log   
db2inst1.nfy  
IBMHIST_TESTDB1/       
IBMHIST_TESTDB1_archive/
stmmlog

 

Within the data collection folder, we can see that an hourly data collection sub-folder has been created:

 <db2inst1@hotellnx112> /home/db2inst1/sqllib/db2dump/IBMHIST_TESTDB1/
$ ls


TESTDB1_2020062714/

And within this hourly data collection sub-folder, we see the output of each of the data collection tasks, at their defined frequencies.  In this example, I listed the directory contents approximately 9mins after the Db2 Admin Task Scheduler began executing the tasks:

<db2inst1@hotellnx112> /home/db2inst1/sqllib/db2dump/IBMHIST_TESTDB1/TESTDB1_2020062714/
$ ls

ENV_GET_SYSTEM_RESOURCES_202006271421.del                     MON_GET_MEMORY_SET_202006271421.del
ENV_GET_SYSTEM_RESOURCES_202006271422.del                     MON_GET_MEMORY_SET_202006271422.del
ENV_GET_SYSTEM_RESOURCES_202006271423.del                     MON_GET_MEMORY_SET_202006271423.del
ENV_GET_SYSTEM_RESOURCES_202006271424.del                     MON_GET_MEMORY_SET_202006271424.del
ENV_GET_SYSTEM_RESOURCES_202006271425.del                     MON_GET_MEMORY_SET_202006271425.del
ENV_GET_SYSTEM_RESOURCES_202006271426.del                     MON_GET_MEMORY_SET_202006271426.del
ENV_GET_SYSTEM_RESOURCES_202006271427.del                     MON_GET_MEMORY_SET_202006271427.del
ENV_GET_SYSTEM_RESOURCES_202006271428.del                     MON_GET_MEMORY_SET_202006271428.del
ENV_GET_SYSTEM_RESOURCES_202006271429.del                     MON_GET_MEMORY_SET_202006271429.del
IOSTAT_202006271421.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271421.del
IOSTAT_202006271422.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271422.del
IOSTAT_202006271423.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271423.del
IOSTAT_202006271424.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271424.del
IOSTAT_202006271425.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271425.del
IOSTAT_202006271426.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271426.del
IOSTAT_202006271427.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271427.del
IOSTAT_202006271428.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271428.del
IOSTAT_202006271429.del                                                                 MON_GET_PAGE_ACCESS_INFO_202006271429.del
MON_CURRENT_SQL_202006271421.del                                       MON_GET_REBALANCE_STATUS_202006271421.del
MON_CURRENT_SQL_202006271424.del                                       MON_GET_REBALANCE_STATUS_202006271422.del
MON_CURRENT_SQL_202006271427.del                                       MON_GET_REBALANCE_STATUS_202006271423.del
MON_GET_ACTIVITY_202006271422.del                                        MON_GET_REBALANCE_STATUS_202006271424.del
MON_GET_ACTIVITY_202006271424.del                                        MON_GET_REBALANCE_STATUS_202006271425.del
MON_GET_ACTIVITY_202006271426.del                                        MON_GET_REBALANCE_STATUS_202006271426.del
MON_GET_ACTIVITY_202006271428.del                                        MON_GET_REBALANCE_STATUS_202006271427.del
MON_GET_AGENT_202006271422.del                                           MON_GET_REBALANCE_STATUS_202006271428.del
MON_GET_AGENT_202006271424.del                                           MON_GET_REBALANCE_STATUS_202006271429.del
MON_GET_AGENT_202006271426.del                                           MON_GET_SERVERLIST_202006271421.del
MON_GET_AGENT_202006271428.del                                           MON_GET_SERVERLIST_202006271422.del
MON_GET_APPL_LOCKWAIT_202006271421.del                          MON_GET_SERVERLIST_202006271423.del
MON_GET_APPL_LOCKWAIT_202006271422.del                          MON_GET_SERVERLIST_202006271424.del
MON_GET_APPL_LOCKWAIT_202006271423.del                          MON_GET_SERVERLIST_202006271425.del
MON_GET_APPL_LOCKWAIT_202006271424.del                          MON_GET_SERVERLIST_202006271426.del
MON_GET_APPL_LOCKWAIT_202006271425.del                          MON_GET_SERVERLIST_202006271427.del
MON_GET_APPL_LOCKWAIT_202006271426.del                          MON_GET_SERVERLIST_202006271428.del
MON_GET_APPL_LOCKWAIT_202006271427.del                          MON_GET_SERVERLIST_202006271429.del
MON_GET_APPL_LOCKWAIT_202006271428.del                          MON_GET_TRANSACTION_LOG_202006271421.del
MON_GET_APPL_LOCKWAIT_202006271429.del                          MON_GET_TRANSACTION_LOG_202006271422.del
MON_GET_CONNECTION_202006271421.del                               MON_GET_TRANSACTION_LOG_202006271423.del
MON_GET_CONNECTION_202006271424.del                               MON_GET_TRANSACTION_LOG_202006271424.del
MON_GET_CONNECTION_202006271427.del                               MON_GET_TRANSACTION_LOG_202006271425.del
MON_GET_EXTENDED_LATCH_WAIT_202006271421.del           MON_GET_TRANSACTION_LOG_202006271426.del
MON_GET_EXTENDED_LATCH_WAIT_202006271422.del           MON_GET_TRANSACTION_LOG_202006271427.del
MON_GET_EXTENDED_LATCH_WAIT_202006271423.del           MON_GET_TRANSACTION_LOG_202006271428.del
MON_GET_EXTENDED_LATCH_WAIT_202006271424.del           MON_GET_TRANSACTION_LOG_202006271429.del
MON_GET_EXTENDED_LATCH_WAIT_202006271425.del           MON_GET_UNIT_OF_WORK_202006271422.del
MON_GET_EXTENDED_LATCH_WAIT_202006271426.del           MON_GET_UNIT_OF_WORK_202006271424.del
MON_GET_EXTENDED_LATCH_WAIT_202006271427.del           MON_GET_UNIT_OF_WORK_202006271426.del
MON_GET_EXTENDED_LATCH_WAIT_202006271428.del           MON_GET_UNIT_OF_WORK_202006271428.del
MON_GET_EXTENDED_LATCH_WAIT_202006271429.del           MON_GET_UTILITY_202006271421.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271421.del  MON_GET_UTILITY_202006271422.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271422.del  MON_GET_UTILITY_202006271423.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271423.del  MON_GET_UTILITY_202006271424.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271424.del  MON_GET_UTILITY_202006271425.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271425.del  MON_GET_UTILITY_202006271426.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271426.del  MON_GET_UTILITY_202006271427.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271427.del  MON_GET_UTILITY_202006271428.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271428.del  MON_GET_UTILITY_202006271429.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006271429.del  MON_GET_WORKLOAD_202006271425.del
MON_GET_INDEX_202006271425.del                                            NETSTAT_202006271421.del
MON_GET_LATCH_202006271421.del                                            NETSTAT_202006271422.del
MON_GET_LATCH_202006271422.del                                            NETSTAT_202006271423.del
MON_GET_LATCH_202006271423.del                                            NETSTAT_202006271424.del
MON_GET_LATCH_202006271424.del                                            NETSTAT_202006271425.del
MON_GET_LATCH_202006271425.del                                            NETSTAT_202006271426.del
MON_GET_LATCH_202006271426.del                                            NETSTAT_202006271427.del
MON_GET_LATCH_202006271427.del                                            NETSTAT_202006271428.del
MON_GET_LATCH_202006271428.del                                            NETSTAT_202006271429.del
MON_GET_LATCH_202006271429.del                                            task_details_copy.json
MON_GET_MEMORY_POOL_202006271421.del                          VMSTAT_202006271421.del
MON_GET_MEMORY_POOL_202006271422.del                          VMSTAT_202006271422.del
MON_GET_MEMORY_POOL_202006271423.del                          VMSTAT_202006271423.del
MON_GET_MEMORY_POOL_202006271424.del                          VMSTAT_202006271424.del
MON_GET_MEMORY_POOL_202006271425.del                          VMSTAT_202006271425.del
MON_GET_MEMORY_POOL_202006271426.del                          VMSTAT_202006271426.del
MON_GET_MEMORY_POOL_202006271427.del                          VMSTAT_202006271427.del
MON_GET_MEMORY_POOL_202006271428.del                          VMSTAT_202006271428.del
MON_GET_MEMORY_POOL_202006271429.del                          VMSTAT_202006271429.del

After a few hours, I can see that each hours data collection folders has been tar+gzipped and moved to the archive folder:

(db2inst1@hotellnx112) /home/db2inst1/sqllib/db2dump/IBMHIST_TESTDB1_archive
$ ls
 
TESTDB1_2020062716.tar.gz
TESTDB1_2020062717.tar.gz
TESTDB1_2020062718.tar.gz
TESTDB1_2020062719.tar.gz


The IBMHIST Control Tables


As mentioned earlier, the historical monitoring framework uses a set of control tables in order to operate. We can see the tables here:
(db2inst1@hotellnx112) /home/db2inst1/db2histmon-master/1_setup/
$ db2 connect to testdb1
$ db2 list tables for schema IBMHIST
 
Table/View                      Schema    Type  Creation time
------------------------------- --------------- ----- --------------------------
TAB_CONFIG                   IBMHIST         T     2020-06-27-14.12.07.943550
TAB_DIRS                         IBMHIST         T     2020-06-27-14.12.08.907709
TAB_ERRS                         IBMHIST         T     2020-06-27-14.12.08.267541
TAB_TASKS                      IBMHIST         T     2020-06-27-14.12.08.521290

 

  • The IBMHIST.TAB_CONFIG table contains information about the configuration parameters/arguments that were passed into the setup.py script (or the default value, if they were not specified).
        
 
  • The IBMHIST.TAB_DIRS table contains information about the current (and any previously specified) data collection paths (coll_path), and a history of each data archive path (arch_path), including the size of the archived file.
          For example, we can see the current hourly data collection path as so:
 
       
 
        Or we can see a list of all the previously archived hourly archived folders as so:
 
      
 
 
  • The IBMHIST.TAB_TASKS table contains information about each data collection task, as was defined in the task_detail.json file when the full setup.py was last performed (or refreshed).
          For example, we can list each active data collection task including it's operating system command or sql query:
 
        
 
                For SQL class tasks, we can also list all the column names of the generated data by examining the HEADER column:
 
            
 
 
  • The IBMHIST.TAB_ERRS will contain a history of any execution errors that occurred by the historical monitoring framework:
      
   
Example of modifying a setup parameter (changing the data collection path):
 
First we create a new folder to host the active data collection:
 
(db2inst1@hotellnx112) /home/db2inst1/
$ mkdir /home/db2inst1/MYHISTMON_COLLECTION_TESTDB1

 
Since the data collections are performed by external UDFs under a fenced userid (not the db2-instance-owner userid), our new folder must be accessible to the fenced userid, for simplicity I'm giving the folder permission to everyone:
 
(db2inst1@hotellnx112) /home/db2inst1/
$ chmod 777 MYHISTMON_COLLECTION_TESTDB1

 
Next I execute the setup script, specifying the --coll_path option with our new path. I also specify the --update_config_only option, since I did not modify the task_details.json file and  do not need to update all the data collection tasks.
 
(db2inst1@hotellnx112) /home/dbb2inst1/db2histmon-master/1_setup/
$ python3 setup.py testdb1 --coll_path /home/db2inst1MYHISTMON_COLLECTION_TESTDB1/ --update_config_only
 
Connecting to database: TESTDB1
Configuring IBMHIST settings ...
    Deleting configurations from IBMHIST.TAB_CONFIG ...
    Setting COLL_PATH to '/home/db2inst1/MYHISTMON_COLLECTION_TESTDB1' ...
    Setting ARCH_PATH to '/home/db2inst1/sqllib/db2dump' ...
    Setting MAX_SIZE to '1073741824' ...
    Setting ARCH_CMD to 'tar -caf _dest_ _src_' and ARCH_EXT to '.tar.gz' ...
    Setting TASK_DETAILS_PATH to '/home/hotellnx112/db2inst1/db2histmon-master/1_setup/task_details.json' ...
Closing connection ...
Done
 
Now all subsequent data collection tasks will be put into the new data collection path.
(Note that existing data in the old collection path will be automatically moved into the new path).
(db2inst1@hotellnx112) /home/db2inst1/MYHISTMON_COLLECTION_TESTDB1/IBMHIST_TESTDB1/TESTDB1_2020062913
$ ls
 
IOSTAT_202006291304.del
IOSTAT_202006291305.del
IOSTAT_202006291306.del
IOSTAT_202006291307.del
IOSTAT_202006291308.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006291304.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006291305.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006291306.del
MON_GET_EXTENT_MOVEMENT_STATUS_202006291307.del
...etc...
...etc...

Example of Adding a new data collection task (a new operating system metric):

In this example, I want to add a new task to collect "ps -elf" output from the operating system, every minute.

 

First, we edit our task_details.json file to add a new entry to the bottom for our new task called 'PS_ELF' as so:

(note, I can probably just leave the collection_condition field blank, since I do not intend to run this task_details.json file in a Windows environment, and thus always executing this data collection task would be fine, but for completeness I'll add UNIX as a collection_condition, since 'ps -elf' makes no sense in Windows).

 

<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup/
$ vim task_details.json
 
    ...etc...
    ...etc...
    ,
    {
        "collection_name": "PS_ELF",
        "collection_class": "SYS",
        "collection_command": "ps -elf",
        "collection_freq": "* * * * *",
        "collection_level": "1",
        "collection_condition": "UNIX",
        "loader_join_columns": "",
        "loader_diff_exempt_columns": "",
        "quickparse_summary_columns": ""
    }

Next, I executed the setup.py script.  Since I'm only adding a new task, I'll just use the --update_tasks_only option.  We can see our new PS_ELF task included:

<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup/
$ python3 setup.py testdb1 --update_tasks_only
 
Connecting to database: TESTDB1
Determining environment ...
    Determining if Windows or Unix ...
        Unix detected ...
    Determining if pureScale ...
        pureScale not detected ...
    Determining if HADR ...
        HADR not detected ...
Unscheduling tasks ...
Scheduling tasks ...
    Setting Admin Task Scheduler registry variable ...
    Reading from task_details.json file ...
    Scheduling task: ENV_GET_SYSTEM_RESOURCES
    Scheduling task: MON_GET_AGENT
    Scheduling task: MON_GET_CONNECTION
    Scheduling task: MON_GET_ACTIVITY
    Scheduling task: MON_GET_UNIT_OF_WORK
    Scheduling task: MON_CURRENT_SQL
    Scheduling task: MON_GET_APPL_LOCKWAIT
    Scheduling task: MON_GET_EXTENDED_LATCH_WAIT
    Scheduling task: MON_GET_MEMORY_POOL
    Scheduling task: MON_GET_MEMORY_SET
    Scheduling task: MON_GET_TRANSACTION_LOG
    Scheduling task: DB_GET_CFG
    Scheduling task: DBMCFG
    Scheduling task: ENV_INST_INFO
    Scheduling task: ENV_GET_REG_VARIABLES
    Scheduling task: MON_GET_EXTENT_MOVEMENT_STATUS
    Scheduling task: MON_GET_GROUP_BUFFERPOOL
    Scheduling task: MON_GET_INDEX
    Scheduling task: MON_GET_LATCH
    Scheduling task: MON_GET_BUFFERPOOL
    Scheduling task: MON_GET_PAGE_ACCESS_INFO
    Scheduling task: MON_GET_PKG_CACHE_STMT
    Scheduling task: MON_GET_REBALANCE_STATUS
    Scheduling task: MON_GET_SERVERLIST
    Scheduling task: MON_GET_TABLE
    Scheduling task: MON_GET_TABLESPACE
    Scheduling task: MON_GET_UTILITY
    Scheduling task: MON_GET_WORKLOAD
    Scheduling task: VMSTAT
    Scheduling task: IOSTAT
    Scheduling task: NETSTAT
    Scheduling task: PS_ELF
    Scheduling task: ARCHIVE
    Collection will begin in approximately 5 minutes ... 
Closing connection ...

Done

If we examine the IBMHIST.TAB_TASKS table, we can also see the new PS_ELF data collection task:

 <db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup/

$ db2 "select COLL_NAME from IBMHIST.TAB_TASKS where COLL_NAME='PS_ELF'"
 
COLL_NAME                                                            
----------------------------------------------------------------
PS_ELF                                                          
 
  1 record(s) selected.


If we examine the  SYSTOOLS.ADMIN_TASK_LIST table, we can see that our PS_ELF task has been added to the admin task scheduler:

Lastly, if we wait approximately 5mins for the Db2 Admin Task Scheduler to process the new task, we can then begin seeing the output of our new PS_ELF task within the hourly data collection folder:

<db2inst1@hotellnx112> /home/db2inst1/sqllib/db2dump/IBMHIST_TESTDB1/TESTDB1_2020062717/
$ ls -l
 
DB_GET_CFG_202006271700.del
DBMCFG_202006271700.del
...etc... 
...etc... 
PS_ELF_202006271700.del
...etc...
...etc... 

Example of Adding a new data collection task  (a new sql query):

In this example, I want to add a new task to collect MON_GET_FCM() output every 10mins.
First, I'll edit the task_details.json file to add a new entry to the bottom for our new task called 'MON_GET_FCM', as so:
<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup/
$ vim task_details.json
 
    ...etc...
    ...etc...
    ,
    {
        "collection_name": "MON_GET_FCM",
        "collection_class": "SQL",
        "collection_command": "SELECT CURRENT TIMESTAMP AS COLLECTION_TIME, T.* FROM TABLE (MON_GET_FCM(-2)) T",
        "collection_freq": "10 * * * *",
        "collection_level": "1",
        "collection_condition": "",
        "loader_join_columns": "",
        "loader_diff_exempt_columns": "",
        "quickparse_summary_columns": ""
    }

Next, I'll invoke the setup.py script. Here I use the --update_tasks_only option.  We can see the new MON_GET_FCM task added:

<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup.
$ python3 setup.py testdb1 --update_tasks_only
 
Connecting to database: TESTDB1
Determining environment ...
    Determining if Windows or Unix ...
        Unix detected ...
    Determining if pureScale ...
        pureScale not detected ...
    Determining if HADR ...
        HADR not detected ...
Unscheduling tasks ...
Scheduling tasks ...
    Setting Admin Task Scheduler registry variable ...
    Reading from task_details.json file ...
    Scheduling task: ENV_GET_SYSTEM_RESOURCES
    Scheduling task: MON_GET_AGENT
    Scheduling task: MON_GET_CONNECTION
    Scheduling task: MON_GET_ACTIVITY
    Scheduling task: MON_GET_UNIT_OF_WORK
    Scheduling task: MON_CURRENT_SQL
    Scheduling task: MON_GET_APPL_LOCKWAIT
    Scheduling task: MON_GET_EXTENDED_LATCH_WAIT
    Scheduling task: MON_GET_MEMORY_POOL
    Scheduling task: MON_GET_MEMORY_SET
    Scheduling task: MON_GET_TRANSACTION_LOG
    Scheduling task: DB_GET_CFG
    Scheduling task: DBMCFG
    Scheduling task: ENV_INST_INFO
    Scheduling task: ENV_GET_REG_VARIABLES
    Scheduling task: MON_GET_EXTENT_MOVEMENT_STATUS
    Scheduling task: MON_GET_GROUP_BUFFERPOOL
    Scheduling task: MON_GET_INDEX
    Scheduling task: MON_GET_LATCH
    Scheduling task: MON_GET_BUFFERPOOL
    Scheduling task: MON_GET_PAGE_ACCESS_INFO
    Scheduling task: MON_GET_PKG_CACHE_STMT
    Scheduling task: MON_GET_REBALANCE_STATUS
    Scheduling task: MON_GET_SERVERLIST
    Scheduling task: MON_GET_TABLE
    Scheduling task: MON_GET_TABLESPACE
    Scheduling task: MON_GET_UTILITY
    Scheduling task: MON_GET_WORKLOAD
    Scheduling task: VMSTAT
    Scheduling task: IOSTAT
    Scheduling task: NETSTAT
    Scheduling task: PS_ELF
    Scheduling task: MON_GET_FCM
    Scheduling task: ARCHIVE
    Collection will begin in approximately 5 minutes ...
Closing connection ...
Done

We can see our new task has been added to the IBMHIST.TAB_TASKS table:

<db2inst1@hotellnx112> /home/db2inst1/
$ db2 "select COLL_NAME from IBMHIST.TAB_TASKS where COLL_NAME='MON_GET_FCM'"
 
COLL_NAME
----------------------------------------------------------------
MON_GET_FCM
 
  1 record(s) selected.

Lastly, if we wait approximately 5mins for the Db2 Admin Task Scheduler to process the new task, we can then begin seeing the output of our new MON_GET_FCM task within the hourly data collection folder:

<db2inst1@hotellnx112> /home/db2inst1/sqllib/db2dump/IBMHIST_TESTDB1/TESTDB1_2020062914/
$ ls -l
 
DB_GET_CFG_202006291400.del 
DBMCFG_202006291400.del
...etc...  
...etc...  
MON_GET_FCM_202006291439.del
...etc...  
...etc...  

Example of Removing an data collection task:

The cleanest way to remove a data collection task is to edit the task_details.json file, remove the desired task entry, and then run the setup.py script again, using the --update_tasks_only option.
For example, if I wanted to remove the iostat data collection task, I would remove this entry from the task_details.json file:

<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup/
$ vim task_details.json
 
    ...etc...
    ...etc...
    {
        "collection_name": "IOSTAT",
        "collection_class": "SYS",
        "collection_command": "iostat",
        "collection_freq": "* * * * *",
        "collection_level": "1",
        "collection_condition": "UNIX",
        "loader_join_columns": "",
        "loader_diff_exempt_columns": "",
        "quickparse_summary_columns": ""
    },
    ...etc...
    ...etc...

And the re-run the setup.py script using the --update_tasks_only option: 
<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup/
$ python3 setup.py testdb1 --update_tasks_only
 
Connecting to database: TESTDB1
Determining environment ...
    Determining if Windows or Unix ...
        Unix detected ...
    Determining if pureScale ...
        pureScale not detected ...
    Determining if HADR ...
        HADR not detected ...
Unscheduling tasks ...
Scheduling tasks ...
    Setting Admin Task Scheduler registry variable ...
    Reading from task_details.json file ...
    Scheduling task: ENV_GET_SYSTEM_RESOURCES
    Scheduling task: MON_GET_AGENT
    Scheduling task: MON_GET_CONNECTION
    Scheduling task: MON_GET_ACTIVITY
    Scheduling task: MON_GET_UNIT_OF_WORK
    Scheduling task: MON_CURRENT_SQL
    Scheduling task: MON_GET_APPL_LOCKWAIT
    Scheduling task: MON_GET_EXTENDED_LATCH_WAIT
    Scheduling task: MON_GET_MEMORY_POOL
    Scheduling task: MON_GET_MEMORY_SET
    Scheduling task: MON_GET_TRANSACTION_LOG
    Scheduling task: DB_GET_CFG
    Scheduling task: DBMCFG
    Scheduling task: ENV_INST_INFO
    Scheduling task: ENV_GET_REG_VARIABLES
    Scheduling task: MON_GET_EXTENT_MOVEMENT_STATUS
    Scheduling task: MON_GET_GROUP_BUFFERPOOL
    Scheduling task: MON_GET_INDEX
    Scheduling task: MON_GET_LATCH
    Scheduling task: MON_GET_BUFFERPOOL
    Scheduling task: MON_GET_PAGE_ACCESS_INFO
    Scheduling task: MON_GET_PKG_CACHE_STMT
    Scheduling task: MON_GET_REBALANCE_STATUS
    Scheduling task: MON_GET_SERVERLIST
    Scheduling task: MON_GET_TABLE
    Scheduling task: MON_GET_TABLESPACE
    Scheduling task: MON_GET_UTILITY
    Scheduling task: MON_GET_WORKLOAD
    Scheduling task: VMSTAT
    Scheduling task: NETSTAT
    Scheduling task: PS_ELF
    Scheduling task: MON_GET_FCM
    Scheduling task: ARCHIVE
    Collection will begin in approximately 5 minutes ...
Closing connection ...
Done

Example of Stopping all data collection tasks:

The easiest way to stop all data collection tasks is to simply disable and remove the historical monitoring framework from the database, using the setup.py --cleanup argument:

<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup/
$ python3 setup.py testdb1 --cleanup
 
Connecting to database: TESTDB1
Unscheduling tasks ...
Dropping IBMHIST schema and its objects ...
Closing connection ...
Done

You can validate that no data collection tasks are scheduled within the Admin Task Scheduler by:
<db2inst1@hotellnx112> /home/db2inst1/db2histmon-master/1_setup.
$ db2 "SELECT NAME, PROCEDURE_NAME from SYSTOOLS.ADMIN_TASK_LIST"
 
NAME                    PROCEDURE_NAME                                                                           
----------------------- --------------------------------------------------------
  
 0 record(s) selected.

Generating Reports from Db2 historical monitoring (db2histmon) data


The report generation script can be used to scrape previously generated db2histmon data collections and produce summary reports in multiple flavours, similar to the reports produced by the MONREPORT module. The reports are useful for analyzing various metrics, and highlighting statistical outliers, from previous time periods.

The report.py script itself is located in the 4_report/ sub-folder:

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
$ ls
 
1_setup/
2_loader/ 
3_quickparser/ 
4_report/

Pre-requisites:

  • The scripts can be deployed in any Unix or Windows environment where Db2 is supported.
  • Python2 or higher.
  • The python pandas, argparse, and numpy modules are also required to execute the report.py script.

Using pip to install the pandas, argparse, and numpy modules:

(db2inst1@myhost1) /home/db2inst1/
  $ pip install argparse
  Collecting argparse
    Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
    Installing collected packages: argparse
  Successfully installed argparse-1.4.0
(db2inst1@myhost1) /home/db2inst1/
  $ pip install numpy
  Collecting numpy
    Downloading numpy-1.19.4-cp38-cp38-win_amd64.whl (13.0 MB)
    Installing collected packages: numpy
  Successfully installed numpy-1.19.4
(db2inst1@myhost1) /home/db2inst1/
  $ pip install pandas
  Collecting pandas
    Downloading pandas-1.1.4-cp38-cp38-win_amd64.whl (8.9 MB)
    Collecting python-dateutil>=2.7.3
    Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
    Installing collected packages: python-dateutil, pandas
  Successfully installed pandas-1.1.4 python-dateutil-2.8.1

Types of reports:

The report.py script is currently capable of generating three unique types of reports:
DBSUMMARY - Provides a summary of metrics for the entire database, similar to the MONREPORT.DBSUMMARY module.
CONNECTION - Provides a summary of metrics for each connection, similar to the MONREPORT.CONNECTION module.
PKGCACHE - Provides a summary of metrics for the top executed statements that are still cached within the package-cache, similar to the MONREPORT.PKGCACHE module.

 

A report will display various metrics and data points for every data collection time period (or limited only to the time periods specified in the arguments), as described later.


Historical data which is used to generate the report:


The report.py script will scrape historical monitoring data that was previously collected by the Db2 Historical Monitoring (db2histmon) framework. 
For reference, I'm including a screen-shot showing the contents of the db2histmon data collection folder:





Usage and options:


The report.py script has the following usage options:


Use the --report option to specify the type of report to generate, either DBSUMMARY, CONNECTION, or PKGCACHE.
The last argument specifies the folder path where the db2histmon historical monitoring data resides.
I'll explain the other options within the examples further below.


Example 1. Generate a DBSUMMARY report (on all available historical monitoring data)

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
   $ python3 report.py -r DBSUMMARY /path/to/db2histmon/data_files/

        


Example 2. Generate a report on a subset of time-intervals, by using the START_TIME and END_TIME options

 

In some cases, the quantity of db2histmon historical monitoring data collections available or archived is very large and may make the reports too large or unreadable.
We can use the --START_TIME and --END_TIME options to limit the data collection time-intervals that we wish to generate a report on.  

In this example, we wish to include all data collections that were generated after 2020-09-29 02:20 but before 2020-09-29 04:30:


(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
$ python3 report.py -r DBSUMMARY --start_time "2020-09-29 02:20:00" --end_time "2020-09-29 04:30:00" /path/to/db2histmon/data_files/

        


Example 3. Generate a report with time-intervals condensed into larger time PERIODs.

 

We can also use the --PERIOD option to combine time-intervals from the data collection set into larger sets, such that fewer (but longer duration) time-intervals are displayed in the report.

In this example, many of our original db2histmon historical monitoring data collections are approximately 5minutes apart (as an be seen in the previous report's example above). By specifying the --PERIOD 3 option, we aggregate the data which would normally be reported in three separate time-periods into one time-period: 

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report
$ python3 report.py -r CONNECTION --period 3 /path/to/db2histmon/data_files/

 




Example 4. Generate a report showing MIN, MAX, MEAN statistical values.

Understanding the statistical values associated with a particular metric can be useful when looking for anomalies in the data set.  In this example, I specify the --stats option to include the statistics for each reported value:

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
$ python3 report.py -r CONNECTION --period 3 /path/to/db2histmon/data_files/ --stats

Stay tuned for new blogs expanding on the Db2 Historical Monitoring (db2histmon) suite, coming soon.

In a future blog I hope to showcase some examples of how the diagnostic data can be analyzed to narrow down a specific problem type. In the mean time, you may find the following article by Ani Kiourktchian
about analyzing the older Persistent Diagnostic Data useful, as many of the data points are the same: https://www.idug.org/communities/community-home/librarydocuments/viewdocument?documentkey=f9abdf15-69d2-4f56-bb01-ec3735da1a9d.

0 comments
423 views

Permalink