Db2 Python Programing, IBM_DB vs IBM_DB_DBI

By John Maenpaa posted Jan 29, 2021 09:00 AM

  

Python Database API

The Python Database API is defined in PEP 249. PEP stands for Python Enhancement Proposal, which is the method used to track modifications to Python through their lifecycle. The Database API is an attempt to create a standard interface definition for database access. Using it brings a level of consistency across DBMS products and should make it easier to port programs or use multiple database engines more easily. Unfortunately, the PEP documents tend to be oriented toward the people implementing the features/specifications rather than the people using them. That means we need to play around a little to see how it really works.

Useful Information Provided by API

The Python Database API requires modules to supply some minimal amount of information that may be needed to use the database.

  • apilevel indicate the API level supported, expect “2.0”

  • threadsafety indicates how thread safe the module may be

  • paramstyle indicates the type of parameter markers to use

We can run a small program to get a look at these and figure out what they contain.

""" Display Database API Global Information"""
import ibm_db_dbi

meaning_threadsafety = {
    0: "Threads may not share the module.",
    1: "Threads may share the module, but not connections.",
    2: "Threads may share the module and connections.",
    3: "Threads may share the module, connections and cursors."
}

meaning_paramstyle = {
    "qmark":	r"Question mark style, e.g. ...WHERE name=?",
    "numeric":	r"Numeric, positional style, e.g. ...WHERE name=:1",
    "named":	r"Named style, e.g. ...WHERE name=:name",
    "format":	r"ANSI C printf format codes, e.g. ...WHERE name=%s",
    "pyformat":	r"Python extended format codes, e.g. ...WHERE name=%(name)s"
}
print(ibm_db_dbi.apilevel)      # expectiong "2.0"
print(ibm_db_dbi.threadsafety,  meaning_threadsafety[ibm_db_dbi.threadsafety])
print(ibm_db_dbi.paramstyle,    meaning_paramstyle[ibm_db_dbi.paramstyle])

Running this, I get the following output:

2.0
0 Threads may not share the module.
qmark Question mark style, e.g. ...WHERE name=?

Now we know that ibm_db_dbi supports the Python Database API v2.0 level defined in PEP 249, the module is not thread safe at all, and that we need to use question marks as parameter markers in our SQL statements. This last is unsurprising and is compatible with the way we have been doing our dynamic SQL for years.

IBM_DB_DBI

If you review the source of the ibm_db package on on Github, you will find an ibm_db_dbi.py module here. This module implements the Python DB API Specification v2.0 for Db2 by acting as a wrapper around the Db2-specific ibm_db API calls.

The ibm_db APIs are themselves very much like the Db2 Call Level Interfaces (CLI) and ODBC APIs. In fact, the ibm_db APIs are a wrapper around the CLI APIs implemented in Python and C.

Connections

Database connections are made using connection objects, not so different from the ibm_db implementation. Connection objects are created (and returned) using the database’s connect() function.

The ibm_db_dbi.connect() function takes one positional parameter (dsn) and five optional parameters:

  • dsn - Connection string

  • user - userid for connection

  • password - password for userid

  • host - hostname for database server

  • database - database name

  • conn_options - options for connection in a dict

As with the Db2-specific ibm_db.connect() function, you can supply the userid and password as part of the connection string. Here, you can supply them as their correct separate parameters. The minimal connection string, if your database is using the default port, would be:

    connect_string = "ATTACH=FALSE;"

But it is probably best to include a few other settings:

    connect_string = "ATTACH=FALSE;"
    connect_string += "PROTOCOL=TCPIP;PORT=" + str(port) + ";"

The connection options, as I included in a prior article, set some useful information that can be used when investigating database issues:

connect_options = { "SQL_ATTR_INFO_PROGRAMNAME": "JHMTESTHELPERS", # 20 char max
                    "SQL_ATTR_INFO_USERID" : getpass.getuser(),    # 255 char max
                    "SQL_ATTR_INFO_WRKSTNNAME" : platform.node()   # 255 char max
                  }

Once we have are connection variables and options all set, we can establish the connection and get our Connection object.

    hdbi = ibm_db_dbi.connect(connect_string,
        host=hostname, database=database,
        user=userid, password=password,
        conn_options=connect_options)

When we are done with a Connection, we simply close it, like so:

    hdbi.close()

Cursors

Rather than dealing with SQL statements, the Python Database API defines Cursor objects as the interface that should be used for executing the SQL (or calling stored procedures). You get a Cursor object using the Connection object’s cursor() method. You then take that cursor, use it to run SQL, get information about the result sets, and fetch the results.

my_sql =  """select distinct tabschema, tabname
               from syscat.tables;
"""

my_cursor = hdbi.cursor()
my_cursor.execute(my_sql)
my_tables = my_cursor.fetchall()

The fetchall() function returned all of the rows from the cursor into a list object with two columns per entry. We can then process the entire in memory list using a quick loop:

for (tabschema,tablename) in my_tables:
    print(tabschema,tablename)

We will come back to the options for fetching after a brief discussion on parameters.

Closing the Cursor object is as simple as using its close() method:

my_cursor.close()

In addition to execute() there are other methods available for executing SQL, including:

  • callproc() to call a stored procedure

  • executemany() to execute a list of statements

The good news is that you can use the same execute() function whether a result set is returned or not.

Parameters

Unlike standard ibm_db and Db2 CLI APIs, we do not have a sequence like:

  1. Prepare

  2. Describe (optional)

  3. Bind Parameters

  4. Execute

Instead, we must do the work with the execute() method. That means our parameterized input for the SQL statement needs to be defined and passed into the execute() function. The easiest way to do that is to use a Python tuple, created by placing your input variables within parentheses.

my_sql = """select distinct tabschema, tabname
             from syscat.tables
            where tabschema = ?
              and type = ?;
"""

my_params = ('DB2INST1', 'T')
my_cursor.execute(my_sql, my_params)

Note, if there is only input variable, you create the tuple using a comma within the parentheses, like so:

my_single_tuple = (input_var1, )

If you forget to do this, you may find yourself banging your head on your desk trying to debug your program. I certainly did when I first missed that comma.

Result Columns

Once you have performed the execute(), there is a description available that provides a list of columns in the result set. Each entry in this list contains the following information items about the column:

  • name

  • type_code

  • display_size

  • internal_size

  • precision

  • scale

  • null_ok

for column_variable in my_cursor.description:
    column_name = column_variable[0]
    print(column_variable)

Fetching

There are a couple of Cursor methods used to return data. If you are certain you have enough memory to return the whole result set, you can simply use fetchall(). More commonly, you would use fetchone() to retrieve each row individually, or fetchmany() to retrieve a set of rows with each call. Using fetchmany() has the advantage of making fewer trips into the database interface and theoretically fewer network calls when the database is on a server.

  • fetchone() returns a single row or None if there are no more rows

  • fetchmany() returns a set of rows

  • fetchall() returns all remaining (unfetched) rows

  • nextset() allows you to retrieve multiple result sets from a stored procedure

Errors

The Python Database API includes specifications for handling errors. It defines specific Exception classes that must be used by the database module with specific inheritance (superclass/subclass) hierarchies. There are both Warning and Error classes. You should wrap your ibm_db_dbi function calls in try/catch blocks.

try:
    my_cursor.execute(my_sql)
except Exception as err:
    print("Error on Execute", err)

Comparing IBM_DB with IBM_DB_DBI

Here are two example programs. The first uses ibm_db and the second uses ibm_db_dbi. Both programs run two queries, one without parameters and one with. The second query prints out the column names at the top of the result rows.

Example using IBM_DB

""" Example program using IBM_DB against Db2"""
import os
import sys
import getpass
import platform
import ibm_db

# --------------------------------------------------
# Database Connection Settings
# --------------------------------------------------
database = "sample"
hostname = "modi"
userid = "db2inst1"
password = "mypasswd"
port = 50000

connect_string = "DRIVER={IBM DB2 ODBC DRIVER}" + ";"
connect_string += "DATABASE=" + database + ";"
connect_string += "HOSTNAME=" + hostname + ";"
connect_string += "PROTOCOL=TCPIP;PORT=" + str(port) + ";"
connect_string += "UID=" + userid + ";"
connect_string += "PWD=" + password + ";"

connect_options = { "SQL_ATTR_INFO_PROGRAMNAME": "JHMTESTHELPERS", # 20 char max
                    "SQL_ATTR_INFO_USERID" : getpass.getuser(),    # 255 char max
                    "SQL_ATTR_INFO_WRKSTNNAME" : platform.node()   # 255 char max
                  }
# --------------------------------------------------
hdbc = None  # Connection Handle
# --------------------------------------------------

try:
    hdbc = ibm_db.connect(connect_string, "", "", connect_options)
except Exception as err:
    print("connection failed with", err)
    sys.exit(1)

if hdbc:
    print("connected")

# --------------------------------------------------
# Query 1
# --------------------------------------------------
print("\nQuery1 begin")

my_sql = """select distinct tabschema, tabname
             from syscat.tables
            where tabschema = 'DB2INST1';
"""

try:
    my_stmt = ibm_db.prepare(hdbc, my_sql)
except Exception as err:
    print("Error on Prepare", err)

try:
    rc = ibm_db.execute(my_stmt)
    if rc:
        try:
            row = ibm_db.fetch_assoc(my_stmt)
            while row:
                print(row["TABSCHEMA"], row["TABNAME"])
                row = ibm_db.fetch_assoc(my_stmt)
        except Exception as err:
            print("Error on Fetch", err)
    else:
        print("Execute failed")

except Exception as err:
    print("Error on Execute", err)

# --------------------------------------------------
# Query 2
# --------------------------------------------------
print("\nQuery2 begin")

my_sql = """select distinct tabschema, tabname
             from syscat.tables
            where tabschema = ?
              and type = ?;
"""

my_params = ("DB2INST1", "T")

try:
    my_stmt = ibm_db.prepare(hdbc, my_sql)
except Exception as err:
    print("Error on Prepare", err)

try:
    rc = ibm_db.execute(my_stmt, my_params)
    if rc:
        column_name1 = ibm_db.field_name(my_stmt, 0)
        column_name2 = ibm_db.field_name(my_stmt, 1)

        try:
            row = ibm_db.fetch_assoc(my_stmt)
            print(column_name1, column_name2)
            while row:
                print(row["TABSCHEMA"], row["TABNAME"])
                row = ibm_db.fetch_assoc(my_stmt)
        except Exception as err:
            print("Error on Fetch", err)
    else:
        print("Execute failed")
except Exception as err:
    print("Error on Execute statement", err)


try:
    ibm_db.free_stmt(my_stmt)
except Exception as err:
    print("Error on free statement", err)

# --------------------------------------------------
# Clean up
# --------------------------------------------------
if hdbc:
    ibm_db.close(hdbc)
    print("disconnected")

print("done")

Example using IBM_DB_DBI

""" Example program using IBM_DB against Db2"""
import os
import sys
import getpass
import platform
import ibm_db_dbi

# --------------------------------------------------
# Database Connection Settings
# --------------------------------------------------
database = "sample"
hostname = "modi"
userid = "db2inst1"
password = "mypasswd"
port = 50000

connect_string = "ATTACH=FALSE;"
# connect_string += "PROTOCOL=TCPIP;PORT=" + str(port) + ";"

connect_options = { "SQL_ATTR_INFO_PROGRAMNAME": "JHMTESTHELPERS", # 20 char max
                    "SQL_ATTR_INFO_USERID" : getpass.getuser(),    # 255 char max
                    "SQL_ATTR_INFO_WRKSTNNAME" : platform.node()   # 255 char max
                  }
# --------------------------------------------------
hdbi = None  # Connection Object
# --------------------------------------------------

try:
    hdbi = ibm_db_dbi.connect(connect_string,
        host=hostname, database=database,
        user=userid, password=password,
        conn_options=connect_options)
except ibm_db_dbi.Warning as warn:
    print("Connection warning:", warn)
except ibm_db_dbi.Error as err:
    print("connection error:", err)
    sys.exit(1)

if hdbi:
    print("connected")

# --------------------------------------------------
# Query 1
# --------------------------------------------------
print("\nQuery1 begin")

my_sql = """select distinct tabschema, tabname
             from syscat.tables
            where tabschema = 'DB2INST1';
"""

my_cursor = hdbi.cursor()

try:
    my_cursor.execute(my_sql)
except Exception as err:
    print("Error on Execute", err)

try:
    my_tables = my_cursor.fetchall()
except Exception as err:
    print("Error on Fetch", err)

for (tabschema,tablename) in my_tables:
    print(tabschema,tablename)

# --------------------------------------------------
# Query 2
# --------------------------------------------------
print("\nQuery2 begin")

my_sql = """select distinct tabschema, tabname
             from syscat.tables
            where tabschema = ?
              and type = ?;
"""

my_params = ("DB2INST1", "T")
my_cursor.execute(my_sql, my_params)

print("Cursor column descriptions")
for column_variable in my_cursor.description:
    print(column_variable)

column_name1 = my_cursor.description[0][0]
column_name2 = my_cursor.description[1][0]

my_tables = my_cursor.fetchall()

if my_tables:
    print("\n")
    print(column_name1, column_name2)
    for (tabschema,tablename) in my_tables:
        print(tabschema,tablename)

if my_cursor:
    my_cursor.close()

# --------------------------------------------------
# Clean up
# --------------------------------------------------
if hdbi:
    if hdbi.close():
        print("disconnected")

print("done")

Conclusion

As you can see from the full examples, using the Python Database API (as implemented in ibm_db_dbi) is a bit cleaner than the CLI-oriented syntax. The exception handling is more along modern expectations and proper classes with methods feels better to use.

Before playing with Python, I had not used the Db2 CLI in many years, mostly because I was doing a lot more Ruby and COBOL programming. Once I began playing with Python, the Db2 interface implemented by ibm_db felt like returning to the old days. The ibm_db_dbi interface documentation was not as clear, so I did not investigate it too much. Now that I have, I expect I will switch to using the ibm_db_dbi interface for my efforts.

Resources

0 comments
1158 views

Permalink