AI-Enhanced Db2 for z/OS: Revealing Hidden Insights through Modern Application Integration

Posted By: Vineet Dumir, Technical Content

In today's data-driven world, enterprises need seamless integration between databases and modern applications to unlock valuable insights. IBM Db2 for z/OS, a powerful relational database, offers robust connectivity options with Apache Spark and Python, enabling efficient data processing, analytics, and AI-driven insights.

This blog explores how Spark and Python connectivity to Db2 can streamline data workflows and enhance real-time analytics. Additionally, we delve into IBM Db2 SQL Data Insights (SQLDI) and its use of word embedding techniques to uncover hidden patterns within structured and unstructured data. By leveraging these capabilities, businesses can monetize hidden insights within their Db2 ecosystem and drive smarter decision-making.

The Convergence of Traditional Databases and Modern AI

For decades, IBM Db2 for z/OS has been the backbone of mission-critical applications for enterprises worldwide. Its reliability, performance, and security are unmatched. However, as businesses increasingly adopt AI and machine learning technologies, there's a growing need to bridge the gap between traditional relational databases and modern data science tools.

This convergence presents both challenges and opportunities. Organizations with vast amounts of valuable data stored in Db2 can now leverage this data for AI-driven insights without complex ETL (Extract, Transform, Load) processes or data duplication, which can introduce latency and consistency issues.

Python Connectivity with Db2 for z/OS

Python is the preferred language for data scientists and AI developers. The Python AI Toolkit for IBM z/OS offers secure, vetted tools and libraries, enabling enterprises to build and deploy AI on z/OS while leveraging their existing infrastructure for mission-critical applications.

With the Python AI toolkit, which includes the ibm_db driver, connecting Python applications to Db2 for z/OS becomes seamless, enabling smooth integration between robust database capabilities and advanced AI functionalities. This integration empowers organizations to maintain data residency on the secure mainframe while applying modern AI techniques.

Here's a simple example of how to connect to Db2 for z/OS using Python:

import ibm_db
import pandas as pd

# Connection string parameters
dsn_driver = "IBM DB2 ODBC DRIVER"
dsn_database = "SAMPLE"           # Database name
dsn_hostname = "zos.example.com"  # z/OS host
dsn_port = "5040"                 # Port number
dsn_protocol = "TCPIP"            # Connection protocol
dsn_uid = "db2user"               # User ID
dsn_pwd = "password"              # Password

# Construct the connection string
conn_string = (
    f"DRIVER={{{dsn_driver}}};"
    f"DATABASE={dsn_database};"
    f"HOSTNAME={dsn_hostname};"
    f"PORT={dsn_port};"
    f"PROTOCOL={dsn_protocol};"
    f"UID={dsn_uid};"
    f"PWD={dsn_pwd};"
)

# Establish the database connection
try:
    conn = ibm_db.connect(conn_string, '', '')
    print("Connected to the database")
except Exception as e:
    print(f"Unable to connect: {e}")
    exit()

# Query to execute
select1 = "SELECT name, creator, dbname, tsname FROM SYSIBM.SYSTABLES FETCH FIRST 10 ROWS ONLY"

# Fetch the data and create a DataFrame
try:
    stmt = ibm_db.prepare(conn, select1)
    if stmt:
        ibm_db.execute(stmt)
        rows = []
        row = ibm_db.fetch_tuple(stmt)
        while row:
            rows.append(row)
            row = ibm_db.fetch_tuple(stmt)

        # Create a DataFrame from the fetched rows
        df = pd.DataFrame(rows, columns=["NAME", "CREATOR", "DBNAME", "TSNAME"])
        print(df)
    else:
        print("Failed to prepare the SQL statement.")
except Exception as e:
    print(f"Error in SQL execution: {e}")
    print(f"SQL Error: {ibm_db.stmt_error()}")
    print(f"SQL Error Message: {ibm_db.stmt_errormsg()}")
finally:
    # Ensure the connection is closed
    if conn:
        ibm_db.close(conn)

 

This example demonstrates how to query catalog table data from Db2 for z/OS and convert it to a pandas DataFrame, which is a popular data structure for data analysis in Python. From here, you can apply various machine learning algorithms or data visualization techniques to extract insights.
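As a quick illustration of that next step, here is a minimal profiling sketch in pandas. A few hard-coded rows stand in for the live query result, so the table names and databases shown are illustrative only:

```python
import pandas as pd

# Hard-coded rows standing in for the SYSIBM.SYSTABLES query result above
rows = [
    ("ORDERS", "APP1", "SALESDB", "TS01"),
    ("CUSTOMER", "APP1", "SALESDB", "TS02"),
    ("AUDIT_LOG", "APP2", "ADMINDB", "TS03"),
]
df = pd.DataFrame(rows, columns=["NAME", "CREATOR", "DBNAME", "TSNAME"])

# A typical first profiling step: how many tables does each database hold?
tables_per_db = df.groupby("DBNAME")["NAME"].count().sort_values(ascending=False)
print(tables_per_db)
```

The same pattern scales to any catalog or application query: fetch into a DataFrame once, then slice, group, and visualize locally without issuing further SQL.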

IBM Z Platform for Apache Spark Integration with Db2 for z/OS

z/OS Spark is built on Apache Spark, a high-performance engine for large-scale data processing. A key advantage is its in-memory computing: intermediate results are cached in memory rather than written to disk, which significantly speeds up iterative workloads. Combined with Db2 for z/OS, Spark's distributed computing capabilities can process massive datasets efficiently without moving data out of the secure mainframe environment.

Here's an example of how to connect Spark to Db2 for z/OS:

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Db2 Integration") \
    .config("spark.jars", "/path/to/db2jcc4.jar") \
    .getOrCreate()

# Connection properties
jdbc_url = "jdbc:db2://zos.example.com:5040/SAMPLE"
connection_properties = {
    "user": "db2user",
    "password": "password",
    "driver": "com.ibm.db2.jcc.DB2Driver"
}

# Read data from Db2 table
df = spark.read.jdbc(
    url=jdbc_url,
    table="SALES_DATA",
    properties=connection_properties
)

# Register as temporary view for SQL queries
df.createOrReplaceTempView("sales")

# Perform analytics using Spark SQL
result = spark.sql("""
    SELECT
        REGION,
        PRODUCT_CATEGORY,
        SUM(REVENUE) as TOTAL_REVENUE,
        AVG(QUANTITY) as AVG_QUANTITY
    FROM sales
    GROUP BY REGION, PRODUCT_CATEGORY
    ORDER BY TOTAL_REVENUE DESC
""")

# Show results
result.show()

# Write aggregated results back to Db2
result.write.jdbc(
    url=jdbc_url,
    table="SALES_SUMMARY",
    mode="overwrite",
    properties=connection_properties
)

This example demonstrates:

  1. Setting up a Spark session with the Db2 JDBC driver
  2. Reading data from a Db2 table into a Spark DataFrame
  3. Performing analytics using Spark SQL
  4. Writing the results back to Db2
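For small result sets pulled over ibm_db, the same aggregation logic can be mirrored in plain pandas without a Spark cluster. The sketch below uses hypothetical sample data whose columns follow the SALES_DATA table assumed in the Spark example:

```python
import pandas as pd

# Hypothetical sample of the SALES_DATA table from the Spark example
sales = pd.DataFrame({
    "REGION": ["EAST", "EAST", "WEST", "WEST"],
    "PRODUCT_CATEGORY": ["LAPTOP", "LAPTOP", "PHONE", "PHONE"],
    "REVENUE": [1000.0, 1500.0, 700.0, 300.0],
    "QUANTITY": [2, 3, 1, 1],
})

# Same logic as the Spark SQL query: group, sum revenue, average quantity
summary = (
    sales.groupby(["REGION", "PRODUCT_CATEGORY"], as_index=False)
    .agg(TOTAL_REVENUE=("REVENUE", "sum"), AVG_QUANTITY=("QUANTITY", "mean"))
    .sort_values("TOTAL_REVENUE", ascending=False)
)
print(summary)
```

The practical dividing line: pandas works on whatever fits in one process's memory, while Spark distributes the same group-by across executors when the Db2 result set is too large for a single node.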

Unlocking Hidden Insights with IBM Db2 SQL Data Insights (SQLDI)

IBM Db2 SQL Data Insights (SQLDI) takes AI integration a step further by incorporating word embedding techniques directly within the database. This allows for semantic understanding of data, revealing hidden patterns and relationships that traditional SQL queries might miss.

SQLDI works by converting categorical and textual data into vector embeddings, enabling similarity searches and pattern recognition. This is particularly valuable for identifying relationships between seemingly unrelated data points.
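Conceptually, the score SQLDI computes over those embeddings is close to cosine similarity between vectors. The following toy sketch uses made-up three-dimensional vectors and invented customer IDs, not SQLDI's actual model or scoring:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for three customers (SQLDI learns these from table data)
embeddings = {
    "3668-QPYBK": [0.9, 0.1, 0.3],
    "7795-CFOCW": [0.8, 0.2, 0.4],  # similar behavior
    "9237-HQITU": [0.1, 0.9, 0.2],  # different behavior
}

# Score every other customer against the query customer, highest first
query = embeddings["3668-QPYBK"]
scores = {
    cid: cosine_similarity(query, vec)
    for cid, vec in embeddings.items() if cid != "3668-QPYBK"
}
for cid, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(cid, round(score, 3))
```

The payoff of doing this inside the database is that the scoring happens next to the data, expressed as ordinary SQL rather than an external scoring pipeline.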

Here's an example of using SQLDI to find the five customers most similar to the one with customer ID '3668-QPYBK':

SELECT AI_SIMILARITY(CUSTOMERID, '3668-QPYBK'), CHURN.*
FROM CHURN
ORDER BY 1 DESC
FETCH FIRST 5 ROWS ONLY;

The AI_SIMILARITY function computes a similarity score between two values. Additionally, Db2 introduces the following built-in scalar functions to support SQL DI:

SQL Data Insights semantic operations and their Db2 built-in functions:

  • Semantic similarity and dissimilarity: AI_SIMILARITY (Db2 13 FL500) matches rows or entities based on overall meaning. Example: Identify customers with spending patterns similar to high-value clients to offer personalized investment plans or premium banking services.

  • Semantic clustering: AI_SEMANTIC_CLUSTER (Db2 13 FL500) captures the dominant traits of the input values and returns values that share those traits. Example: Identify products with customer sentiment and purchasing patterns similar to a given cluster (high-end laptops, premium smartphones) to optimize pricing strategies and targeted marketing campaigns.

  • Reasoning by analogy: AI_ANALOGY (Db2 13 FL500) determines whether the relationship between one pair of entities also holds for a second pair. In retail, for instance, it can test whether other customers show the same degree of preference for other products as a given customer shows for a given product. Example: Credit Score : Loan Approval :: Claim History : Insurance Premium.

  • Pattern queries: AI_COMMONALITY (Db2 13 FL504) identifies entities that are outliers or that exhibit the most common behavior. Example: Identify policyholders with uncommon claim patterns, such as frequent small claims, which may indicate potential fraud or risk-prone behavior.
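Analogy queries of the form A : B :: C : ? are classically answered with vector arithmetic over embeddings: find the stored vector nearest to B - A + C. The sketch below uses illustrative two-dimensional vectors and invented product names to show the idea; it is not SQLDI's internal algorithm:

```python
import math

def nearest(target, candidates):
    """Return the candidate key whose vector is closest (Euclidean) to target."""
    def dist(v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(v, target)))
    return min(candidates, key=lambda k: dist(candidates[k]))

# Illustrative embeddings in which "premium" relates to "standard" the same
# way for both products: premium variant = standard variant + [0.5, 0.0]
vectors = {
    "standard_card": [0.2, 0.1],
    "premium_card":  [0.7, 0.1],
    "standard_loan": [0.2, 0.8],
    "premium_loan":  [0.7, 0.8],
}

# standard_card : premium_card :: standard_loan : ?
a, b, c = vectors["standard_card"], vectors["premium_card"], vectors["standard_loan"]
target = [bi - ai + ci for ai, bi, ci in zip(a, b, c)]
candidates = {k: v for k, v in vectors.items() if k != "standard_loan"}
print(nearest(target, candidates))
```

Because the offset from "standard" to "premium" is consistent in this toy space, the search lands on premium_loan, mirroring how an analogy function generalizes a relationship from one pair of entities to another.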


Conclusion

The integration of IBM Db2 for z/OS with modern AI technologies through Python and Spark connectivity represents a significant evolution in enterprise data management. Organizations can now leverage their existing Db2 investments while adopting cutting-edge AI capabilities to extract deeper insights.

By combining the reliability and security of Db2 for z/OS with the flexibility and innovation of modern data science tools, businesses can transform their data assets into strategic advantages. Whether it's through simple Python scripts, powerful Spark applications, or advanced SQLDI embeddings, the path to AI-enhanced insights has never been more accessible.

As AI continues to evolve, so too will the methods for integrating it with traditional database systems. Organizations that embrace this convergence will be well-positioned to unlock the full potential of their data and drive smarter, more informed decision-making.


Reference links
  • Python AI Toolkit for IBM z/OS: https://ibm-z-oss-oda.github.io/python_ai_toolkit_zos/
  • Db2 built-in functions for SQL Data Insights: https://www.ibm.com/docs/en/db2-for-zos/13?topic=insights-db2-built-in-functions-sql-di


Vineet Dumir is a Data and AI solution architect at IBM Systems Development Labs. He brings over 15 years of experience in Data and AI, specializing in machine learning, artificial intelligence and generative AI. Additionally, he has extensive expertise as a Db2 Database Administrator and Db2 Systems Programmer on the z/OS mainframe platform with a strong focus on data modernization initiatives.