In this demonstration, we’ll use a Jupyter notebook in IBM Watson Studio to show how you can build a linear regression model inside Db2. We will use the GoSales dataset, where the machine learning task is to predict each customer’s purchase amount from demographic information and shopping history.
Why Machine Learning inside a Database?
When you develop machine learning models with data stored in a database, the data is often copied out of the database to separate systems. As the data grows in volume and complexity, moving it between systems brings several challenges. For example, populating a pandas DataFrame with 100 million rows, or 25 GB of data, from a remote database could take about 75 minutes, and the data might not even fit into the development machine’s memory. Additionally, data privacy regulations might prevent moving data outside of the database. With IBM Db2 11.5.4, you can build, deploy, and infuse AI all inside your Db2 database. This accelerates your AI workflow, keeps your data protected, and simplifies infusing AI into your business processes and applications.
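To put the transfer figures quoted above in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes decimal gigabytes (1 GB = 10^9 bytes) and a constant transfer rate, neither of which the example states, and the variable names are illustrative.

```python
# Figures quoted in the text: 100 million rows, 25 GB, about 75 minutes.
rows = 100_000_000
size_bytes = 25 * 10**9        # assumption: decimal gigabytes
transfer_seconds = 75 * 60     # assumption: constant transfer rate

# Derived quantities implied by those figures.
bytes_per_row = size_bytes / rows
rows_per_second = rows / transfer_seconds
megabytes_per_second = size_bytes / transfer_seconds / 10**6

print(f"{bytes_per_row:.0f} bytes per row")
print(f"{rows_per_second:,.0f} rows per second")
print(f"{megabytes_per_second:.1f} MB/s effective throughput")
```

Under these assumptions, the example implies rows of about 250 bytes each arriving at roughly 22,000 rows per second, or an effective throughput of about 5.6 MB/s. The 25 GB would also need to be held in the client’s memory, before accounting for any overhead of the in-memory representation.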