Repost: DB2 UDB EEE Scalability Demonstration on Sun Solaris

J. Michael Morrilll

Repost: DB2 UDB EEE Scalability Demonstration on Sun Solaris
Greetings!

Below is a posting from Mike Swift's DB2 Interested Parties mailing list. If you would like to join his list, you can e-mail him at
[login to unmask email] Not all posting to DB2 Interested Parties are re-posted here.

Cheers!

J. Michael Morrill
Chief DB2-L List Owner

=


Dear DB2 Interested Parties,

Below is a brief summary of a demonstration performed by Sun Microsystems and
IBM in support of a large data warehouse proposal. The results are excellent,
showing excellent scalability for the query workload with database size ranging
from 100 GB to 1 TB (raw data).

For more information on the IBM/Sun DB2 alliance, including customer case
studies and the white paper, "IBM DB2 Universal Database for the Solaris
Operating Environment", go to http://www.software.ibm.com/data/sun .

Regards, Mike
+ + + + + + +

DB2 UDB EEE Scalability Demonstration on Sun Solaris

In March and April of this year, IBM and Sun partnered to demonstrate
the scalability of DB2 Universal Database Enterprise - Extended Edition
in powering large scale decision support system (DSS) applications.
Accomplished in a timeframe so short that it virtually eliminated any
time for errors and any significant tuning of either the system or the
database, (4 ½ weeks to assemble the hardware and build a database for a
1 TB demonstration) the demonstration was nothing short of a complete
technical success! While scaling the database by a factor of 10, DB2
Universal Database Enterprise - Extended Edition was able to deliver
over 93% of linear scalability.

The demonstration was conducted in two phases, with the same DSS
workload being run against a database loaded with 100 GB of raw data and
then again against a database 10 times that size. The database schema
modeled a simple Order/Entry database and was borrowed from a well known
DSS workload. A data generation program was used to produce the
necessary input files for the demonstration at both sizes.

The first portion of the demonstration was run using two database
partitions on a Sun Ultra Enterprise 4500 server with eight 336 MHz CPUs
and 3 GB RAM. One Sun A1000 disk array, One Sun UltraSCSI multi-disk
pack, three Sun A5000 disk arrays and one Symbios disk array were
attached to the server to provide a total of 558 GB of storage for the
demonstration. Approximately 200 GB was allocated for the tables and
indexes, with an additional 120 GB allocated for temporary database
space. The tables and indexes actually used about 150 GB of storage.
The largest table occupied over 75 GB! No mirroring or RAID was used for
the database.

The second portion of the demonstration was run using sixteen database
partitions on a Sun Ultra Enterprise 10000 server with sixty-four 400
MHz CPUs and 40 GB RAM. One Sun A3500 disk array, twenty Sun A5000 disk
arrays and twenty-six Sun A5200 disk arrays were attached to the server
to provide a total of 8,208 GB of storage for the demonstration. Almost
2100 GB was allocated for the tables and indexes, with an additional 760
GB allocated for temporary database space. The tables and indexes
actually used over 1500 GB of storage. The largest table occupied over
780 GB! The entire database was mirrored to protect against media
failure during the demonstration.

The workload consisted of 10 queries (and an additional 11 variants for
the 1000 GB portion of the demonstration) constructed by our customer
which simulated the type of work they anticipated they would support in
their data warehouse. The queries involved table scans, joins, nested
subqueries, index-only processing and aggregation. What follows is a
summary of some of the key metrics obtained during the demonstration.


Query Execution - Fixed Workload, Varying Configuration
-------------------------------------------------------
Number of Concurrent Queries 10 10 10
Number of CPUs 8 64 24
Database Size (GB) 100 1,000 1,000
Database Size / CPU (GB) 12.5 15.63 41.67
Execution Time (Minutes) 74.37 112.98 265.02

To measure scalability, let?s use the 100 GB execution as a base. Since
the 1,000 GB execution using 64 CPUs has 1.25 as much data being managed
by each CPU, with 100% linear scalability, the query workload would run
1.25 times as long, or 92.96 minutes. Instead, the query workload ran
in 112.98 minutes, yielding a scale factor of 82.28%. Following the same
logic, the 1,000 GB execution using 24 CPUs have run 3.33 times as long,
or 247.92 minutes. Instead, the query workload ran in 265.02 minutes,
yielding a scale factor of 93.55%!

Query Execution - Fixed Configurations, Varying Workload
--------------------------------------------------------
Number of CPUs 24 24 64 64
Number of Concurrent Queries 10 21 10 21
Database Size (GB) 1,000 1,000 1,000 1,000
Number of Database Partitions 16 16 16 16
Execution Time (Minutes) 265.02 484.37 112.98 208.31

For each of these two sets of measurements, the server configuration was
kept constant (at 24, then 64 CPUs), but the workload was varied. While
the number of concurrent queries was more than doubled, the execution
time for the queries increased by less than 85% in each case!

Additional measurements for data loading, index creation and statistics
gathering follow.

Data Loading
------------
100 GB 1000 GB
Load Rates GB/Hr GB/CPU/Hr GB/Hr GB/CPU/Hr
Average 17.01 2.13 222.35 3.47
For Largest Table 18.54 2.32 236.54 3.70
Maximum 23.45 2.93 366.34 5.72

Note: Data was loaded using AutoLoad for the 100 GB portion of the
demonstration. Data was pre-split and loaded using the DB2 UDB Load
Utility for the 1000 GB portion of the demonstration.

Index Creation (at 100 GB)
--------------------------
Key Create
Size Size Time
Table Columns (Bytes) Rows Unique Keys (GB) (Min.)
Line Item 1 4 600,037,902 20,000,000 1.66 54.38
Orders 1 4 150,000,000 9,995,435 0.45 19.93
Orders 3 23 150,000,000 150,000,000 2.43 23.27
PartSupp 2 8 80,000,000 80,000,000 0.67 25.08

Note: Sizes are approximate and actually reflect the number of leaf
pages in the index. Actual index sizes are somewhat larger. Not all
index creation statistics are included here.

Index Creation (at 1000 GB)
---------------------------
Key Create
Size Size Time
Table Columns (Bytes) Rows Unique Keys (GB) (Min.)
Line Item 1 4 5,999,989,709 170,941,764 25.08 178.38
Orders 1 4 1,500,000,000 61,949,224 7.51 49.17
Orders 3 23 1,500,000,000 1,500,000,000 27.51 102.40
PartSupp 2 8 800,000,000 800,000,000 6.71 83.17

Note: Sizes are approximate and actually reflect the number of leaf
pages in the index. Actual index sizes are somewhat larger. Not all
index creation statistics are included here.


Runstats (at 1000 GB)
---------------------
Execution
Nbr Time
Table Rows Idx's Type of Statistics (Minutes)
PartSupp 800,000,000 2 Distribution,
Detailed Indexes 12.90
Orders 1,500,000,000 5 Detailed Indexes 1,015.68
LineItem 5,999,989,709 5 Indexes 155.00
+ + + + + + +

Mike Swift
Internet: [login to unmask email]
IBM Santa Teresa Lab - 555 Bailey Avenue - San Jose, CA 95141
Phone: 408 463-4105, Fax: 408 463-4633

DB2 - Powering the world's e-business solutions!
IBM's Data Management Solutions: -> www.ibm.com/software/data
Examples of successful implementations: -> www.ibm.com/software/casestudies
International DB2 Users Group: -> www.idug.org