Journey Toward a Healthy Warehouse (D08)

Topic: DB2 for LUW

Subtopic: 2008



DATE: 2008-10-14 (16:45 - 17:45)
SPEAKERS: Christine Mackey (Swansea University), David Ford (Swansea University)

Our brief was to design and build a Data Warehouse to store anonymised health, social and environmental records at an individual level for the whole of Wales. There were no known data models, no known volumes and no specific questions to drive the design. The result had to be a research platform that could handle any volume of input data from multiple disparate data providers and link all the data together at an individual level, so that any question put to it could be answered through SQL, GIS, data mining and free-text mining. This presentation covers the technical side of the Data Warehouse's journey from inception to production in less than 18 months: how we deployed DB2 on Blue-C, one of the largest supercomputers in Europe dedicated to life science research; what we learned along the way; and the obstacles we encountered. It closes with an opportunity to discuss some of the challenges yet to come.

EXP. LEVEL: Beginner, Intermediate

OBJECTIVES:

1. Get the best out of your current setup: Does your DB2 implementation match your physical hardware; ensuring maximum parallelism in a single-partition environment

2. Use v9 to your advantage: Self-tuning, getting a quick performance win in an environment where query content is very unpredictable; online users versus batch processing; knowing when to stop/limit; advantages of data compression

3. Knowing if and when to partition: Why and when to partition data and database; how to decide the database partitioning strategy and partitioning keys; advantages of table partitioning; impact of partitioning on data loading and management

4. Choosing indexes for a volatile query set: How to tune when users can't tell you what they want to do; balancing the performance needs of data loading and data query; impact of data design on necessity to index; using multidimensional clustering

5. Monitoring performance, information versus overheads: Choosing benchmarking scripts; day-to-day monitoring


