Relational database management systems (RDBMS) may not be capable of performing the detailed functions required in an iterative data warehousing environment.

The iterative processing required in a data warehouse environment tends to be unpredictable and requires a much more sophisticated approach than an RDBMS typically provides.

What is needed is a processing paradigm that is designed to handle the real-time data query environment of the operational data warehouse. The selection of the right database management system is critical to the proper management, analysis and application of customer data in a data warehouse.

Detailed Data Requirements

In a data warehouse, many performance decisions traditionally made by database designers, administrators, and application developers are instead made by untrained users and automated tools. Detailed data also demands a much larger on-line environment than these traditional situations do, which adds another level of complexity to the selection and implementation of an RDBMS that will meet the objectives of the CRM system.

These requirements can be met with a database approach which provides certain well-defined characteristics.

The database management system selected to handle the business information workload must have the capability to handle the following characteristics of a data warehouse:

  • Complexity of the data model
  • Number of concurrent users
  • Data volumes
  • Complexity of the processing environment
Database Management Systems: Iterative Database Environment Needs Sophisticated Database System

Complexity of the Data Model

Iterative processing can run against any data in a data warehouse. Therefore, the data must be modeled to match the business, rather than a specific set of applications. The resulting database model is referred to as a ‘third normal form’ model; it is very complex and places considerable stress on a database management system.

In a data warehousing environment, every data modeling concession made for performance will leave some questions unanswered. Unanswered questions mean unaddressed business issues and a more difficult support environment.
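
To make the point about modeling for the business concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer, product, and purchase tables, their columns, and the sample rows are all hypothetical and are not taken from any particular CRM system. The sketch shows how a normalized model can answer a question nobody planned for (spending by region and product category) at the cost of joining several tables.

```python
import sqlite3

# Hypothetical normalized schema; all names and rows are illustrative only.
SCHEMA = """
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    product_id  INTEGER REFERENCES product(product_id),
    amount      REAL
);
"""

def build_warehouse():
    """Create a tiny in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Ann", "East"), (2, "Bob", "West")])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(10, "Loan", "Credit"), (11, "Card", "Credit"),
                      (12, "Savings", "Deposit")])
    conn.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?)",
                     [(100, 1, 10, 500.0), (101, 1, 12, 200.0),
                      (102, 2, 11, 50.0), (103, 2, 12, 300.0)])
    return conn

def spend_by_region_and_category(conn):
    """An ad hoc question: total spending per region and product category."""
    return conn.execute("""
        SELECT c.region, p.category, SUM(pu.amount)
        FROM purchase pu
        JOIN customer c ON c.customer_id = pu.customer_id
        JOIN product  p ON p.product_id  = pu.product_id
        GROUP BY c.region, p.category
        ORDER BY c.region, p.category
    """).fetchall()
```

If the same data had been stored only as a pre-summarized table of sales by region, a concession of the kind described above, the category breakdown could not be recovered from it.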

Number of Concurrent Users

At any given time, a data warehouse may have a mix of iterative and repetitive users, and both the number and the mix of users are important characteristics from a system performance perspective.

In the discussion of product selection methodologies, it was pointed out that benchmarking has some limitations, and one of them is that it does not adequately simulate the case where there are multiple users of different types.

Data Volume

Data volume is a critical element affecting the selection of the database management system, because some products may have bottlenecks that do not show up at lower data volumes. The amount of data may significantly impact the capability of the system to load data within a batch window.
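
The batch window constraint can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only that: the function names, the row counts, the load rate, and the four-hour window are assumed figures, not measurements of any product. It also assumes a constant load rate, which is optimistic given that bottlenecks may appear only at higher volumes.

```python
def load_hours(rows, rows_per_second):
    """Hours needed to load `rows` at a constant (assumed) throughput."""
    return rows / rows_per_second / 3600.0

def fits_batch_window(rows, rows_per_second, window_hours):
    """True if the load finishes inside the batch window."""
    return load_hours(rows, rows_per_second) <= window_hours

# Hypothetical figures: 5,000 rows per second and a 4-hour nightly window.
small_load_ok = fits_batch_window(50_000_000, 5_000, 4)    # about 2.8 hours
large_load_ok = fits_batch_window(500_000_000, 5_000, 4)   # about 27.8 hours
```

Under these assumed numbers, a tenfold increase in volume turns a load that fits comfortably into one that takes more than a full day.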

For example, the inability to meet batch windows with a normalized model may require one table to be split into several smaller tables. Splitting tables increases the number of tables a query must join, and RDBMS parsers impose distinct limits, which vary by product, on how many tables can be joined in one query.
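
The interaction between table splitting and join limits can be sketched as follows, again with Python's built-in sqlite3 module. The schema is hypothetical, and the split shown is a column-wise one chosen for illustration; the kind of split a real system would need depends on the product and the load process.

```python
import sqlite3

def make_db():
    """Build a tiny database in which one logical table is stored in two pieces."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
        -- Hypothetical: one logical 'account' table split in two to shorten each load.
        CREATE TABLE account_core    (account_id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE account_balance (account_id INTEGER PRIMARY KEY, balance REAL);
        INSERT INTO customer VALUES (1, 'Ann');
        INSERT INTO account_core VALUES (7, 1);
        INSERT INTO account_balance VALUES (7, 125.0);
    """)
    return conn

def customer_balances(conn):
    """Needs three tables in the join; two would do if 'account' were unsplit."""
    return conn.execute("""
        SELECT c.name, b.balance
        FROM customer c
        JOIN account_core    ac ON ac.customer_id = c.customer_id
        JOIN account_balance b  ON b.account_id   = ac.account_id
    """).fetchall()

def tables_after_split(logical_tables, pieces_per_table):
    """Count the tables a query joins when some logical tables are stored in pieces."""
    return sum(pieces_per_table.get(name, 1) for name in logical_tables)
```

Comparing the result of tables_after_split for a typical query against the documented join limit of a candidate product gives a rough indication of how much splitting that product can tolerate.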

Complexity of Processing

Iterative processing and detailed data present a more complex processing environment, which affects RDBMS processing capability, particularly when the user mix includes both iterative and repetitive types. The number of users and the workload mix are crucial considerations for the data warehouse.

Combinations of large amounts of data and large numbers of users often cause failures in a data warehouse environment, and they definitely increase the complexity of the processing environment.

The stringent conditions applied to the selection of a database management system for a data warehousing application are very important to the successful implementation of a corporate CRM strategy. The database system selected must be capable of providing the highest level of processing required by the data warehouse.