Data warehouses are used as sources of reporting data for organisations. Much of CDP’s data warehousing work follows the Ralph Kimball concept of conformed dimensions. Kimball’s dimensional design technique makes it possible to build a data warehouse, one data mart at a time and consists of a number of elements:
Operational source systems
are developed to capture and process business transactions. These systems are designed for data entry, not reporting, and ETL (extract, load and transform) tools get the data they need to construct the data warehouse.
The data staging
area is where the raw operational data is placed then transformed to make it ready for users. The data staging area lies between the operational source systems and the data warehouse and is not generally accessible to users.
A data mart
is a subset of an enterprise data warehouse. It is more efficient to build a data warehouse iteratively by creating integrated data marts than by constructing the entire warehouse at once. It is an incremental process.
Data warehouse data is organised and stored specifically for direct user queries and reports and differs from an operational source system in that it is designed primarily for reading data.
A well-designed data warehouse will:
- Facilitate reporting and analysis.
- Maintain historical information.
- Provide an integrated view of the enterprise, across many subject areas.
- Be implemented in phases, delivering business value at each stage.
- Allow changes in both source data and reporting requirements to be implemented easily.
The data warehouse should be derived from user reporting and analysis needs. User input into the requirements of the data warehouse is critical and will dictate how changing dimensions are handled, how data is summarised and the hierarchical structure of the data in the dimension tables. It sounds like apple pie and motherhood, but maintaining contact with users during the project is critical to its success.
The data marts that make up a data warehouse usually have a star structure that consists of dimension tables and fact tables:
- Dimension tables contain textual descriptions of business components.
- Fact tables contain measures (numeric data) and foreign keys to the dimension tables.
The Kimball approach emphasises conformed, or shared, dimensions that are used with more than one fact table. A conformed dimension is re-usable and identical in every data mart in which it is used. Various business topics become natural data marts. A fully integrated data warehouse will have data marts that use conformed dimensions and each data mart will have a set of measures or fact tables. Separating data by subject makes the warehouse flexible and easy to maintain.
Frequently, cubes (sometimes referred to as OLAP – on-line analytical processing) are built off data marts and data warehouses as part of the overall data warehouse. They are dimensional versions of a data mart using dimensional methods of reporting. The benefit of OLAP cubes are that they enable users to interactively analyse data from multiple perspectives by drilling down and up; and slicing and dicing.