Business Intelligence–why do we need ETL and OLAP? Conceptual model
For these who start in Business Intelligence, such words as ETL and OLAP might be already known, but it’s quite hard to find really good description of WHY do we need these activities for Business Intelligence and what’s the conceptual process of building a BI system.
Let’s review BI artefacts and their relation between each other. The most common BI system evolve through the following steps:
This can be represented by the picture below (taken from “Microsoft SharePoint 2010 PerformancePoint Services Unleashed” book)
Step 1 – Database storage. All information is saving into relation database in the normalized format (usually), that allows to manipulate data effectively. Information stored in database is represented in flat format, that is very efficient for the data storage and retrieval.
Step 2 – Measures and Dimensions. Database systems store terabytes of data, which is just a raw data and not usable from the business perspective. For the business level we need to have “information” instead of data. There is a quite significant difference between “data” and “information” - data is an information, but information *is not* a data. For analysis purposes we need to converted data to information.
Definition: “Information is an organised data, in the form relevant for us”. It means that to get “Information” we need to extract the most valuable data and transform it to the usable form (this process is called ETL – extract, process and load). In the result of this transformation we get “Measures”. Measures are exactly that information you want to analyse. Measures examples: sales, defective product, staff retention. However, measures need to be specified by range of something. We need to create “Dimension” to group all our Measured by relevant values. For example, dimension can be identified as: period of time, region, category, thus dimension allows us to group measures by context (in other words, group similar values to filter your information). Take into account that dimensions can be organized in hierarchies. For example, sales measures can be dimensioned by country, then by state, then by city and then by price.
Step 3 – Cube. Organizing data in measures which are categorized by dimensions is resulting into the OLAP cube (Online analytical processing) created from a specific schema (star or snowflake). Cube representation allows to analyse data using different approach to drill, aggregate, decompose and report data for the forecasting, budgeting, planning, and other purposes.