Cooking With CUBEs
As we should all know by now, PowerPivot provides Excel with a powerful way to harness data from one or more sources, and to do further analysis on that data within familiar pivot tables.
Furthermore, because PowerPivot is creating an in-memory cube of the data, it is possible to build an analysis using CUBE formulae. I have blogged a couple of times about CUBE formulae, in Cycling Through The Fog and in Cracking The Code.
In Excel, as with any development, you want your solution to be as flexible and dynamic as possible. This blog is about building dynamic tables using CUBE formulae, but to start with, the following formula shows an example of a value extracted from a PowerPivot model using CUBE functions:
=CUBEVALUE("PowerPivot Data",
CUBEMEMBER("PowerPivot Data","[Measures].[Sum of SalesAmount]"),
CUBEMEMBER("PowerPivot Data","[DimProductCategory].[EnglishProductCategoryName].&[Bikes]"),
…)
This formula gets the Sales Amount from the PowerPivot cube for the Bikes product category, for the fiscal year 2006. There will be many values at this intersection, since there can be many dates in 2006 and many products within that category, all pre-aggregated in the cube; the CUBEVALUE function returns that aggregate amount.
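For reference, a complete formula of this shape might look like the sketch below. The measure and category members are the ones quoted above; the fiscal year member, [DimDate].[FiscalYear].&[2006], is an assumption based on the usual AdventureWorks table names and may well be named differently in your model.

```excel
=CUBEVALUE("PowerPivot Data",
   CUBEMEMBER("PowerPivot Data","[Measures].[Sum of SalesAmount]"),
   CUBEMEMBER("PowerPivot Data","[DimProductCategory].[EnglishProductCategoryName].&[Bikes]"),
   CUBEMEMBER("PowerPivot Data","[DimDate].[FiscalYear].&[2006]"))
```

Each CUBEMEMBER narrows the intersection by one dimension, and CUBEVALUE returns the aggregate at the point where they all meet.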
We could build the whole table of values using similar formulae. In our table we need to know what the value is related to, so we have row and column headers that identify the intersection points. We could define those headers using the CUBEMEMBER functions giving a table such as shown in Figure 1 below, which shows a table based on AdventureWorks.
Figure 1 - Table of values over year and product category
The formulas for the headings are CUBEMEMBER calls with hard-coded members: one per fiscal year for the column headings, and
=CUBEMEMBER("PowerPivot Data","[DimProductCategory].[EnglishProductCategoryName].&[Accessories]")
=CUBEMEMBER("PowerPivot Data","[DimProductCategory].[EnglishProductCategoryName].&[Bikes]")
etc. for the row headings.
The values at the intersection points simply use these heading cells, like so:
=CUBEVALUE("PowerPivot Data","[Measures].[Sum of SalesAmount]",$A3,B$1)
This is equivalent to the formula given in Equation 1.
Slicing the Vegetables
Further richness is bestowed upon us because we can also link slicers to our table, giving us the sort of filtering we have with the pivot tables. For example, Figure 2 shows the same data table built using CUBE formulae with a fiscal year slicer; the data reflecting the fact that only the years 2006, 2007, and 2008 have been selected.
Figure 2 - Table of values reflecting years slicer selections
Showing the slicer selections on your report has been covered elsewhere, but it is so useful and asked about so often that I thought I would also cover it. I also have a couple of variations that I haven’t seen elsewhere, which are worth presenting.
Previously, as shown in the formulae in Equation 2, we built the row and column headers using hard-coded values for the year and category fields. We need to be more dynamic in how we list these values. To show the slicer selections as in E5, F5, etc., we need a list of values from which we can choose and display the individual ordered items. The CUBESET function gives us this. The syntax for CUBESET is
CUBESET(connection, set_expression, [caption], [sort_order], [sort_by])
where connection is the cube, set_expression is the set of values required, and caption is a value to display. So, looking at cell D1 we have the formula
=CUBESET("PowerPivot Data",Slicer_FiscalYear,"Set of Years")
which would look as shown in Figure 3 when added to cell D1 to build our set of fiscal years.
Figure 3 – Slicer years set formula
As can be seen, we use Slicer_FiscalYear as the set_expression, so the set will include all selected values in that slicer, with the caption signifying the cell contents.
So far, so good, but we still need to list those selected values. For this, we use the CUBERANKEDMEMBER function, which returns the nth, or ranked, member in a set. The syntax of this is
CUBERANKEDMEMBER(connection, set_expression, rank, [caption])
where connection is the cube as before, set_expression is the set of values to choose from, and rank is the position of the member to return. So, to get the first member, we use a rank of 1, for the second a rank of 2, and so on.
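As a sketch of those calls, assuming the set of years lives in D1 as above (referenced absolutely here so the formula can be copied across), the first two members might be fetched like this:

```excel
=CUBERANKEDMEMBER("PowerPivot Data",$D$1,1)
=CUBERANKEDMEMBER("PowerPivot Data",$D$1,2)
```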
Because there are 5 years in the PowerPivot model, and when filtered in the slicer we might be showing fewer than 5, we need to cater for a variable number of items. The simplest way is just to add an error wrapper around the formula, so that a rank beyond the end of the set shows an empty cell rather than an error.
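A minimal sketch of that wrapper, using IFERROR in the same way as the value formulae later in this post and again assuming the set is in $D$1:

```excel
=IFERROR(CUBERANKEDMEMBER("PowerPivot Data",$D$1,1),"")
```

The same wrapper would go around each of the five header formulae, with only the rank changing.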
Why Extra Ingredients?
We could nest the CUBESET function within the CUBERANKEDMEMBER, but that would mean that the set is evaluated 5 times. By defining the set in its own cell and referring to that cell within the CUBERANKEDMEMBER function, it is evaluated just the once. A small matter, but it makes the spreadsheet easier to maintain, and is more efficient.
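For comparison, the nested form that this layout avoids would look something like the sketch below, repeated in each of the five header cells with a different rank. It is shown only to illustrate the trade-off.

```excel
=CUBERANKEDMEMBER("PowerPivot Data",CUBESET("PowerPivot Data",Slicer_FiscalYear),1)
```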
Cooked To Perfection
It’s as simple as that.
But hang on a minute, have we overcooked it?
Looking at the syntax definition for these two functions, we can see that they both take set_expression as an argument. The CUBESET function is passed the slicer values as its set, and in turn is passed to the CUBERANKEDMEMBER function as its set.
As the slicer is itself a set_expression, you would think that we should be able to pass the slicer directly to CUBERANKEDMEMBER as a set and be done with it. And so we can: formulae that use Slicer_FiscalYear as the set argument of CUBERANKEDMEMBER work just as well as those formulae in Equation 4.
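A sketch of that shortcut, assuming the same IFERROR wrapper is kept for ranks beyond the number of selected years:

```excel
=IFERROR(CUBERANKEDMEMBER("PowerPivot Data",Slicer_FiscalYear,1),"")
=IFERROR(CUBERANKEDMEMBER("PowerPivot Data",Slicer_FiscalYear,2),"")
```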
Managing The Ingredients
The product categories can also be listed in a similar way using CUBESET and CUBERANKEDMEMBER. Here we do need CUBESET, as there is no pre-defined set of values that we can pick up as we had with the fiscal year. The set will be all values for the Product Category English name in the Product Category table,
=CUBESET("PowerPivot Data","[DimProductCategory].[EnglishProductCategoryName].Children","Set of Categories")
As can be seen, .Children gets us all of the category values.
One thing to note is the use of the caption argument. Again, this helps to highlight the cell containing the set.
We now have formulae that can define our full table, such as
D1: the formula in Equation 3
=CUBESET("PowerPivot Data",Slicer_FiscalYear,"Set of Years")
D2: the formula in Equation 6
=CUBESET("PowerPivot Data","[DimProductCategory].[EnglishProductCategoryName].Children","Set of Categories")
E5:I5: the formulae in Equation 5
D6:D9: formulae for the product categories
And finally, in E6:I9, the formulae for the values
=IFERROR(CUBEVALUE("PowerPivot Data","[Measures].[Sum of SalesAmount]",$D6,E$5),"")
etc., each cell reflecting the correct product category ($D6) and fiscal year (E$5).
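As a sketch of the two header entries in that list that are described but not spelled out, and assuming the year set is in $D$1 and the category set in $D$2 as listed, the first cell of each range might read as follows (the cell addresses are labels, not part of the formulae):

```excel
E5: =IFERROR(CUBERANKEDMEMBER("PowerPivot Data",$D$1,1),"")
D6: =IFERROR(CUBERANKEDMEMBER("PowerPivot Data",$D$2,1),"")
```

The rank then increases by 1 for each cell across (F5, G5, and so on) or down (D7, D8, and so on).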
Our table now has a full set of values, and reflects the choices made in the fiscal year slicer.
(If we wished, we could add the product categories to a slicer, and make our table dynamically reflect that.)
Ready to Serve?
Although we have been diligent in storing the evaluated sets in one place rather than nesting a CUBESET function within the CUBERANKEDMEMBER function, there are still a number of things going on here that I just don’t like:
- The connection is hard-coded, multiple times
- If a new year is added to the data, just copying cell I5 to J5 won’t work; because the rank is hard-coded in the formula, it will need a small change
- If no selection is made in the fiscal year slicer, the values shown are the total of all years, with a header value of All, as shown in Figure 4. This may be what is required in some instances; in others we may want to show each year’s values individually.
Figure 4 - Showing all years as a total with no slicer selections
These ‘difficulties’ can be overcome relatively easily.
Rather than hard-code the connection within each formula, put the connection text ‘PowerPivot Data’ (without the quotes) in a cell, say D3, assign it the Excel name _cube, and then use that defined name within the formulae. Note that this also makes the transition to Excel 2013 simpler, where the connection has changed to ‘ThisWorkbookDataModel’ (again, without the quotes).
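For example, assuming the name _cube has been defined as just described, the set formula in D1 might become:

```excel
=CUBESET(_cube,Slicer_FiscalYear,"Set of Years")
```

Moving the workbook to Excel 2013 would then mean changing only the text in the named cell, not every formula.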
The rank is managed by using a function that returns a variable number depending upon the row or column of the cell, namely ROW([reference]) or COLUMN([reference]). You might think that you can use COLUMN(A1) in E5, and copy that across so that it updates to COLUMN(B1), COLUMN(C1), etc. Believe me, this is a very bad idea. Although everything will be fine at first, what happens if you decide to insert a column before column E? The answer is that COLUMN(A1) will update to COLUMN(B1), and whereas the first column of the year table originally reflected the first selected year in the fiscal year slicer, it will now reflect the second. You might say that you would never do that, but no-one ever does until they do. For the sake of a simple change it is hardly worth risking it.
The suggested change is to use COLUMN()-COLUMN($D$5), which uses the top left cell of our table as an anchor point. Thus, a formula in cell E5 using these functions will return 1 for that calculation, one in F5 will return 2, and so on. If a column is inserted to the left of the table, those parts of the formulae will update to COLUMN()-COLUMN($E$5), which means the formula that was in cell E5, and has now moved to cell F5, still returns 1 for that calculation.
Similarly, the category list will use ROW()-ROW($D$5).
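Applied to the header formulae, and assuming the sets are in $D$1 and $D$2 and the connection name _cube is defined as above, the anchored ranks might be sketched like this (cell addresses are labels only):

```excel
E5: =IFERROR(CUBERANKEDMEMBER(_cube,$D$1,COLUMN()-COLUMN($D$5)),"")
D6: =IFERROR(CUBERANKEDMEMBER(_cube,$D$2,ROW()-ROW($D$5)),"")
```

In E5, COLUMN() is 5 and COLUMN($D$5) is 4, giving a rank of 1; in D6, ROW() is 6 and ROW($D$5) is 5, again giving 1. Both formulae can then be copied across or down without editing the rank.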
Finally, how can we show each year in the column headers and the values for those years when no slicer selection is made, rather than showing ‘All’ and totals for all years? We already have the formula in D1 that gets the set of selected slicer years, namely the CUBESET over Slicer_FiscalYear.
As we showed before with the product categories, we can get a set of all years regardless of slicer selection with the CUBESET function and the Children property of the year field.
But how do we know when to use which? One way would be to test whether the first member of the slicer set returns All. If it does, there are no slicer selections, so we show all years individually; if not, we show the slicer-selected years. We can check the first slicer set value with CUBERANKEDMEMBER and a rank of 1.
Adding all three elements together, we have a formula in D1 that determines what goes into the set of years that will drive the table column headings.
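Put together, the D1 formula might be sketched as below. The test against "All" and the two CUBESET branches follow the description above; the member path [DimDate].[FiscalYear] is an assumption about where the fiscal year field lives in the model, and _cube is the defined name for the connection introduced earlier.

```excel
=IF(CUBERANKEDMEMBER(_cube,Slicer_FiscalYear,1)="All",
    CUBESET(_cube,"[DimDate].[FiscalYear].Children","Set of Years"),
    CUBESET(_cube,Slicer_FiscalYear,"Set of Years"))
```

The header formulae that read from $D$1 need no change, since they only care that D1 holds a set.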
With this formula to get the years set, we can see all the years listed when no selections are made in the slicer, rather than showing totals for all years, as in Figure 5.
Figure 5 - Showing all years with no slicer selections
What's For Dessert?
That’s about it. Using this technique we have a table that shows the value by year by product category, with a slicer for selecting specific years which is reflected in the years shown in the table. The years and product categories are dynamically built and so can accommodate extra years and extra categories in the source data, and the years can also handle a full slicer set without showing the values as totals for all years.