Description
I know from the title this sounds a stupid idea, but bear with me...
Very often, the worst-performing calculations on a cube are those that do operations over large sets. Imagine you have a customer dimension with several million customers on; you might want to create calculated measures that tell you the following:
1) The number of customers who bought more than $10 worth of goods this month
2) The number of customers who bought more than $10 worth of goods last month but did not do so this month
... and so on. This is all very doable in MDX at the moment but can be slow - you're taking the set of all customers and then doing several filter operations, then counting the set you get back from the filter.
The thing is, a lot of the logic is repeated across these calculated measures. So, for calculation (1) above you would write MDX something like:
count(
filter(
customer.customer.customer.members
, measures.sales>10
))
But for (2) you'd have to say
count(
except(
filter(
customer.customer.customer.members
, (measures.sales,date.calendar.currentmember.prevmember)>10
),
filter(
customer.customer.customer.members
, measures.sales>10
)
)
In both cases, we need to work out which customers had sales of over $10 this month, but there's no way of sharing that set safely between calculated measures. Using named sets declared inline in (2) might improve its performance, I know, and in some circumstances you can reference a named set declared in one calculated member in another one but it's not a reliable technique.
Details
Category
Analysis Services
Proposed Solution
What I'd like to see is a supported, reliable way to cache sets for any given context and share those sets between calculated members. Allowing calculated members to return sets would be one option, but I suppose there would be more elegant ways of implementing it...
But, imagine we were able to create calculated measures that returned sets. For the example above, we could create a calculated measure tha returned a set as follows:
create member currentcube.measures.tendollarsales as
filter(
customer.customer.customer.members
, measures.sales>10
)
We could then write calculation (1) as follows:
count(measures.tendollarsales)
and we could then write calculation (2) as follows:
count(
except(
(measures.tendollarsales,
date.calendar.currentmember.prevmember)
,measures.tendollarsales
))
So long as the set returned by measures.tendollarsales was cached after the first time it had been evaluated for a particular context, if we included calculation (1) and (2) and multiple months in the same query, calculation (2) would run much faster because it could take advantage of this caching.