Friday, February 24, 2012

Commonlity?

Okay, here is my scenario:

The information used in this description will be considered sample data, as the actual dataset is much larger.

We have 3 products: F, M, and Z. Each product has several derivatives based on customization. We have one central list of parts that these products are built from, depending on the specs given. We have a number that represents the percentage of that part's occurrence in a given product ordered last year. It would look like this:

PART

PRODUCT

FREQUENCY

P1

F

1

P1

M

0.2

P2

F

0.15

P3

Z

1

P4

M

0.7

P4

F

0.34

P5

Z

0.17

P6

F

0.24

P6

M

1

P6

Z

0.5

The above shows that P1 occurs in every product (100%) F made last year, and only 15% of product F contained P2 last year.

I'm interested in creating an analysis that computes the number of parts that are the same between two given products (F, M, or Z). I created a cube that basically gives me this information:

F

M

Z

TOTAL

P1

1

0.2

0

1.2

P2

0.15

0

0

0.15

P3

0

0

1

1

P4

0.34

0.7

0

1.04

P5

0

0

0.17

0.17

P6

0.27

1

0.5

1.77

TOTAL

1.76

1.9

1.67

5.33

Based on that information, I would like to know how much "commonality" each product has, comparing 2 products at a time:

F

M

Z

F

1.76

F x M

F x Z

M

1.9

M x Z

Z

1.67

To get F x M, you take two times the SUM of the MINIMUM of the frequencies of each part per product, or in this case: 2 * SUM(MIN(1,0.2), MIN(0.15,0), MIN(0,0), MIN(0.34), MIN(0,0), MIN(0.27,1)). Then divide that by the total frequencies of each product, or in this case: SUM(1.76, 1.9). In this example, it (F x M) turns out to be 0.52459, which is the result I desire.

I am new to MDX and SQL Server 2005 in general, but not new to SQL or relational databases. I am not asking someone to solve this for me, I am merely asking for some points of reference, or something I can type into Google to get me on the right track. My colleague and I found something called COVARIANCE. I'd love to play around with it, but I cant figure out MDX syntax at the moment. I would be happy if someone could even give me an example for using the MIN function in the 'Calculations' tab of the Cube object explorer in Business Intelligence.

Your time and help are greatly appreciated.

-Brian

Hi Brian:

I think we can solve your issue with a calculated member which may be simpler than trying to use the MDX covariance() function. The solution I envision is a combination of the MIN() function, SUM() function and managing the product dimension member(s) we reference. However, I have a question about how you came up with the F x M result. I went through the math from your example and came up with a different answer than 0.52459. I don't think my answer formulation is correct and using the correct formula will influence the resulting MDX. Could you take a minute and see where my calculation went astray?

Min Result

MIN(1, 0.2)

0.2

MIN(0.15,0)

0

MIN(0,0)

0

MIN(0.34, 0.7)

0.34

MIN(0,0)

0

MIN(0.27, 1)

0.27

Sum of Mins

0.81

2x sum of mins

1.62

SUM(1.76, 1.9)

3.66

FxM

0.442622951

Thanks - PGoldy

|||

Oops! I was doing it by eyeshot and in a hurry--must have done one of the MINs wrong. The answer you got is the correct one, so could we start from there?

Thanks for your help.

|||

Hi Brian:

Thanks for the clarification. I think the best way to approach this is to do the following:

(1) express your issue as a formula

(2) translate the formula into MDX for the specific case you outlined.

(3) generalize the MDX to serve any product/part combination

First let’s express your issue as a formula (this is really for me, I’m sure you already understand it).

F x M = (<sum of min freq for all parts> * 2)/(<sum of F freq> + <sum of M freq>)

Now let’s draft some MDX to serve each part of the formula. First, some assumptions (since I don’t have access to your cube):

(1) Products are referenced by dimension: [Products], and hierarchy [Products]

(2) Parts are referenced by dimension: [Parts], and hierarchy [Parts]

(3) Measure we have is [Frequency] referenced as [Measures]. [Frequency]

<sum of min freq for all parts> - do the “brute force” technique where we explicitly call out how to get the min() for each Part in your example. We’ll get elegant later so we’re not hard coding part members. For now let’s call this SumofMins_FandM and make it part of the Measures

MIN(CROSSJOIN([Parts].[Parts].[P1], {[Products]. [Products].[F], [Products]. [Products].[M]}), [Measures].[Frequency])

+ MIN(CROSSJOIN([Parts].[Parts].[P2], {[Products]. [Products].[F], [Products]. [Products].[M]}), [Measures].[Frequency])

+ …<vary the part number for each part number used in Products F and M>

<sum of F freq> + <sum of M freq> - in MDX this is a simple tuple to gain the sum as long as the aggregation type for [Measures].[Frequency] is SUM. We make a new calculated member in the Measures dimension and call it FandMTotal. Here’s the MDX:

([Products].[Products].[F], [Measures].[Frequency]) + ([Products].[Products].[M], [Measures].[Frequency])

Our final calculation then looks like this:

([Measures].[SumofMins_FandM] * 2) / Measures.[ FandMTotal]

You can test out the calculated members in a query using the WITH clause as follows:

WITH

MEMBER Measures.[ FandMTotal] AS

‘([Products].[Products].[F], [Measures].[Frequency]) + ([Products].[Products].[M], [Measures].[Frequency])‘

MEMBER Measures.[SumofMins_FandM] AS

‘MIN(CROSSJOIN([Parts].[Parts].[P1], {[Products]. [Products].[F], [Products]. [Products].[M]}), [Measures].[Frequency])

+ MIN(CROSSJOIN([Parts].[Parts].[P2], {[Products]. [Products].[F], [Products]. [Products].[M]}), [Measures].[Frequency])

+ …<vary the part number for each part number used in Products F and M>‘

MEMBER Measures.[FxM] AS

‘([Measures].[SumofMins_FandM] * 2) / Measures.[ FandMTotal] ‘

SELECT

Measures.[FxM] ON COLUMNS

FROM [<your cube name>]

This should help for now. I’ll post the generic solution tomorrow where the parts and products are flexible. The above solution should get you started on how MDX works and the use of tuples, functions, and using sets.

PGoldy

|||

You assumptions of my cube structure, regarding dimensions and facts are correct.

I am playing with the code you have given me, and I'm looking forward to the more formal coding.

Thanks again.

|||

Paul:

I've been messing with your suggested formula all morning and I have come to a conclusion: the formula will (or should) work when the data is in the same format as the second table layout I have given you, products x parts. However, the data currently exists in the format of the very first table, parts x attributes. One of those attributes is the product it is a part of. I'm assuming it truly must make a difference on the format of the query.

Please advise, thanks.

-Brian

|||

Hi Brian:

First - apologies for the delay, I was away from the Forum yesterday because of work (go figure). When I drafted the preliminary solution I did assume the second table format. Might be a bad assumption on my part. Let me noodle on your information about hte format of data being in the first table structure. I also have had a devil of a time "genericising" the MDX I came up with. My problem, but I may not be able to come up with a general solution. We'll see. More later.

PGoldy

|||

Hi Brian:

I put somemore thought into the commonality issue and have a solution, but is has some restrictions/assumptions. The restrictions/assumptions are:

(1) The [Products] are called out on the COLUMNS of the query, and appear in the same order as their natural order in the hierarchy. (2) Measures are called out on the ROWS of the query. (3) Nesting dimensions may have unexpected results (untested).

The MDX below is broken into two calculated members. The first calculated member, [Sum of Mins] is the more difficult one and relies on the [Products] to be in the order as they appear in the hierarchy. [Sum of Mins] sums across the MIN() of all parts relative to the current product and current.next product. The second calulated member [Commonality] takes care of the rest of the math by multiplying [Sum of Mins] by 2 for the numerator, and adding the [Freq] for Products/CurrentMember, and Products.NextMember (also relying that Products called out on the columns are in their natural order).

Hope this helps. PGoldy

WITH

// Assumptions:

// (1) Product is on columns.

// (2) Products on columns are in same order as appear in hierarchy.

//

MEMBER Measures.[Sum_Of_Mins] AS

'

SUM([Parts].[Parts].Members,

MIN(CROSSJOIN([Parts].[Parts].CurrentMember

,{[Products].[Products].CurrentMember

,[Products].[Products].CurrentMember.NextMember})

, [Measures].[Freq]))

'

MEMBER Measures.[Commonality] AS

'

(2 * Measures.[Sum_Of_Mins])

/

(([Products].[Products].CurrentMember, [Measures].[Freq]) + ([Products].[Products].CurrentMember.NextMember, [Measures].[Freq]))

',FORMAT_STRING="#,0.00000"

SELECT

{[Products].[Products].[F]

,[Products].[Products].[M]}

ON COLUMNS

,{Measures.[Commonality]}

ON ROWS

FROM [<your cube>]

|||

What do you mean about heirarchy? What heirarchy? I currently do not have one, and frankly, when I try to define one (Product -> Parts), I get an error during cube deployment on MS Business Intelligence.

No comments:

Post a Comment