If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Attempting to aggregate using attribute in the fact table

Started by bberryhill, 06 Jan 2011 11:32:03 PM

Previous topic - Next topic

bberryhill

I'd like to know if my approach is correct or if there is a better way ...

Simple dimension - Customer.
Simple fact - invoice transactions.  In the invoice transaction table I have customer key, invoice number and amount (as well as other items but trying to keep it simple).
One to many relationship between Customer and Invoice Transactions on the customer key.

If I query Customer, the invoice number and the amount I get each customer, all of the invoice numbers but the total amount for each invoice is actually the sum of the invoice amounts for the customer.  So it is XSUMing the invoice amount for the customer, not the customer / invoice number.  I think it is building the XSUM based on the determinant of the customer dimension.

I started with a single determinant, unique, on the Invoice Transaction composed of the customer key - actually framework manager started with a single determinant, I have imported from data manager and let framework manager do most of the work.

I tried adding a group by determinant using the invoice number but that did not help.

I am pretty sure that if I split the Invoice Transaction table into two tables - a dimension and a fact table, and I put the attributes such as invoice number in the dimension table, I will be able to get the correct results when the query contains the invoice number.  My query would then include two dimensions (cstomer dimension and invoice dimension and one fact , the invoice tranaction) - but is this the "correct" approach?

Thanks for your help.

MFGF

Quote from: bberryhill on 06 Jan 2011 11:32:03 PMI am pretty sure that if I split the Invoice Transaction table into two tables - a dimension and a fact table, and I put the attributes such as invoice number in the dimension table, I will be able to get the correct results when the query contains the invoice number.  My query would then include two dimensions (cstomer dimension and invoice dimension and one fact , the invoice tranaction) - but is this the "correct" approach?

That is definitely the approach I would take.  Query generation is based on your structures conforming to true Star Schema methodology - Fact query subjects containing only measures and the required dimensional keys, and Dimension query subjects containing keys and attributes.

Regards,

MF.
Meep!

blom0344

There is an extensive discussion on facts and determinants here:

http://www.cognoise.com/community/index.php/topic,11557.0.html

In any case , a group by determinant should only be used on a shared dimension with multiple facts with different grain. Not at all applicable in your case I'd say

bberryhill

Thanks for your responses.  And, I have read the posts you referred to on determinants.  Just to help me solidify this in my mind...
The way I understod Kimbal, it would not be appropriate to create deep dimensions from transactional tables such as invoice transactions or sales tickets.  The resuting requirment to re-join these tables would be a performance problem.  But, Cognos seems to work much better with these "transactional" dimensions if you want to group or summarize by any of the attributes in the transactional table.  Is this a Cognos thing or did I just not understand Kimbal completely?  Would this be referred to as a non-conformed dimension?

Also - as mentioned, I use Data Manager to populate my data warehouse and I import the model to FM.  The import process creates a unique determinant on every fact table using all of the keys defined in the model.  I assume this is unneccesary but perhaps a safe-guard approach on Cognos's part?

Again, thanks for your help - I'll get started on that Invoice dimension.

Lynn

Invoice number sounds more like a degenerate dimension to me.

QuoteI started with a single determinant, unique, on the Invoice Transaction composed of the customer key - actually framework manager started with a single determinant, I have imported from data manager and let framework manager do most of the work.

It makes sense that this determinant setting would give you the sum across invoices at the customer level. I would remove the determinants from the fact table altogether or have just one to identify the uniqueness (no group by). 

http://www.rkimball.com/html/designtipsPDF/DesignTips2003/KimballDT46AnotherLook.pdf

sir_jeroen

#5
Lynn,

I agree with you on that. But I've to mention one situation in which you set more than one determinant on a fact table.
In case of a combined fact table e.g. order headers (with total amount) and order details  I would set a determinant (group by) on order header id and make all order header columns attributes to this key and I would set a determinant (unique) on order details (key= Detail Id)
In this case the total amounts aren't double counted. Make sure you set Rollup aggregate to Automatic!!

Btw. I always set an unique determinant to a query subject (if there are more than 1 determinants I remove the others (which can happen when importing a Query Subject))

Quote from: blom0344 on 07 Jan 2011 04:06:13 AM
In any case , a group by determinant should only be used on a shared dimension with multiple facts with different grain. Not at all applicable in your case I'd say
From my experience: Always set a determinant, because they control the way how the sql is generated and how attributes are counted.

bberryhill

Lynn,

The Kimbal design tip you referenced is one of the sources that causes me concern with this scenario.  I believe the Kimbal approach would call the invoice number in the fact table a degenerate dimension, until you join this DD or its surrogate key to a dimension table, i.e. the "invoice dimension".  So I'm good with that but then they say:

Quote... As rows are inserted into the fact table, new rows are also inserted into a related dimension table, often at the same rate as rows were added to the fact table. This situation should send a red flag waving. Usually when a dimension table is growing at roughly the same rate as the fact table, there is a degenerate dimension lurking that has been missed in the initial design.

This would be the case with an invoice dimension.  Normally there would be two facts per invoice dimension - the initial invoice fact with an amount due (and other amounts like tax) and then a second fact representing the payment.  There could be other facts - adjustments, finance charge, write-offs etc.  But normally, a ratio of 1 invoice dimension record per 2 invoice fact records.

The Kimball "red flag" concerns me, but I do not see another way to get a proper report showing each customer, each invoice for the customer and the proper invoice total at the invoice number detail.

MFGF

One point to bear in mind is that you are not developing a physical star schema in a database, but simply a logical representation of a simple star schema in the metadata in order to generate accurate, predictable results in your reports.

MF.
Meep!

bberryhill

MFGF,

Perhaps that is where I am going wrong - I am creating a physcial star schema in the data warehouse.  Mainly because I am using Data Manager to create the data warehouse and as much of the dimensional model as I can.  Some technical sales guy sold me on this approach  ::)

Would you suggest that the dimension should be created based on the physical fact table rather than a physical dimension table?  Is that what I'm missing?

I know I can do that in FM and I guess I might be able to define the dimension in Data Manager and just not use a dimension build.  My approach is to do as much as possible in DM, export to FM and then script any changes I need to make in the FM model.  I might be one of a few that use DM and export to FM.

Lynn

I'll be interested to hear how you make out with it all in the end so update if time permits.

What I was suggesting was to omit the the invoice dimension and leave it degenerate. You sould be able to get a correct result by setting the determinants to identify uniqueness -- in other words the unique fact determinant you had originally with just customer key is not what you want.

Based on ReportNet Addict's post, what is the granularity of your fact table? Mixed grain on a single fact table can pose some challenges with aggregation.

I'm far from expert here so always "FWIW" :)

cognostechie

Quote from: bberryhill on 06 Jan 2011 11:32:03 PM

I started with a single determinant, unique, on the Invoice Transaction composed of the customer key - actually framework manager started with a single determinant, I have imported from data manager and let framework manager do most of the work.


That's the wrong way to do it.

Look at the attached screenshot.

The first list report shows CustomerID, OrderID and Freight and it shows the correct amount for each Customer and each Order.

The second list report has a join between Order and Order Details but it still shows the Freight as well as Quantity correctly even if there are multiple records for each Order in the Order Detail table.

Your example is actually simpler than this because you are dealing with only one transaction table. I tend to misunderstand the requirements so let me know if this is what you are looking for and I would let you know how to set the determinants.

cognostechie

Also wanted to add that I have another scenario where I have only one Fact with all the lines inside it and even in that case, it shows the correct results when joined to the Dimension.

I don't think there is any need to split the Invoice Query subject into Dimension and Fact or change anything in Data Manager.

In your case, it is basically summing all Invoices for each Customer because you specified Customer key as the unique determinant.

sir_jeroen

Hi Cognostechie,

As your document shows, in my opinion that's the way how to deal with determinants and how they should work!!

cognostechie

Thanks RNA !

I aleays set the determinants based on what makes the data unique in those tables rather than what the join condition is.

bberryhill

Thanks for your comments.

I agree the determinant should be set based on what makes the data unique.  The way Cognos imports from Data Manager the unique determinant is not necessarily correct.
When I set the determinant correctly, I get the correct results when I query the data in the metadata layer.  By correctly, I mean a single, unique determinant with only one key - the item reference business key which makes each row unique.  The important part of the query being the inclusion of the item reference code in the XSUM:

Quoteselect
       D_CUSTOMER3.CUSTOMER_CODE  as  CUSTOMER_CODE,
       D_CUSTOMER3.NAME  as  NAME,
       F_INVOICE_TRANSACTION.ITEM_REF_CODE  as  ITEM_REF_CODE,
       XSUM(F_INVOICE_TRANSACTION.PRETAX_AMT  for D_CUSTOMER3.CUSTOMER_CODE,D_CUSTOMER3.NAME,F_INVOICE_TRANSACTION.ITEM_REF_CODE )  as  PRETAX_AMT

However, when I perform the query in the dimensional layer, I do not get the correct results.  The item reference code is not included in the XSUM

Quoteselect
       D_CUSTOMER3.CUSTOMER_CODE  as  CUSTOMER_CODE,
       D_CUSTOMER3.NAME  as  NAME,
       F_INVOICE_TRANSACTION.ITEM_REF_CODE  as  ITEM_REF_CODE,
       XSUM(F_INVOICE_TRANSACTION.PRETAX_AMT  for D_CUSTOMER3.CUSTOMER_CODE,D_CUSTOMER3.NAME,D_CUSTOMER3.ACCT_CAT_CODE )  as  PRETAX_AMT,
       D_CUSTOMER3.ACCT_CAT_CODE  as  ACCT_CAT_CODE
from

Adding an Invoice dimension in the dimensional layer with one level and the item reference code resolves the problem in the dimensional layer.  The item reference code becomes part of the XSUM.

Quoteselect
       D_CUSTOMER3.CUSTOMER_CODE  as  CUSTOMER_CODE,
       D_CUSTOMER3.NAME  as  NAME,
       F_INVOICE_TRANSACTION.ITEM_REF_CODE  as  ITEM_REF_CODE,
       XSUM(F_INVOICE_TRANSACTION.PRETAX_AMT  for D_CUSTOMER3.CUSTOMER_CODE,D_CUSTOMER3.NAME,F_INVOICE_TRANSACTION.ITEM_REF_CODE )  as  PRETAX_AMT


Would you mind looking at the underlying SQL in your example and also confirm that your test is being done over a dimensional model, not a relational model?  It appears the technique works correctly in the metadata layer which I believe represents a relational model.

Thankyou,


cognostechie

The results of the determinant in the Metadata Layer would be no different than that of Business Layer or the Dimension Layer. The determinants are not takne into effect in the Metadata Layer if you specify it to be unique. It is supposed to take effect on the Report because it will check how the report is constructed and accordingly it will decide based on the determinant how to write the SQL. That's the entire purpose of the determinant.

Look at the attached screen. I created the dimension with CustomerID and OrderID. As you can see, the List Report based on the Relational Query subjects give the same Freight and Quantity as the dimension which is based on a DMR. No difference in data.. and it is querying CustomerID from Customer Query Subject, OrderID and Freight from Order Header and Quantity from Order Details. That's how the DMR is structured to read the data.

You only need to set the determinant in the Metadata Layer and other layers under it (Business Layer, DMR Layer) will inherit it. No need to set determinants in different layers based on reporting requirements.  The only time you do that is when you need to override joins from Metadata Layer for role playing joins.