Degenerate Dimension question

lots_of_questions · 30 Aug 2017 02:04:21 PM

Hi all,

I have read many points of view about degenerate dimensions. In the end I decided to keep the Invoice number in the fact table instead of creating a dimension that grows just as fast. Then I tried to create the measure dimension in Cognos. How do I add the Invoice number as an attribute? I don't see anyway to make this work. Do I make a dimension from the fact table with only one field? Do I have to change my philosophy and create an Invoice Dimension?

Thanks. Any help is appreciated.

MFGF · 31 Aug 2017 02:05:36 AM

Quote from: lots_of_questions on 30 Aug 2017 02:04:21 PM
Hi all,

I have read many points of view about degenerate dimensions. In the end I decided to keep the Invoice number in the fact table instead of creating a dimension that grows just as fast. Then I tried to create the measure dimension in Cognos. How do I add the Invoice number as an attribute? I don't see anyway to make this work. Do I make a dimension from the fact table with only one field? Do I have to change my philosophy and create an Invoice Dimension?

Thanks. Any help is appreciated.

Hi,

Putting descriptive items into a query subject which behaves as a fact table in your model is a big red flag. It can lead to "blind spots" (and therefore incorrect result sets) in stitch queries, plus (as you've discovered) there is no way to bring descriptive attributes into measure dimensions if you are setting up a DMR layer in your model. My earnest advice is to create a query subject for the invoice number (using an alias shortcut in your database layer) and link it to your fact table 1..1 <--> 1..n (with the fact at the n end of the relationship).

Cheers!

MF.

lots_of_questions · 31 Aug 2017 06:25:25 AM

So split the invoice number out and link it as if it was a dimension? That is basically what a 1..1 <--> 1..n connection is... correct?

MFGF · 31 Aug 2017 06:48:49 AM

Quote from: lots_of_questions on 31 Aug 2017 06:25:25 AM
So split the invoice number out and link it as if it was a dimension? That is basically what a 1..1 <--> 1..n connection is... correct?

Hi,

Yes, correct. :)

MF.

lots_of_questions · 31 Aug 2017 06:53:48 AM

Sounds good. Thanks.

Invisi · 04 Sep 2017 07:21:51 AM

What's wrong with leaving your invoice number in your fact table?

Lynn · 04 Sep 2017 07:39:44 AM

Quote from: Invisi on 04 Sep 2017 07:21:51 AM
What's wrong with leaving your invoice number in your fact table?

MFGF explained why

Quote from: MFGF on 31 Aug 2017 02:05:36 AM
Hi,

Putting descriptive items into a query subject which behaves as a fact table in your model is a big red flag. It can lead to "blind spots" (and therefore incorrect result sets) in stitch queries, plus (as you've discovered) there is no way to bring descriptive attributes into measure dimensions if you are setting up a DMR layer in your model. My earnest advice is to create a query subject for the invoice number (using an alias shortcut in your database layer) and link it to your fact table 1..1 <--> 1..n (with the fact at the n end of the relationship).

Cheers!

MF.

As he mentioned, the degenerate dimension can be created as an alias shortcut to the fact table. This means the invoice number can stay on the physical fact table. The model will include the table twice: Once as the fact table from which only the measures will be exposed to the user; Second as a dimension from which only the invoice number and any other attributes specific to a fact row will be exposed.

The fact table will be on the "n" side of the 1..n relationship and the alias shortcut to the degenerate dimension will be on the "1" side.

Invisi · 05 Sep 2017 07:01:43 AM

My dear Lynn, when I teach people star schema modeling, invoice number is the exact example of the column that is put in the fact table as a degenerate dimension. So, for me, from a data warehouse perspective, the fact table is where the invoice number belongs. As for modeling the framework, I miss a reason to model my degenerate column anywhere else than it is in the database. MFGF's answer doesn't help me, otherwise I wouldn't ask my new question.

In a framework I model something like an invoice number as part of the fact table, like it is in the data warehouse. Modeling it differently in a regular database based framework sounds to me like introducing needless complexity into the model. I see the suggestion of some benefit, but I miss the benefit. Will somebody tell me the benefit?

Lynn · 05 Sep 2017 07:28:46 AM

Quote from: Invisi on 05 Sep 2017 07:01:43 AM
My dear Lynn

:o

Moving along...

When a degenerate dimension is needed, the database fact table most certainly would have the keys, the facts, as well as the attributes. There is no need to have a separate fact table and a separate degenerate table physically in the database.

In your FM model, however, queries are generated based on what is identified as a fact and what is identified as a dimension. The cardinality is the basis for fact identification. A fact table is on the "n" side of all relationships. MFGF explained that exposing attributes from a table that behaves as a fact can lead to blind spots and therefore incorrect results.

To solve this, create an alias shortcut of your fact table to behave as the dimension. It will be joined to the fact table on "1" side of the relationship. The benefit is correct and predictable SQL generation leading to correct results. Chapters 4 and 9 of the FM user guide goes into further detail on these topics.

cognostechie · 07 Sep 2017 06:48:47 PM

Quote from: Invisi on 05 Sep 2017 07:01:43 AM
when I teach people star schema modeling, invoice number is the exact example of the column that is put in the fact table as a degenerate dimension. So, for me, from a data warehouse perspective, the fact table is where the invoice number belongs. As for modeling the framework, I miss a reason to model my degenerate column anywhere else than it is in the database.

This is exactly why I create an Invoice Dimension, the granularity of which will be at the Invoice Number level (one record per invoice) whereas the Invoice Line would have one record for each line (many lines per Invoice). Join both of them in FM with 1:1 <-> 1:n relationship. This will also be useful when you have to create another fact for Payments. Both the facts can then be joined to the dimension so it will be possible to make one report showing the Invoice amount and the Payment amount so the users can see how much is outstanding. Dimensions are not just for master data , they are also for transactional data.

the6campbells · 10 Sep 2017 11:14:44 AM

Keep in mind that modelling a dimension from the fact table potentially will result in an expensive query subject to the size of your fact table.

The query for the dimension is essentially (select distinct c1 from t or select c1 from t group by c1) which would cause a full pass of the available data pages of the table. Should you be working with multi-million/billon/trillion row fact tables you may find it better to manage an actual table with those distinct values.

As always, benchmark for the data volumes you expect to have as your fact data rolls on/off along with the frequency of trying to access just the degenerate keys.

cognostechie · 11 Sep 2017 12:35:27 PM

I meant create a table D_Invoice in the database which will have the granularity at the Invoice Header level, not at the line level (One record per Invoice_ID or whatever ID is at the header level). I am not talking about modelling a query subject from the line table. As for the number of rows , I have an order fo preference:

1> Data Integrity
2> Functionality
3> Performance

This does not mean that performance is not important but it means that it is not more important than data integrity. Let's also keep in mind that there is a difference between reporting and business intelligence and anybody who decides to use Framework Manager for BI is making a wrong decision to begin with. 80% of users in any organization are interested in BI and 20% (or even less) in line level reporting. Framework packages should only be used by that 20% and 80% should be using Cubes.

This is something most developers have a problem agreeing with and the result is that they decide to use only FM and then come up with all kinds of problems.

bdbits · 11 Sep 2017 01:03:39 PM

I can get on board with your priorities, but not sure I agree that an FM package is not a suitable basis for a BI solution. (At least, I thought that was what you said.) While I like cube technologies and build and use them every day, there are many organizations with dimensional data warehouses that do not necessarily need cubes. For example, frequent data refreshes or near-realtime data are required and a cube build takes too long. Or your model is backed by high-performance storage like Teradata and the like. Also, user communities I have worked with might start with cubes or aggregates to spot anomalies, but often want to drill down to line-level details to research them.

There are at least 101 ways to approach BI and reporting, partly because there are so many different needs in organizations and the people that populate them.

There, I am now climbing down from my soapbox. ;D

cognostechie · 11 Sep 2017 03:10:22 PM

Again, when you talk about drilling down to line level reporting and frequent data refreshes, you are referring to that 20% audience, not 80% (managers and above). I have yet to see a Manager or anybody above him/her who wants to know the details at the line level or who wants to include an hour old data for analysis and I have worked for organizations of all sizes and in all industries.

I did mention that for this kind of audience, you do need FM.

cognostechie · 11 Sep 2017 03:19:36 PM

and you can also drill to the line level detail from the cube by passing parameters to the FM package though normally that is done by analysts, not managers. Cube taking too long to get refreshed with the newer versions could be because the cube modelling needs attention and optimization. After all, most of data can be updated incrementally in the cubes except the data that changes like orders.

Invisi · 13 Sep 2017 04:21:39 AM

Quote from: cognostechie on 07 Sep 2017 06:48:47 PM
This is exactly why I create an Invoice Dimension, the granularity of which will be at the Invoice Number level (one record per invoice) whereas the Invoice Line would have one record for each line (many lines per Invoice). Join both of them in FM with 1:1 <-> 1:n relationship. This will also be useful when you have to create another fact for Payments. Both the facts can then be joined to the dimension so it will be possible to make one report showing the Invoice amount and the Payment amount so the users can see how much is outstanding. Dimensions are not just for master data , they are also for transactional data.

Why do you want to introduce an extra join in your data warehouse to a table that contains only 2 columns: your data warehouse key and your invoice number?

cognostechie · 13 Sep 2017 06:03:54 PM

To ensure proper generation of queries for data integrity even if the performance becomes slower.

Invisi · 25 Sep 2017 10:01:12 AM

Which data integrity do you think you lose?

BSP Software	Resources	About Us
MetaManager	BSP Software Training	BSP Software
Integrated Control Suite	YouTube Channel	Micro Strategies Inc
Security Migration		IBM Cognos
Integrated Management Suite

If you are unable to create a new account, please email support@bspsoftware.com

COGNOiSe.com - The IBM Cognos Community

News:

Degenerate Dimension question

lots_of_questions

MFGF

lots_of_questions

MFGF

lots_of_questions

Invisi

Lynn

Invisi

Lynn

cognostechie

the6campbells

cognostechie

bdbits

cognostechie

cognostechie

Invisi

cognostechie

Invisi