If you are unable to create a new account, please email support@bspsoftware.com

 

Gross question about impact of using the wrong choice in Join Cardinality.

Started by FerdH4, 15 May 2022 03:17:23 PM

Previous topic - Next topic

FerdH4

Please let me elaborate else those of you who want to help me will give answers to the wrong question.

I'm specifically asking about the second character in the Cardinality within a Report's Join - the "n" in the example of "1..n" please.

I get how the first character of "1" or "0" works - I don't need help there.  I also generally understand the database relationships of one-to-one and one-to-many.

I am trying to understand whether the misuse of a "..1" or a "..n" will actually have an impact on the report's output???

If so, can you please share with me an example so that I can better understand.

In my limited knowledge, I'm thinking that misusing the wrong "..1" or "..n" could have an impact on the efficiency of any query statements generated in the background.  Hence, maybe there will be an execution runtime inefficiency.  However, with all of my many attempts to test swapping out a "...1" with a "..n" and, the reverse, I have never seen any obvious impact on my report outputs.

Thanks much.

MFGF

Quote from: FerdH4 on 15 May 2022 03:17:23 PM
Please let me elaborate else those of you who want to help me will give answers to the wrong question.

I'm specifically asking about the second character in the Cardinality within a Report's Join - the "n" in the example of "1..n" please.

I get how the first character of "1" or "0" works - I don't need help there.  I also generally understand the database relationships of one-to-one and one-to-many.

I am trying to understand whether the misuse of a "..1" or a "..n" will actually have an impact on the report's output???

If so, can you please share with me an example so that I can better understand.

In my limited knowledge, I'm thinking that misusing the wrong "..1" or "..n" could have an impact on the efficiency of any query statements generated in the background.  Hence, maybe there will be an execution runtime inefficiency.  However, with all of my many attempts to test swapping out a "...1" with a "..n" and, the reverse, I have never seen any obvious impact on my report outputs.

Thanks much.

Hi,

Yes, I completely get what you're asking here: what is the importance/significance of using one or many in relationships within your model, and in what circumstances will it make a difference.

The first (and most obvious) point to make is that all your initial relationships should reflect what is really going on in your data - so if a product line has many product types, and a product type belongs to only one product line, the relationship between these two query subjects should be 1 to many. Depending on how your data is physically arranged in the database(s), that's likely just the beginning step, though.

The second, and very important thing to know is that the query engine makes assumptions about how your data is stored, and uses these assumptions to create (hopefully) accurate, consistent queries. The assumptions it makes are all based on your data being arranged in either star schemas or snowflake schemas - as you'd find in the presentation layer of a data warehouse. I'm not sure of your prior experience of these, so I'll explain how they are structured - but please don't be offended if you already know this bit. In schemas like this, the data is held in two types of tables - Fact tables and Dimension tables. Dimension tables contain descriptive attributes and their corresponding keys, and each dimension usually contains a different aspect of descriptive data. They are often based around questions such as "who", "what", "where", "when" etc. If we take Sales data as an example, the "who" question might be the Customer dimension, the "what" question might be the Product dimension, the "where" question might be the Location dimension, and the "when" question might be the Date dimension. Fact tables contain measure values and keys to link them to each relevant dimension. The relationships between Dimension tables and Fact tables will always be 1 to many, with the Fact at the many end. In a star schema, each dimension is a single table, whereas in a snowflake schema, each dimension can have multiple tables - linked 1 to many usually, with the one ends furthest away from the Fact table(s).

Here comes the interesting bit. The query engine contains intelligence to figure out if it is reporting from a single fact table or from multiple fact tables. If it detects multiple fact tables, it creates what is called a Stitch Query - where it effectively sends a separate request to each fact, then Full Outer Joins the results together, based on the common dimensional values in the query. It does this because (for example, reporting from a Sales fact table and a Marketing fact table) there may be rows in one fact table that don't relate to rows in the other, and vice versa. So there may be sales for products that were never marketed, and there may be marketing campaigns for products that have no sales. In the case of multiple fact tables, this is the correct thing to do. How does the query engine figure out if it is reporting over multiple fact tables? It uses the cardinalities of the relationships to make this determination. The rules it uses are:

If a query subject is at the many end of ALL relationships to it used in the query, it is treated as a fact table.
If a query subject is at the one end of ANY relationship to it used in the query, it is treated as a dimension table.

For any query where the rules above return multiple fact tables, a stitch query will be generated.

Now, if your data is really held in star or snowflake schemas, all will be good - you will get stitch queries where you need them, and no stitch queries where you don't need them. The model will return accurate, consistent results. However, if your data is not stored in star or snowflake schemas, you could be in a lot of trouble. You could easily end up with what is termed an "unwanted query split" - in other words, the query engine has generated a stitch query, assuming multiple facts, but this is the wrong thing to do. Your report will return weird, incorrect results, and it could also be inconsistent - adding an item from another query subject might bring in a new relationship to the mix and change the way the query engine treats a particular table.

So, what should you do to make sure your model always returns accurate, consistent results? If your data is not stored in star or snowflake schemas, you should end up modelling the data in your model to make it look like star or snowflake schemas. You can add new model query subjects to your model (usually in a different namespace) that merge the data from your original tables into pseudo-dimension tables and pseudo fact tables. Then use these to populate your packages. In this way, you can be sure the query engine will be generating accurate, consistent queries whatever items are chosen in a report.

The FM User Guide also has some really good coverage of this. Take a look at page 283 onwards:

https://www.ibm.com/docs/en/SSEP7J_11.0.0/com.ibm.swg.ba.cognos.ug_fm.doc/ug_fm.pdf

Hoping that makes sense!

Cheers!

MF.
Meep!

FerdH4

The only area where you were wrong was, "Here comes the interesting bit".

It was ALL very interesting.  Thank you for taking the time to include all of that massive wall of text because regardless of my own experiences, it was all helpful content.

Thank you.

MFGF

Quote from: FerdH4 on 16 May 2022 01:32:38 PM
The only area where you were wrong was, "Here comes the interesting bit".

It was ALL very interesting.  Thank you for taking the time to include all of that massive wall of text because regardless of my own experiences, it was all helpful content.

Thank you.

Lol - you're welcome :-) This is actually the simplified version - we didn't touch on what else you need to do if your fact tables are held at different levels of granularity (eg sales at the date level and sales targets at the quarter level). We can save that for another day when you're having trouble getting to sleep :-)

Cheers!

MF.
Meep!