If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

[FM] What is a loop Join?

Started by dxwallace, 31 May 2007 09:26:18 AM

Previous topic - Next topic

dxwallace

Can someone explain to me the concept of a Loop Join and if possible, how does a loop join cause double counting in some situations.

thanks

goose

A loop join also reffered to as nested loop join is a method of returning data from two tables. Bascially one table is read first, then for each record read a check is made to see if there is a matching record in the second table.

Double counting can occur when both tables are not of the same granularity - best explained with an example:

- Lets say table one has a record of all sales made per product per location

- Let say table two has a record of all sales made per product per location per salesman

- Now we report off both tables joining them on product_id and location_id

The sales value would be inflated because table two has data for salesman also, for evey unique product_id, location_id there caould be more than on record for the same in table two because that also has salesman

You can find more info on joins here http://en.wikipedia.org/wiki/Join_(SQL) Its definately worth your while to get a thorough understanding of data warehouse design especially if you gonna be doing FM modelling, check out Ralph Kimball books on amazon.

Hope this helps

dxwallace


blom0344

Though the offered example is very clear, it is (my 2 cents) not really the answer to the question. The concept of nested loop join is how the database is going to resolve the joining of 2 tables. The more important point  is the cardinality. In the offered example 2 fact tables are joined ('1' <--> 'many').
The whole principle is to avoid such a scenario. A fact table should only be joined to dimensional tables.
Multi-fact queries are resolved in a Cognos specific way (when the modelling is proper) and then Cognos will detect it should first execute the partial fact-queries and create a full-outer join over the common dimensions. This will eliminate the overcounting at the '1'-end of the cardinality.

The 'old' solution is to count the number of occurences at the 'many' end and use this to correct the sales value by dividing by the counter


dxwallace

Very Good Input to my original question, but help me go a little further and see if I have my answer.

I got a "very" good understand of the issues you were talking by listening (over-and-over) to one of the on-demand seminars on Modeling. This is what I learned.

In the seminar example they gave the example for two fact objects tied to two dimension objects and a stitched query is created with a full outer join. I think I got everything she was saying.

I guess my question was, is that what is called a loop join, or does that avoid a loop join, or more importantly if you model correctly, does a loop join become irrelevant and automatically prevented.

So I guess the real question is, What defines a Loop join, as it relates to Modeling, assuming proper knowledge of how to deal with multi-fact, and multifact-multi grain queries properly?

.....you two cents was appreciated

goose

#5
Very true blom0344 it was the simplest example I could think of to illistrate how double counting would occur. In real life you would not be wanting to join two fact tables.

dxwallace to further anwser your question:

- A loop join is a database mechanism to join two tables, by avoiding a loop join you will not avoid double counting.

- The join mechanism is chosen at runtime by the database query engine, various join mechanisms are available the database will chose will one that will that a) return the complete result the fastest, or b) if you so choose return the some of results quickly. These settings are available in RS under Execution Optimization. Databases have functionality for you to view the explain plan where you can see excatly how your query will get processed. You can also provides hints in sql to force the database query engine to use a specific join type but thats generally not applicable as you then have to write custome sql.

- The examples in the docs shows how cognos joins two fact queries that have common dimensions corrently so you can avaoid double counting and to make sure you include all records from both fact tables. The stiched query is used because a full outer join is employed to make sure all relevent records from both fact tables are inlcuded and avoid double counting.

- A full outer join alone wont ensure double counting wont occur, depending on your data you may also need to define determinants

The best plan of action would be for you to setup some examples and play around in FM.     

blom0344

Just a little bit about the original question on ' loop join'. I think that this actually refers to the fact that ambiguous query paths may be involved. (which really is something altogether different than the 'nested loop join' concept). Ambiguous paths are dealt with by the Cognos modelling solution of reducing everything into stars.

like:      DimX
         /           \
factA                factB
         \            /
             DimY


leads to 2 stars:


                 factA                              factB
               /         \                           /        \
        DimX         DimY           DimXalias    DimYalias

goose

You are 100% correct, this a term used by cognos. Logically you get inner joins and outer joins in their various formats, implementation wise you get nested loop joins, hash joins, merge joins ect etc.

I think dxwallace your best bet for now is to focus on the "how" i.e. how to model correctly in FM to get the results you want. So to start check out the different types of cardinately available for relationships, look at determinants and dimensional modelling & check cognos's best practise guide for dealing with multi-fact multi-grain scenarios in the docs.