Layers, relationships, and star schema

BIinNJ · 27 Jan 2011 02:17:34 PM

Hello everyone,

I have a question about using layers, where to set relationships, and hence creating a star schema. I have searched everywhere and cannot get a clear answer, and cannot even follow the sample models to get to an answer. Here is how I understand Cognos's best practices and then I'll follow up with my questions.

Cognos suggests you create at least 3 layers. The first layer is the import layer, which is a database view of the tables. The data source query subjects should all contain "select * from" so that meta data caching can be used. Relationships and cardinality should be defined in this layer.

The second layer is to model the tables into a star schema. Here, model query subjects should be created, based on the objects in the import layer. Relationships should be created between the model query subjects in order to resolve any reporting traps and model into a star schema. In this layer proper naming conventions, calculations, and filters should be applied to each model query subject. I also guess if you are creating relationships that "override" the relationships in the import layer, then determinats should also be set here.

The third layer consists only of shortcuts to the query subjects in the second layer. The namespaces in the third layer should be created for each star schema in the model. (i.e. 1 namespace per each fact table).

Although doing less work is nice, I want to do it right and have optimal performance, first and foremost. My questions are:

If you are already working with a star schema data warehouse, is it necessary to create the second layer, and define the same relationships again from the import layer?

Secondly, if you have a dimension table that is joined 1..1 to another dimensional table, a snowflake, should this be resolved in the import layer via writing SQL to combine the two tables? This SQL table can be joined back to the fact table. Or should this be resolved by creating a merged query subject in the second layer which references the two tables from the import layer? Then create a relationship between this query subject and the model query subject that was created that contains items based on the fact.

Also, in my third paragraph above is this correct thinking "Relationships between the query subjects should be created between the model query subjects in order to resolve any reporting traps and model into a star schema. "?

Thank you gurus.

jive · 27 Jan 2011 03:06:09 PM

Hi,
About your first question,"already working with a star schema data warehouse" , if the precision of your schema is really good, I ask the same question about the 2ND layer. For me, my two cents tells "if my datawarehouse is build robustly ,if i have referential integrity why did I need to rebuild all of that in FM", by that I mean when you import the table in FM, if the integrity is present you don't have to define the relationship, plus if all the relation between tables are 1-1 n-1 it's a charm to have all that in the first layer. But most of the time people work database not really in star schema and they build it in FM. If you had the chance to had everything skipping the second layer it is not a crime. But you will need a layer for the presentation and the dimensional design.

For your second question,If you have dimension link to dimension why don't make a view directly in the database side and save some query in Cognos8. In the project I work we build ,we have oracle, at least 40 materialised view in the database to be sure to avoid a lot of alias in framework manager. I don't know the database your work with but, try to keep most of the aggregation,view, alias in the db side it's more clearer in FM.

It's not in the good of Cognos to do that, but the good practice are not a recipe to apply blindly all the time, sometimes reading between the lines it's good too.

Regard Jacques

BIinNJ · 27 Jan 2011 03:38:59 PM

Quotewhy don't make a view directly in the database side and save some query in Cognos8.

I agree that it is always best to make backend changes, but let's assume no backend changes could be made. We have to work with what we got.

Quoteskipping the second layer it is not a crime

But should we skip it? Is it OK, from a performance persepctive to recreate the same relatioships that are defined in the import layer? And is it good to do in case something in the future requires a second layer?

QuoteBut most of the time people work database not really in star schema and they build it in FM.

This really is what my 2nd and 3rd questions are around, where is the star schema completed? Import Layer, second layer?

RobsWalker68 · 27 Jan 2011 03:44:03 PM

Hi,

QuoteIf you are already working with a star schema data warehouse, is it necessary to create the second layer, and define the same relationships again from the import layer?

Putting aside the issue of DMR layer then ideally you should end up with 3 layers:

1. Database Layer. Purpose: To act as a foundation for your business layer, to run test reports on and to protect your business model from database changes.

2. Business Layer. Purpose: To build detailed and multi-fact reports from (via shortcuts in the presentation layer) and as a platform for the Dimensional/Customisation Layers.

3. Presentation Layer. Purpose: Isolates users from business layer changes and makes it clear to users which dimensions apply to which fact tables and divide things up into subject areas for easy reference.

The database layer should be left as is untouched to avoid complex SQL and like you correctly stated to allow Cognos to use the metadata in Framework Manager and not retrieve it from the database. Check for facts, attributes and identifier are correct.

Yes, Include the business layer as this is where you create query subjects to create the business oriented naming conventions, calculations, determinents. Perhaps view this in the same way as you would a database view in that it allows for a degree of isolation from changes in the database layer.

In terms of joins
• When building a model subject, Cognos uses only joins in the layers below
• When using a model subject to run a report Cognos uses first the joins in the layer of the model subject itself and then the underlying joins.

IMHO if you have a properly designed Star Schema (accepting some Kimball variances) then you can join at the business layer as your reports will only use that layer. If you have a lot of snowflaking then in order to achieve minimised SQL join at the database layer.

QuoteSecondly, if you have a dimension table that is joined 1..1 to another dimensional table, a snowflake, should this be resolved in the import layer via writing SQL to combine the two tables? This SQL table can be joined back to the fact table. Or should this be resolved by creating a merged query subject in the second layer which references the two tables from the import layer? Then create a relationship between this query subject and the model query subject that was created that contains items based on the fact.

What is the reasoning behind this not being this not being a single dimension in the data warehouse already. Is it a monster dimension or rapidly changing?

QuoteAlso, in my third paragraph above is this correct thinking "Relationships between the query subjects should be created between the model query subjects in order to resolve any reporting traps and model into a star schema. "?

If you have, like in sounds, a correctly modelled star schema then you shouldn't have any traps.

Rgds

Rob

RobsWalker68 · 27 Jan 2011 04:14:57 PM

Hi,

I guess to put it succinctly wth one layer you are presuming or hoping nothing will ever change. Now lets presume, as an example, you have a sales fact table that has a fact called 'Gross Sales'which you have modelled in FM and created a shed load of reports based upon.

Then the finance team a couple of years later have a funny moment and decide it should be called 'Gross Income' instead (as you may guess I'm talking from experience here).

With one layer you are in a situation of updating all your reports. With a second layer you make one change only to the business layer and republish the package.

Rgds

Rob

BIinNJ · 27 Jan 2011 05:50:54 PM

Quote from: RobsWalker68 on 27 Jan 2011 03:44:03 PM
IMHO if you have a properly designed Star Schema (accepting some Kimball variances) then you can join at the business layer as your reports will only use that layer.

When you say join do you mean combining items into one query subject? Hence, the join is taken care of through the relationships defined in the import layer? Or do you mean creating a model query subject that behaves as a dimension and using the "Create Relationship" function between this and a model query subject based on the face table.

Quote from: RobsWalker68 on 27 Jan 2011 03:44:03 PM
If you have a lot of snowflaking then in order to achieve minimised SQL join at the database layer.

So are you saying to modify the SQL in the import layer? (do a join to create one "table") This would prohibit Cognos from using metdata caching then.

If I have 3 tables D1, D2, and F where D1 and D2 are dimension tables, and F is the fact table. D1 and D2 have a 1..1 relationship and F is related to D1 1..n. Would you join F to D1 and then join D1 to D2, then in the second layer create a combined model query subject composed of D1 and D2 and join this to a model query subject that is derived from the F table? (using the "Create Relationship" functionality)

Or would you write SQL in the import layer that combines D1 and D2 take that resulting query subject and join to the fact table.

Another question from your post below, how do shortcuts guard against future name changes? The item in the second layer is called "Gross Revenue " it needs to be changed to "Gross Income". Well if you chngne the name in the second layer all reports that are looking for "Gross Revenue" query item fail. The shortcuts update automatically, but reports still fail.

RobsWalker68 · 28 Jan 2011 05:49:22 AM

Hi,

I guess the whole thrust of what I'm saying is use a multiple layer approach to add a degree of isolation from the database layer, so if the structure, naming convention or database vendor changes you are not unduly impacted and can remap with some degree of elegance.

In terms of where to place the joins i.e joining tables not merging them then I have generally made the joins in the business layer and not the db layer simply because on a Star Schema I want to enforce "as view". Yes I would create a model query subject that behaves as a dimension and join to Fact etc . Having said that Cognos training materials advocate joining tables in the db layer so you can utilise minimised SQL. I guess you will need to take a view on that.

Apologies as perhaps my wording was unclear as I wasn't suggesting modifying the SQL in the import layer to handle the snowflake. It was more a question of what was the reasoning behind having the snowflake and why as it is 1:1 you hadn't combined it in the ETL process to create a single dimension? How you handle it depends on whether you wish to minimise SQL or merge it so it acts always as a single unified dimension. If you wish it to remain sql minimised then make the table joins at the db layer and then create a query subject in the business layer otherwise if you want it to act as a unified dimension then merge it in the business layer and make the necessary joins to the fact table.

Your correct short cuts won't mitigate future name changes. Again my wording, I was meaning to refer to the business layer i.e. if a column name changes in the base data source tables then you can easily accomodate this in the business layer without immediately impacting your reports

Rgds

Rob

MFGF · 28 Jan 2011 06:59:13 AM

Quote from: RobsWalker68 on 28 Jan 2011 05:49:22 AM
Hi,

I guess the whole thrust of what I'm saying is use a multiple layer approach to add a degree of isolation from the database layer, so if the structure, naming convention or database vendor changes you are not unduly impacted and can remap with some degree of elegance.

In terms of where to place the joins i.e joining tables not merging them then I have generally made the joins in the business layer and not the db layer simply because on a Star Schema I want to enforce "as view". Yes I would create a model query subject that behaves as a dimension and join to Fact etc . Having said that Cognos training materials advocate joining tables in the db layer so you can utilise minimised SQL. I guess you will need to take a view on that.

Apologies as perhaps my wording was unclear as I wasn't suggesting modifying the SQL in the import layer to handle the snowflake. It was more a question of what was the reasoning behind having the snowflake and why as it is 1:1 you hadn't combined it in the ETL process to create a single dimension? How you handle it depends on whether you wish to minimise SQL or merge it so it acts always as a single unified dimension. If you wish it to remain sql minimised then make the table joins at the db layer and then create a query subject in the business layer otherwise if you want it to act as a unified dimension then merge it in the business layer and make the necessary joins to the fact table.

Your correct short cuts won't mitigate future name changes. Again my wording, I was meaning to refer to the business layer i.e. if a column name changes in the base data source tables then you can easily accomodate this in the business layer without immediately impacting your reports.

Just to throw my own hat into the ring (ha! bet you're all thinking "oh no - not again! Please don't bore us again!"), my approach is similar, but slightly different to the one Rob suggests.

I too generally use a 3-tier modelling approach as follows:

Bottom tier - Foundation Objects View.
This contains all the underlying data source query subjects and relationships resulting from the database metadata import process. I check and correct all Query Item usage properties and relationships between query subjects, then begin to look for reporting traps. I correct these as necessary in the Foundation Objects View tier, essentially ending up (usually) with snowflake schemas. I also chech and correct any determinants.

Middle tier - Reporting Objects View.
This contains model query subjects based on the stuff in the Foundation Objects View. I collapse each snowflake dimension chain of tables from the Foundtion Objects View into a single separate (star) dimension query subject, and each "fact" query subject from the foundation Objects View into a separate (star) fact query subject. I rename the items in this layer to reflect the required business terminology, and add calculations etc as required, but I add no relationships at this level, in order to allow best chance of query minimization at runtime. I end up with what look like star schama fact and dimension tables, but with no direct relationships between them. These give the added benefit of isolating the reports from the underlying database structure, and any possible changes which may occur in future.

Top tier - Business View.
This usually comprises shortcuts back to the query subjects in the Reporting Objects tier. Normally, these take the form of Star Schema Groupings, with the appropriate fact and dimension shortcuts grouped together into a namespace suitable for deployment into a specific package.

The only real difference between Rob's approach and this one is that I do more work in the bottom layer to achieve accurate, consistent results, then add no relationships in the middle tier to help with query efficiency at runtime. Having said that, I'm not going to claim that my approach is any better or worse than the one Rob uses, and indeed I have used Rob's approach on some occasions where it was appropriate.

How you model really comes down to what are the main goals and restrictions you are facing. However you decide to proceed, the overriding priority must be first to model for accurate, predictable results, then for efficiency, and finally for intuitiveness/usability. If you put efficiency before accuracy, you will end up with very fast reports giving you incorrect results, and if you put usability first you will end up with easy-to-build reports that return incorrect results slowly :-)

Ok - boring gibberish over. Time to get back to insulting people without them noticing ;) :)

MF. Take no notice of that Rob fella. He's a big, clever know-it-all and he makes me look stupid all the time. But I'm not bitter. No, not at all!

Lynn · 28 Jan 2011 08:07:26 AM

I think there are some very good concepts and considerations already covered, so I'll simply add an interesting literary note:

"Layers and relationships and star schemas, oh my!"
This is what Dorothy and the scarecrow were worried about as they traveled down the yellow brick road.

In the movie version they changed this to "Lions and tigers and bears, oh my!"

Also in the book, the flying monkeys were all Cognos developers and the Wicked Witch of the West was a business user.

;D

MF, my sole purpose here is to make everyone else look good. You can take comfort there is always someone lower down the ladder and not take it out on Rob...sometimes I read what other smart people say and I think I should just retire!

BIinNJ · 28 Jan 2011 09:40:50 AM

Thanks these are great posts. Question about conformed dimensions then:

Quote
Middle tier - Reporting Objects View.
This contains model query subjects based on the stuff in the Foundation Objects View. I collapse each snowflake dimension chain of tables from the Foundtion Objects View into a single separate (star) dimension query subject, and each "fact" query subject from the foundation Objects View into a separate (star) fact query subject. I rename the items in this layer to reflect the required business terminology, and add calculations etc as required, but I add no relationships at this level, in order to allow best chance of query minimization at runtime. I end up with what look like star schama fact and dimension tables, but with no direct relationships between them. These give the added benefit of isolating the reports from the underlying database structure, and any possible changes which may occur in future.

For conformed dimensions like Time and Product, do you simply create one model query subject for each of these dimensions in this layer, then in the Business View, create a namespace for Fact1 and shortcuts pointing to the star schema model subjects in Reporting Objects View. Create a second namespace for Fact2 and shortcuts pointing to model subjects, reusing the same Time and Product model query subject in the middle layer?

Cognos will fiugre out the path to take depending on the namespace that is used in the Busines View?

Lynn · 28 Jan 2011 09:47:29 AM

Quote from: BIinNJ on 28 Jan 2011 09:40:50 AM
Thanks these are great posts. Question about conformed dimensions then:

For conformed dimensions like Time and Product, do you simply create one model query subject for each of these dimensions in this layer, then in the Business View, create a namespace for Fact1 and shortcuts pointing to the star schema model subjects in Reporting Objects View. Create a second namespace for Fact2 and shortcuts pointing to model subjects, reusing the same Time and Product model query subject in the middle layer?

Cognos will fiugre out the path to take depending on the namespace that is used in the Busines View?

Yes, I believe that is correct. You can use Star Schema Grouping to do that easily. You haven't mentioned any multi-fact querying and there are additional considerations if you have requirements for that. Not sure if it is relevant to your situation or not but thought I'd mention it in case.

MFGF · 28 Jan 2011 09:56:09 AM

Quote from: BIinNJ on 28 Jan 2011 09:40:50 AM
Thanks these are great posts. Question about conformed dimensions then:

For conformed dimensions like Time and Product, do you simply create one model query subject for each of these dimensions in this layer, then in the Business View, create a namespace for Fact1 and shortcuts pointing to the star schema model subjects in Reporting Objects View. Create a second namespace for Fact2 and shortcuts pointing to model subjects, reusing the same Time and Product model query subject in the middle layer?

Cognos will fiugre out the path to take depending on the namespace that is used in the Busines View?

Yes - exactly right. You have just one instance of a Time query subject and one instance of a Product query subject in the Reporting Object View, then point back to these via shortcuts in different namespaces in the Business View. This holds true regardless of whether you tend towards the approach Rob's described or the approach I described. :-)

MF. Lynn, thanks for the kind words. It doesn't help, though, when you're up against someone who is wise, with manly good looks, and a middle name of Slrgh. <sigh>

BIinNJ · 28 Jan 2011 10:50:11 AM

Excellent! So does the "Create Star Schema Grouping" function in FM have any use? It just creates shortcuts to objects directly joined to item you are clicking on. I don't see any function to it, except maybe aesthetics?

This has been very helpful. ;D

Lynn · 28 Jan 2011 10:59:10 AM

Quote from: BIinNJ on 28 Jan 2011 10:50:11 AM
Excellent! So does the "Create Star Schema Grouping" function in FM have any use? It just creates shortcuts to objects directly joined to item you are clicking on. I don't see any function to it, except maybe aesthetics?

This has been very helpful. ;D

It is just a tool to make it a little quicker to define by doing all the individual steps in one shot. Click, click, done...

MFGF · 28 Jan 2011 11:05:49 AM

Hi,

It's VERY handy if you use Rob's approach and define relationships in the middle tier - all you need to do is select the relevant "fact" query subject, right-click, and 'Create Star Schema Grouping', and it will set up the namespace and shortcuts for the Fact and all its related Dimension query subjects.

If you use the approach I advised, it's less handy, as you need to select the Fact and all the relevant Dimension query subjects manually, then use the 'Create Star Schema Grouping' option.

Looks like that's another point to Rob, then! :D Gaaaahhh! Rob just keeps looking better and better!!!

BIinNJ · 28 Jan 2011 11:09:14 AM

But it only looks useful where the relationships are defined. If you define relationships in the first layer and rename in the second then you couldn't use this functionality to build the 3rd layer.

If you renamed in the first layer where the relationships are defined then you could use this function. But then you are not following Cognos's best practices.

BIinNJ · 28 Jan 2011 11:10:41 AM

Quote from: MFGF on 28 Jan 2011 11:05:49 AM
Hi,

It's VERY handy if you use Rob's approach and define relationships in the middle tier - all you need to do is select the relevant "fact" query subject, right-click, and 'Create Star Schema Grouping', and it will set up the namespace and shortcuts for the Fact and all its related Dimension query subjects.

If you use the approach I advised, it's less handy, as you need to select the Fact and all the relevant Dimension query subjects manually, then use the 'Create Star Schema Grouping' option.

Looks like that's another point to Rob, then! :D Gaaaahhh! Rob just keeps looking better and better!!!

Yep, that makes sense to me.

BSP Software	Resources	About Us
MetaManager	BSP Software Training	BSP Software
Integrated Control Suite	YouTube Channel	Micro Strategies Inc
Security Migration		IBM Cognos
Integrated Management Suite

If you are unable to create a new account, please email support@bspsoftware.com

COGNOiSe.com - The IBM Cognos Community

News:

Layers, relationships, and star schema

BIinNJ

jive

BIinNJ

RobsWalker68

RobsWalker68

BIinNJ

RobsWalker68

MFGF

Lynn

BIinNJ

Lynn

MFGF

BIinNJ

Lynn

MFGF

BIinNJ

BIinNJ