If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Fact And Dimension related - Query Response very slow

Started by rpvs, 14 Feb 2012 01:24:34 PM

Previous topic - Next topic

rpvs

Hi Gurus,
We have dimension table and fact table related properly (1->n) with Domain name and Surrogate id's. The response is very slow when the data in the dimension table is relatively huge compared with other dimension tables.

Example: Sales Fact and Parts Dimension. Both are related to each other like below

Sales fact.domain_name = parts.domain_name and
sales fact.parts_suid        = parts.suid.           

My thoughts are the response is slow because domain name is character and probably need to have a master table for Domain name and have surrogate id's for domain in all master and fact tables.

Is this the right approach ?? Will this improve efficiency of queries ?

I would like to know your thoughts on this.

Thanks in advance

rpvs

Forgot to add to the post. Both tables are indexed by Domain name and the surrogate id's.

blom0344

They only way to know for sure is to examine execution plans. Perhaps selectivity on the larger dimension table is very poor due to missing indexes on that table?  The parts table may hold classification fields that are used for selectivity (= applying filters)
Join execution with integers will be faster than with character fields , but normally not dramatic. Unless of course these are defined as very 'wide' character fields

rpvs

Thanks for the reply. Both the fact and dimension table are indexed by domain name and surrogate id. The domain name char field is 32 and can reduce it further to 15.
Would this help improve the query performance ??

blom0344

Quite possibly.  My remark about indexes was aimed at other columns involved with selecting data (i.e filters), not those involved with joins (pretty logical to have those indexed) Reducing field size will help though the question will be whether thisis the bottleneck..

cognostechie

Quote from: rpvs on 15 Feb 2012 07:47:21 AM
Thanks for the reply. Both the fact and dimension table are indexed by domain name and surrogate id. The domain name char field is 32 and can reduce it further to 15.
Would this help improve the query performance ??

Is that a composite index or seperate index on surrogate ids and domain name?

rpvs

It is a composite index with domain name and surrogate id as fields in the index. Also am having a tough time getting the number of characters reduced in the domain name field since it's one of those fields that get created by the system when a table is created and is protected so that changes cannot be made.

Would having  a seperate dimension table for domain name and having SUID in facts and dimension for the domain name help ?? That way the composite index can consist of domain suid and parts suid with both being integers. It would take a lot of work to get this in every dimension and fact and hence am wondering if it's going to be worth the effort and if it's going to dramatically improve performance.


cognostechie

The system assigned field that cannot be changed is probably GUID which means it would be the primary key in one table but always character field in the other because a GUID valus cannot be repeated anywhere in the database. Not sure but I think making the character as part of the index and joining to the GUID primary might make it slow. Better check with some DBA.

You are right that integer to integer joins are faster but not sure compared to GUID to Character join.

cognostechie

Here it is -

http://www.sql-server-performance.com/2005/guid-performance

So if my guess is correct (your system assigned field is a GUID) then this test shows performance improvement by 27 time during insert so it should actually be better than that during read time
if you converted those fields into integer and used integer index.



cognostechie

Also to check if it is GUID or not, you can go to database and check the data type of the firld. If it says 'uniqueidentifier' then it is GUID. Another way to check is the data itself. If the values are like this -

{YHY9-YHGT-IIUY-YHT6}  this it is GUID. The values will include the curly brackets.

On the other table, this field will be character where this is a foreign key and the character field
will also include curly brackets.

If you decide to change the fields to integer, let me know how it goes.

jive

Hi Rvs,
One other thing, is to have a key for each row and build unique index on it. With Oracle we using sequence type to get a unique number for each row in the fact table and in the dimension. That key is the one we transfert for the dimension to the fact table. So we can add referential intergrity, and link between fact-dimension is only on one key and a unique index.  Fast to search

Cheers

rpvs

Many thanks to all the people and the response to my question. I was pulled off for something else and was not able to look into this all day. Will try out and let you know the details.

Thanks!!

rpvs

I had promised to get back after trying out the suggestion. It was a major change since i had to add field, make changes to ETL and redefine relationships. It took me the weekend and till now along with other things and after completing I AM TOTALY IMPRESSED!!!!. The old methog for 3 years of data for both domain name took about 20 Minutes and with the new design it takes only seconds.

Thanks to all. I have some more things to iron out before releasing the package and letting our users know, but THIS CHANGE IS IMPRESSIVE!!!!!!!!!!!!!!!!!!!!!!!!!!!.

:)

cognostechie

What change did you exactly do? Created numeric columns and indexed them or created unique surrogate keys and indexed them?