Performance Issue with "Checking Existing Dimension Table Rows" in a Dim Build

Started by deadsea, 29 Mar 2016 01:39:05 PM


deadsea

Hi All:

I am trying to resolve a performance problem with an existing build that I have inherited. The scenario is as follows:

1) There is a dimension build that consumes source transactional data, with account numbers as the business key plus various attributes. In this dimension build we also generate surrogate keys based on the Account Number business key. It delivers to the dimension DIM_ACCT_NUM, which holds about 30 million rows.

2) The fact build uses the same source transactional data as the dim build above, with a lookup against the DIM_ACCT_NUM dimension created in step 1 (among many other lookups for other dimensions). It delivers FACT_ACCT_NUM, which contains the surrogate key for account numbers. The row count for this fact table is the same as in step 1, i.e. 30 million rows. (See the illustrative SQL after this list.)

3) Our DB2 database is hosted on z/OS and is shared with other databases; to ensure performance is not degraded for other critical applications, each query is allowed only a fixed amount of CPU time. Because of this restriction, when the dim build described above runs, the step that performs "[EXECUTEDSQ - 18:44:41] checking existing Dimension table rows" takes a long time to run, hits the CPU cap, and the build fails.
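
To make the relationship concrete, here is roughly what the fact build's lookup does in SQL terms (DIM_ACCT_NUM is the actual dimension table; SOURCE_TXN and the column names are illustrative assumptions, not our actual build metadata):

    -- The lookup resolves each source account number to the surrogate
    -- key generated by the dim build.
    SELECT s.ACCT_NUM,      -- business key from the source
           d.ACCT_SKEY      -- surrogate key held in the dimension
    FROM   SOURCE_TXN s
    JOIN   DIM_ACCT_NUM d
           ON d.ACCT_NUM = s.ACCT_NUM;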

We have split the dim build into two parts, and that works fine with a smaller dataset. When testing with the prod dataset of 30-40 million rows, however, it still failed: even with the build split, the "[EXECUTEDSQ - 18:44:41] checking existing Dimension table rows" step still has to do a full table scan of the 30-40-million-row dimension.
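
As far as I can tell, that step effectively reads back every existing business key so incoming rows can be matched against them, conceptually something like this (illustrative only, not the exact SQL Data Manager issues; column names as in the earlier sketch):

    -- Fetch the full set of existing keys from the dimension:
    -- against 30-40 million rows, this is what blows the CPU budget.
    SELECT ACCT_SKEY, ACCT_NUM
    FROM   DIM_ACCT_NUM;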

The only other quick solution I can think of is to get rid of the dim build altogether and just use the business key, as it is expected to be unique throughout the fact table (but then we lose the surrogate keys for the Account Number business key). However, that would be moving away from the best practice of having a surrogate key for business keys :(

Does anyone have any insight into a possible solution we can implement, while we plan on redesigning this fact table delivery so it does not require such an expensive lookup?

Thank you, and I hope I was able to explain clearly without writing an essay.

deadsea

In case it makes any difference, we are still on DM 10.1.1.

Thanks.

MFGF

Quote from: deadsea on 29 Mar 2016 01:39:05 PM
I am trying to resolve a performance problem with an existing build that I have inherited. [...] Does anyone have any insight into a possible solution we can implement, while we plan on redesigning this fact table delivery so it does not require such an expensive lookup?


Hi,

A dimension build will always check the existing table to see if the incoming rows match anything in there. There's nothing you can do to change this behaviour within a dimension build, unfortunately. However, an alternative strategy would be to use a "fact" build to deliver the dimension rows. In the table delivery properties of a fact build, there is a "Record Identity" checkbox, which effectively adds an internally generated unique integer ID for each row (i.e. a surrogate key). Is it feasible to collect only the rows that don't already exist and deliver these using "Append" processing in a fact build, or do you need to re-process the entire 30 million rows each time?
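
In rough SQL terms (source table and column names are illustrative), the delta-only feed would be something like:

    -- Feed the fact build only the account numbers not yet in the
    -- dimension; "Append" processing then inserts just these rows,
    -- with Record Identity supplying the surrogate key integer.
    SELECT DISTINCT s.ACCT_NUM    -- plus whatever attributes you carry
    FROM   SOURCE_TXN s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   DIM_ACCT_NUM d
                       WHERE  d.ACCT_NUM = s.ACCT_NUM);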

Cheers!

MF.
Meep!

deadsea

Thank you for your response.

I have looked into that but was not sure whether I would be able to use it as a lookup field. I will give it some thought and a try. And you are correct that we can use the APPEND functionality, since the business keys from our source system are unique.

I will update this thread in the coming days if the solution with Record Identity keys works.

MFGF

Quote from: deadsea on 30 Mar 2016 08:34:15 AM
I have looked into that but was not sure whether I would be able to use it as a lookup field. [...]

Hi,

You can create a lookup based on any table. If you don't have a template for the dimension table you are delivering with the fact build, you can easily create the template based on the table - you just need to set the surrogate column attribute to have a behaviour of surrogate key (plus you will need to define the primary business key attribute too - and associate the surrogate with this). Once you have the template, you can use it as the basis of your lookup.
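
For illustration only (Record Identity populates the integer during the build itself, and your actual column names and types will differ), the table shape the template would describe is roughly:

    -- Surrogate key column (filled by Record Identity) plus the
    -- primary business key it is associated with.
    CREATE TABLE DIM_ACCT_NUM (
        ACCT_SKEY INTEGER      NOT NULL,   -- surrogate key attribute
        ACCT_NUM  VARCHAR(20)  NOT NULL    -- primary business key attribute
    );
    -- A unique index on the business key keeps the lookup efficient.
    CREATE UNIQUE INDEX IX_DIM_ACCT_BK ON DIM_ACCT_NUM (ACCT_NUM);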

Cheers!

MF.
Meep!

deadsea

Thanks again!

I will update once I have implemented this possible solution.

deadsea

The suggested solution worked. We did lose a few things that dim builds provide, but no biggie, since I replicated that functionality manually in the fact build.

Thanks!