
DM Fact Build process hanging at "Checking Fact Data Existence"

Started by LEO, 21 Jan 2011 12:55:10 AM


LEO

I have a jobstream with a fact build in it.
The fact build has 1 datasource to get data from SOURCE db and 6 dimension lookup sources.

All the above data sources are being processed/executed successfully.
All queries are returning values as well.

The build is not proceeding after that.

The log files do not show any progress in execution.
Nothing appears after the message "Checking Fact Data Existence".

It's not clear how to check its progress or debug the issue.

Can anybody suggest some ideas to check or resolve the issue?

MFGF

If it's getting as far as "Checking Fact Data Existence", this means it has built and cached the lookups in memory, and is starting the process of data acquisition/delivery.  A couple of thoughts:

1. Do you have merging of duplicate rows enabled in the properties of the build?  If so, the build will read all rows into memory (utilising a hash table), merging as it goes, then deliver all the merged rows at the end of this process.

2. Are any of your dimensional reference objects defined as hierarchies, and if so, do any of the dimensional elements in the Transformation Model have the Aggregate checkbox selected?  Again, this will force the use of a hash table, as with merging.

I'm just wondering if the build is busy reading rows into memory when you think it has stopped?  What logging options have you enabled?  Try turning on Process, Detail, SQL and Internal logging when you run the build.  You could also set the input and output logging frequency to smaller values so you can see info on how data is being read and/or written.

Cheers!

MF.
Meep!

LEO

Thanks MF  :)

Fyi...
1. Merging of duplicates is not enabled. The option "Allow records with duplicate keys" is selected.
2. We have lookups based on the dimension tables rather than any hierarchies. The dimension elements are not being aggregated in the build.
The process has been running for around 12-14 hours and is still not completing.

I need to test with the input/output logging frequency option as well.
I also wonder whether dimension breaks might help in some way.

regards,
leo

MFGF

Hi,

Dimension breaking is only appropriate when a hash table is being used by the build (i.e. you are merging or aggregating) and there is not enough RAM to cache everything in one go.

I'd strongly suspect an issue with either the collection of input rows or the delivery of output rows - setting the logging frequency to show (say) every 10 rows in and out should give you a fair idea pretty quickly whether either of these is a bottleneck.

Let us know what you find!

MF.
Meep!

rveeja

Hi,

Is your target table empty?
When was the last time the indexes on your target table were rebuilt?
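While the build sits on that message, it may also be worth checking on the database side whether its session is blocked or waiting. A rough sketch, assuming an Oracle target (the view and column names are Oracle-specific; other databases have equivalent monitoring views):

```sql
-- List sessions that are currently blocked by another session,
-- along with what they are waiting on and for how long.
SELECT sid, serial#, status, blocking_session, event, seconds_in_wait
FROM   v$session
WHERE  blocking_session IS NOT NULL;
```

If the DM session shows up here, the hang is a locking problem on the database rather than a DM one.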

For debugging purposes (first, eliminate and ensure you have no ETL build-related issues):
Step 1: Try to deliver a small set (100 records) to your target (preferably to an empty table with no constraints).
Step 2: Try delivering the facts to an empty temporary table that has no constraints or indexes.
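As a minimal sketch of the Step 2 test table (the table and column names are placeholders; match the columns and types to your own delivery):

```sql
-- Bare staging table: no primary key, no foreign keys, no indexes,
-- so delivery speed reflects the build alone rather than constraint
-- checking or index maintenance on the real target.
CREATE TABLE FACT_DELIVERY_TEST (
    DATE_SKEY      INTEGER,
    PRODUCT_SKEY   INTEGER,
    SALES_AMOUNT   DECIMAL(18,2)
);
```

If delivery to a table like this is fast, the bottleneck is on the real target (constraints, indexes, triggers) rather than in the build itself.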

Thank You

LEO

Hi MF,

I tried the option you suggested, but to my surprise I did not get anything else after the "Checking Fact Data Existence" message.

I had set the output frequency to 50, but nothing appears to be proceeding.

Any other suggestions for checking the process?

thanks,
leo

MFGF

What logging options do you have enabled?  I think you need 'Detail' checked in order to see the collection and delivery messages.  It's also worth enabling SQL logging to see if the "Select (1) from <target table>" query is getting issued to check for existing fact rows.
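For what it's worth, the existence check at this stage is typically just a lightweight probe of the target table. Roughly what to look for in the SQL log (the table name is a placeholder, and the exact SQL DM generates may differ):

```sql
-- Probe for any existing fact rows; Oracle syntax shown.
-- (SQL Server would use SELECT TOP 1 1 ...; DB2/MySQL a FETCH/LIMIT clause.)
SELECT 1 FROM FACT_TABLE WHERE ROWNUM = 1;
```

If even a probe like this is slow when run by hand against the target, the problem is on the database side rather than in DM.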

Cheers!

MF.
Meep!

LEO

Hi,

All the available logging options were enabled before executing the job, including SQL logging.
But nothing is printed in the log after the message "Checking Fact Data Existence".

Is there anything to be done on the DB side?

thank you,
leo

gagnonp

Was this issue resolved, and if so, how?

We have the same issue here. The DM log hangs for three hours on 'Checking fact data existence'.

Our target table is empty, and the delivery type is Truncate.

souvikbanerjee

A request to all members: please post the resolution if a problem is resolved.
I am facing a similar problem. "Checking Fact Data Existence" is taking a long time. How can we minimize it? Do we have any control over this step? Can we tune it for performance?

MFGF

I'm out of ideas, sorry. You should log it with IBM to see if it's a known issue and whether any solutions are known.

MF.
Meep!

S2000

I've just resolved a similar problem.

Try reducing the row counts of the source builds.
Take them down to a few thousand rows and run the job – does it succeed?
If it does then it points at some sort of memory or temp space problem.

Increase the row counts until it hangs – and enable all the logging options.
Look at the log files of the runs before it hangs and the run where it hangs.

I would guess you'll see one of the temp space values increasing up to the point where it hangs.
The problem was caused by the space required to locally cache all the rows to match with the dim data.
See if you can reduce the rows you import.
Perhaps split it into two builds, or see if you can ignore any data over a certain age?


If the build doesn't work even for small volumes of data, the problem is probably caused by the caching of the dimension data.
See if you can reduce the size of your dims, or perhaps derive the Skey by a join in the datasource rather than by using the fact lookup.
Although this seems to defeat the purpose of DM, it sometimes works if the database is significantly more powerful than the DM server, or if one of the dims is unfeasibly large.
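As a hedged illustration of that last idea (all table and column names here are invented for the example), the surrogate key is resolved by a join in the datasource query itself, so DM never has to cache the dimension:

```sql
-- Resolve the surrogate key in the source query instead of via a DM
-- fact lookup, pushing the matching work onto the database server.
SELECT f.order_date,
       f.sales_amount,
       d.product_skey        -- surrogate key fetched in-database
FROM   source_sales  f
JOIN   dim_product   d
  ON   d.product_code = f.product_code;
```

The trade-off is exactly as described above: you lose DM's lookup handling (e.g. unmatched-member processing), but the database does the heavy lifting.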