time-based partitioned cube - a crazy solution to reduce processing time

Started by renba, 13 Nov 2011 08:54:24 AM


renba

Hello everybody,
I'm trying to find a way to reduce the total time needed to build a time-based partitioned cube.
Can anyone comment on the following crazy idea?
If I have a model that builds a time-based partitioned cube over 12 months then, as you know, at the end of the job I'll find 14 files (12 child cubes, the main cube and 1 vcd).
This job takes a long time and I'd like to reduce it.
I would like to create the 12 cubes (one for each month) with 12 parallel processes, and find a way to build the 13th, main cube in a separate process (the vcd is no problem).
If I found a way I could shout "Bingo!", because I could cut the job from 24 hours to 2 hours.

Maybe someone out there has the solution.
thanks

AussiePete2011

Hi there,

24 hours sounds like a long time, and I think you need to look deeper at the problem.
I'd recommend performing an I/O test.
If you are building locally, test the I/O performance of the hard drive where the cube is being built.
If you are building over a network drive, try building locally to see whether the time drops.
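
A quick sanity check for throughput is a timed sequential write and read. Here is a minimal Python sketch, assuming a 1 GB test file; point the path at the drive the cube builds on (the path shown is made up):

import os
import time

PATH = r"D:\cubes\iotest.bin"    # assumption: the drive where the cube is built
SIZE_MB = 1024                   # 1 GB test file
CHUNK = b"\0" * (1024 * 1024)    # write in 1 MB chunks

# Timed sequential write, forced to disk
start = time.time()
with open(PATH, "wb") as f:
    for _ in range(SIZE_MB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())
write_secs = time.time() - start

# Timed sequential read (note: may be partly served from the OS cache)
start = time.time()
with open(PATH, "rb") as f:
    while f.read(1024 * 1024):
        pass
read_secs = time.time() - start

os.remove(PATH)
print("write: %.0f MB/s, read: %.0f MB/s" % (SIZE_MB / write_secs, SIZE_MB / read_secs))

If the local drive reads hundreds of MB/s and the network drive manages only a fraction of that, you have found your bottleneck.
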
Also look at the actual SQL being sent to the database and check whether any indexes are being used. If not, create indexes on the tables involved and retest.
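
You can see the difference an index makes directly in the query plan. A self-contained illustration using SQLite (your warehouse database will have its own EXPLAIN syntax, and the table and column names here are invented, but the idea is the same):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (sale_date TEXT, state TEXT, amount REAL)")

query = ("SELECT state, SUM(amount) FROM fact_sales "
         "WHERE sale_date >= '2011-01-01' GROUP BY state")

# Without an index the plan is typically a full scan of fact_sales
print(con.execute("EXPLAIN QUERY PLAN " + query).fetchall())

# With an index on the filtered column it typically switches to an index search
con.execute("CREATE INDEX ix_sales_date ON fact_sales (sale_date)")
print(con.execute("EXPLAIN QUERY PLAN " + query).fetchall())
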
How much data is being pulled into the cube?
Typically 90% of it will come from a fact table, so work out how the data is distributed across the dimensions, e.g. 70% is NSW, 30% is VIC, and so on for the other states.
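
Working out that distribution is a simple GROUP BY against the fact table. A sketch under assumed names (the database file, table and column are hypothetical - substitute your own):

import sqlite3

con = sqlite3.connect("warehouse.db")   # assumption: fact data reachable via SQL
rows = con.execute("""
    SELECT state,
           COUNT(*) AS n,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM fact_sales), 1) AS pct
    FROM fact_sales
    GROUP BY state
    ORDER BY n DESC
""").fetchall()
for state, n, pct in rows:
    print(f"{state}: {n} rows ({pct}%)")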

Changing the partition strategy may not be the best solution without a real understanding of the hardware, the SQL and the data sets.
http://www.ibm.com/developerworks/data/library/cognos/page354.html

Cheers
Peter

nmcdermaid

I agree. Step 1 is to take a look at the log file and see if most of the time is spent in one of three areas (see the sketch after the list):

1. Data Open (means the database is struggling to perform the joins - database optimisation is the trick)
2. Data Read (means data is being transferred from the database to temp files - an I/O issue)
3. Final cube passes (you might be able to reduce the number of passes by adjusting settings)
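
If you'd rather total the time per phase than eyeball it, a small script can do it. The log format below is purely an assumption (one timestamp plus a Start/End phase marker per line) - adjust the regex to whatever your Transformer log actually emits:

import re
from datetime import datetime

# Assumed line shape: "2011-11-13 08:54:24 Start Data Open" / "... End Data Open"
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(Start|End)\s+(.+)$")

totals, started = {}, {}
with open("transformer.log") as log:    # hypothetical log file name
    for raw in log:
        m = LINE.match(raw.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        phase = m.group(3)
        if m.group(2) == "Start":
            started[phase] = ts
        elif phase in started:
            totals[phase] = totals.get(phase, 0) + (ts - started.pop(phase)).total_seconds()

for phase, secs in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{phase}: {secs:.0f}s")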

Step 2: you might want to consider a time-based partitioned cube. Put simply, this means that if your data for the last ten years has not changed, you don't have to rebuild that data all the time - just the recent data. But there are a lot of strings attached to getting time-based partitioned cubes working.

Lastly, you might want to consider a ROLAP solution - this presents a cube to the user but serves it straight out of the database. No cube build is required, but you will need some serious DBAs and version 8 of Cognos to get it working efficiently.

nmcdermaid

If you have a time-partitioned cube, why are you building all periods at once? The advantage of a time-partitioned cube is that you can build just the time period that has changed, e.g. June 2012, then slot that cube right into the existing list of cubes that you've already built.

You do need to rebuild all the cubes if there are any changed categories, dimensions or measures, though.

You could conceivably build all 12 cubes in parallel with 12 instances of Transformer, each limited to the data for its period. Practically speaking, though, you probably wouldn't want more than 4 or 5 instances running at once.
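
As a sketch of that bounded-parallel approach in Python - the cogtr command line and the per-month model file names are assumptions (check the options against your own Transformer install):

import subprocess
from concurrent.futures import ThreadPoolExecutor

MONTHS = [f"2011-{m:02d}" for m in range(1, 13)]
MAX_PARALLEL = 4   # the practical ceiling suggested above

def build_month(month):
    # Hypothetical invocation: one Transformer instance per month, each model
    # (e.g. sales_2011-01.mdl) limited to that month's data.
    cmd = ["cogtr", f"-msales_{month}.mdl"]
    return month, subprocess.run(cmd).returncode

# Threads are enough here because each worker just blocks on a subprocess
with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    for month, rc in pool.map(build_month, MONTHS):
        print(month, "OK" if rc == 0 else f"failed (exit {rc})")

The main cube would still be assembled afterwards in its own step, as the original post suggests.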