COGNOiSe.com - The IBM Cognos Community

IBM Cognos 8 Platform => COGNOS 8 => BI Modeling (Transformer) => Topic started by: ComplexQry on 15 Jul 2010 03:08:37 PM

Title: Cube Scalability
Post by: ComplexQry on 15 Jul 2010 03:08:37 PM
I have a cube and I am concerned about how well it is going to scale.  The requirement is to hold 5 years of data, and each month has ~45 million records.  Do the math and we should be around 2.7 billion records.  I anticipate a reasonable number of categories; for one month it was around 100,000, and this number shouldn't grow drastically outside of the time dimension categories.
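A quick back-of-the-envelope check of those figures, sketched in Python (only the ~45 million rows per month comes from above; the rest is straight arithmetic):

# Rough sizing arithmetic for the proposed cube.
ROWS_PER_MONTH = 45_000_000      # ~45 million source records per month
MONTHS = 5 * 12                  # 5 years of history

total_rows = ROWS_PER_MONTH * MONTHS
print(f"{total_rows:,} source records")   # 2,700,000,000 -> ~2.7 billion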

I guess I am looking for opinions of how this should scale once it is a full 5 years of data.  The plan is to incrementally build.

Also there is the potential to partition this cube by 13 geographic regions.  What would the downside be to doing this?
Title: Re: Cube Scalability
Post by: twlarsen on 20 Jul 2010 12:14:21 PM
Simple solution... don't use Cognos for that number of records ;)

How many dimensions are you going to be using?

What's the largest cube your organization uses right now?
Title: Re: Cube Scalability
Post by: ComplexQry on 20 Jul 2010 12:26:14 PM
12 dimensions, many of which are simple one-level dimensions.  I am not really concerned about the category counts, more so the volume of data; I would anticipate 500k categories at most.  If not Cognos, then what?  Would TM1 be a better fit for this?
Title: Re: Cube Scalability
Post by: ComplexQry on 15 Sep 2010 07:23:02 AM
OK, the project for this effort is kicking off soon...  and I know it will push PowerPlay to its limit.  So I need some help dealing with a cube that exceeds 2 GB.  I have read in some posts that this isn't really a hard limit.  But assuming I make the query as efficient as possible and still hit 2 GB with ease, what other alternatives are there besides reducing the required 5 years of data?  Again, category counts are good; record volume is the thorn.
Title: Re: Cube Scalability
Post by: RobsWalker68 on 17 Sep 2010 10:49:58 AM
Hi

Handling the 2 GB limit per .mdc file is easy enough: you just need to change the MultiFileCubeThreshold setting in the cogtr.xml file to enable creation of multiple local PowerCube files.
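For illustration, the entry looks something like the fragment below. Treat the exact XML wrapping as an assumption (it varies between Transformer versions) and check the layout of your own cogtr.xml or the IBM documentation; only the MultiFileCubeThreshold setting name and the cogtr.xml file come from the note above, and the value shown is just an example.

<!-- Illustrative only: the surrounding element and attribute names are assumed -->
<Preferences>
    <Preference Name="MultiFileCubeThreshold" Value="500000"/>
</Preferences>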

However, you are still going to have issues with the volume of data, and potentially with the category limit, which used to be around 2 million; you will need to check that further.

Personally, from experience, even if you get the cube to build with five years' worth of data, the performance from a user perspective of a disk-based engine such as Transformer is going to be woeful.  You may need to look at other products such as Cognos TM1 that could potentially handle that volume of data.

Presumably the 45 million rows per month are transactional records.  Have you looked at pre-summarising in the database before handing off to Transformer, and then using Report Studio to provide access to the detailed transactions?
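As a rough sketch of that pre-summarisation idea, here is a minimal Python example using an in-memory SQLite database as a stand-in for the warehouse; the transactions table and its columns are invented purely for illustration:

import sqlite3

# In-memory stand-in for the warehouse; table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (txn_date TEXT, region TEXT, product TEXT, amount REAL)")
con.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [("2010-07-01", "EMEA", "Widget", 10.0),
     ("2010-07-15", "EMEA", "Widget", 12.5),
     ("2010-07-20", "APAC", "Gadget", 7.0)],
)

# Pre-summarise to one row per month/region/product before Transformer reads it,
# so the cube build sees far fewer rows than the raw transactional volume.
con.execute("""
    CREATE TABLE monthly_summary AS
    SELECT strftime('%Y-%m', txn_date) AS month,
           region,
           product,
           SUM(amount) AS total_amount,
           COUNT(*)    AS txn_count
    FROM transactions
    GROUP BY month, region, product
""")

for row in con.execute("SELECT * FROM monthly_summary ORDER BY month, region"):
    print(row)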

You could create mini cubes for your geographical regions, although you would need to consider how to present a company-wide picture if need be.

Kind Regards

Rob
Title: Re: Cube Scalability
Post by: ComplexQry on 22 Sep 2010 11:21:55 AM
Thanks for the advice.

My current state is that I am using time-based partitioning by year, so I have 5 cubes, one for each year.  A full year comes out to about 1.4 GB and about 50k categories, so I am very satisfied with the category count.  I will need to conduct some performance testing of using this data across years.  The typical behavior would be "This is my number today; how was it in previous years?", so I can see heavy analysis in the current year's cube and then referring back to previous years.  I am quite a bit more optimistic today at least.  :)
Title: Re: Cube Scalability
Post by: nmcdermaid on 30 Sep 2010 09:54:43 PM
It sounds like you have the cube side of it totally in hand.

Assuming you don't have lots of new categories, record volumes only add to build time.

There's a document floating around called 'Transformer in production'. Most of what it says is:

-have a dedicated cube build server (i.e. separate from the database server, and most importantly with its disks separate from the database's disks)
-have different physical disks for the various temp stages of the cube build
-partition your cubes and only build the cubes whose data has changed, as sketched after this list (unfortunately that also means that if you get a new account category you need to rebuild all cubes)
-pre-summarise in the DB where possible
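A minimal sketch of the 'only build cubes with changed data' point, assuming one yearly .mdc per partition as discussed earlier in the thread (the cube naming scheme and the changed-month list are hypothetical):

from datetime import date

def cubes_to_rebuild(changed_months):
    # One cube per year: only the years whose source months changed need a rebuild.
    years = sorted({m.year for m in changed_months})
    return [f"sales_{year}.mdc" for year in years]

if __name__ == "__main__":
    changed = [date(2010, 8, 1), date(2010, 9, 1), date(2009, 12, 1)]
    for cube in cubes_to_rebuild(changed):
        # A real batch job would invoke the Transformer command line here
        # against the matching model instead of just printing.
        print(f"Rebuild needed: {cube}")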