
Cut-down Performance on Oracle 10g

Started by corey_hallisey, 27 Jan 2011 02:51:40 PM


corey_hallisey

Hi all.  This is my first post here.

We are in the middle of a conversion, moving our 1 TB of Cognos Planning applications from SQL Server 2000 to Oracle 10g.

We have hit a significant performance show-stopper.  Our cut-downs on our big models have gone from about 8-10 minutes on SQLServer 2000 to 3 hours on Oracle 10g.   I've been working with Cognos Support for over a week on this now, but so far no luck on a solution.

Does anyone have any pointers / best practices on optimizing performance of cut-downs against Oracle?   We updated our ODAC (part of the Oracle client) to the recommended 10.2.0.2.21 version, but it has made no difference.

We have a 48-CPU job server farm, and our big models have over 1500 e-list items with sparse cubes in the millions of cells in some cases.  For us, cut-downs are required for end-user performance because our models are so big.

Thanks in advance for any pointers.

Rutulian

Hi Corey,

Welcome to the forums - this definitely sounds like a nontrivial one to deal with.  But fun!

Is it just cut-downs that have become slower, or all jobs across the board?  There are some options we can call the Oracle client with to improve Publish performance, but I'm not sure they affect cutdowns.

Which version of Contributor are you using?
Are you seeing a lot of LOCK_DENIED entries in the job monitoring screen?
Has anything else changed apart from the DB? (ie new job servers provisioned, fixpack upgrade)

If you've not collected one already, it would be a good idea to run the same cutdown on both databases (easiest done by causing a cutdown to be needed, then deploy out, then GTP, then restore the deployment to the other DB, then GTP) and run Job Doctor reports afterward.  This is a Contributor Macro step which produces a report; I'll gladly take a look, but Support will definitely want to see this.

If you're friendly with your DBA, it might be worth asking if they're seeing anything unusual on their side, or whether they have any ideas about optimising the traffic they are seeing.  We don't have anything yet to point to the DB as the root cause, so if there's a project cost to using DBA time, you might want to wait while we/Support try to analyse.

Cheers,
Alexis

corey_hallisey

Thanks for your response Alexis.

To answer your questions:
1.  Only cut-down performance is affected.  Nothing else has seen a dramatic slowdown like the cut-downs; reconciles are about the same as before.
2.  Version of Contributor:  8.4 FP2 (no, we're not at 8.4.1 yet - we upgraded just before it came out).
3.  I'll take a look at the LOCK_DENIED.  If we're seeing a lot, what should we do?
4.  No, nothing else except the DB has changed in our architecture.

I'll set up a macro and then take a look at the Job Doctor report.  Support surprisingly hasn't asked for it.

My other theory: when we imported all our applications from SQL Server to Oracle, one recommendation our Cognos consultants gave us to speed up the imports was to rename the users.txt file to users2.txt, so that on import the users wouldn't be picked up.  We then applied the rights after the fact.  Apparently this is a "well-known workaround" for slow imports, according to both our consultants and Cognos Support.

Removing the users.txt file sped up the imports dramatically (e.g. from 13h to 2h).  However, we found that when we imported our models "without the users", the System Links on the Get Data screen were no longer visible to the end-users, and we couldn't get them to re-appear by any means.  It's stumped Support as well.  If we import with the users.txt file intact, the System Links show up.  I'm wondering if the 2 problems are related: because cut-downs prepare end-user-specific data blocks, maybe removing the users information from the models is causing the cut-downs to go crazy.

Marnie @ Support is trying to engage some of her expert Oracle resources to help, but hasn't had much luck doing so yet.

Rutulian

Hi Corey,

Interesting - I'd been planning to try some of 8.4's Job Polling config changes if you were on a pre-8.4 system, but that's already ruled out.  There are some changes to Job Server .xml that you can quite happily apply to older versions too.

The key question with your results at the moment is what sort of cut-down performance do you see on the model imported with users?  I'd expect to see hiccups with System Links, any saved user model views, and user annotations, but I think the data blocks produced by cutdown are e.list item specific (which often shakes out to be user specific when you look at rights) rather than being directly associated with the user.

I'll happily take a look at the job doc reports in case I can spot anything, but with 8.4 FP2 in the mix I haven't got a prime suspect at the moment.  You're in great hands with Marnie, say hi from me!

As an aside, can you reconfigure GetData when it's been 'lost' after import?  I'm on an 8.3 here and I know the RCP client changes things a lot, but in my version GetData is configured for specific users in Production->Extensions->Client Extensions.  I'd guess that re-entering valid groups might help, if it's not something you've already given a go.

Alexis

JaromirSeps

Hi, maybe this does not solve the problem directly, but ..

Did you try to disable the cut-down completely?

Maybe try that and measure the effect in model opening time for the end user.

If it is just a couple of seconds, I would not bother with cut-down ..

(btw, we had similar trouble with cut-down on Oracle, but ours was caused by the blob size being limited, and it was traceable in the logs.)

Jaromir

corey_hallisey

Hi Jaromir & Rutulian.  Thanks for your replies to my post.

Our current status:
1.  We have finished the conversion of live models to Oracle 10g.  The "disappearing system links" issue was resolved by not renaming the users.txt file.  It took way longer to import them, but it worked.

2.  As for the cut-down performance, things seem to be getting worse.  After our initial import, the first cut-down on our largest model took about 1.5 hours.  After our business partners did their normal monthly data loads, and actually shortened the e-list a bit, their cut-downs are now taking 4.5 hours!!  Nothing else in the environment has changed at all.  And it's ONLY with the cut-down-models step.  Reconciles and publishes are totally fine and within normal performance benchmarks for us.  Just the cut-downs seem to have gone crazy.

Jaromir, you mentioned you had issues with your "blob size being limited".  What exactly should I be looking for in my Oracle database to see if my blob sizes are being limited?  Is there an Oracle Config that I should be looking at that controls Blob sizes or something?  We are following the Cognos Planning DBA guide for our setups, but I don't remember there being anything specific about BLOB sizing in there.
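
If there's a standard view I should be querying to see these settings, pointers welcome - the closest I've found so far is something like this (the owner below is just a placeholder for our actual application schema):

-- how are the LOB columns configured: tablespace, chunk size,
-- retention, caching, logging, in-row storage?
select owner, table_name, column_name, tablespace_name,
       chunk, retention, cache, logging, in_row
from   dba_lobs
where  owner = 'PLANNING_APP'
order  by table_name;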

ykud

I'd look into Oracle performance with BLOB handling and logging.

The main thing Cut-Down does is create a number of new application definitions, better optimised for various users. Each of these definitions is an XML file that is stored as a BLOB in the database. I've seen Oracle choke while trying to place a lot of such XMLs into tables at the same time while maintaining its own log files of the process.
So try to work with your Oracle DBA to:
1) see if the DB is slow during the process (locks, I/O, logging) -- a quick monitoring sketch follows this list
2) play around with filegroup allocation to spread the I/O
3) turn off logging for the duration of the cut-down
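
A minimal sketch for point 1, assuming the DBA runs it while a cut-down job is active -- this is just standard Oracle 10g wait monitoring, nothing Cognos-specific:

-- which non-idle wait events are active sessions sitting on right now?
select event, wait_class, count(*) sessions
from   v$session
where  status = 'ACTIVE'
and    wait_class <> 'Idle'
group  by event, wait_class
order  by sessions desc;

If the top events are I/O or redo-related (db file waits, log file sync), that supports point 1; if nothing much shows up, the bottleneck is probably elsewhere.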

Another question -- are you sure it's not the Contributor servers that are being slow? Are they at 100% CPU during cut-down?

And cut-down model size is affected by the dimension order in cubes, as you might already know. If you haven't done it already, always put the e-list as the second dimension.

- Yuri.

corey_hallisey

Thanks ykud for the info. 

1)  I've had our DBAs monitoring our database during our cut-downs.  They say they haven't seen anything weird, but I'll double-check what they're monitoring.

2)  I'm pretty sure our DBAs have simply created the filegroups as per their "standard practices".  I know we have the LOBs on a separate tablespace, but I'm not sure if they can separate that specific tablespace out into its own filegroups.

3)  By logging, I assume you mean Oracle's "archive logging".  We don't do archive logging on our Cognos Planning transaction database, as we don't need up-to-the-minute recoverability; we do cold nightly database backups (the standard check for the mode is at the end of this post).

4)  The Contributor job servers are busy when running the cut-downs, but we govern them so that jobs cannot totally max out the servers.  Each job server is configured to use n-1 processes, where n = # of CPUs on the box.  The job servers are dedicated to Cognos Planning and nothing else, and they are all physical boxes (total of 40 job server CPUs).

I'll double-check with our Modellers to see if they have the e-list as the 2nd dimension.
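
For the record, confirming the logging mode for item 3 is a one-liner (standard Oracle, no assumptions beyond DBA access):

select log_mode from v$database;   -- returns ARCHIVELOG or NOARCHIVELOG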

Thanks again for the hints.

ykud

Corey, in my experience the 'performance' game goes like this:
1) you note the differences between the fast and slow environments (was SQL 2000, now Oracle, as you say)
2) you monitor the whole system (app servers + network + DB servers) during slow periods, using DB tools (focus on I/O speed) for the database and Windows perfmon for the Cognos EP app servers.

I currently have 3 theories on why it could be slow:
- Oracle-side configuration. In this case you should see that the EP servers are not fully CPU-loaded -- they should be waiting for Oracle responses.
- Differences in how Cognos EP talks to Oracle vs MS SQL. In this case you should see the EP servers fully loaded and Oracle not overloaded.
- A very weird one from one of my projects: a Cisco network switch with an internal proxy filter on Oracle's default port, for security. The performance of that filter turned out to be the bottleneck for the whole system, and changing Oracle's default port helped in that case. Here you'd see both Oracle and Cognos not fully loaded.

And an obvious one: anything in the planningerrorlog files?
Keep us posted about your findings -- it's an interesting case )

corey_hallisey

Latest update - we still have not solved it.

We think we've narrowed it down to the LARGE_OBJECT tablespace.  In our 1 TB database, the LOB tablespace is 96% of the capacity, and ALL our applications use this one big tablespace.

So we created a second, smaller tablespace -- 65 GB as opposed to 1 TB -- put just 1 application into it, and then ran cut-downs.

The cut-down took about 50 minutes, as opposed to 6 hours in the large tablespace.

So we thought we were making some headway.

However, I then asked our testers to run more cut-downs, just to confirm there was no performance degradation over time.

To my disappointment, the performance degraded badly with each new cut-down, and the 65 GB LOB tablespace got fuller and fuller each time.  Originally it was about 10 GB full; now it's almost totally full (45 MB left).

So from 50 minutes, we've now degraded to 3h30min on the same model, and for each cut-down we're basically renaming one object in one d-list and then running the cut-down.

So, it appears the performance degradation is related to growth of the LOB data with each new cut-down.  Wouldn't the previous LOB data get cleared as each new cut-down is done?

We have escalated this issue to Level 3 Support with Cognos.

corey_hallisey

Hi all.   It feels like I'm starting to lose my mind with cut-downs.

I'm hearing conflicting reports about cut-downs.  Some Cognos folks are telling me that cut-downs after v7.3 are pretty much a waste of time: given the usual horsepower of PCs and network bandwidth these days, the benefit of cut-downs is negligible.  And in v10.x they're gone entirely.

The cut-downs also have the nasty side-effect of growing the LOB data LINEARLY with each cut-down.  Old data from previous cut-downs is NOT cleared out!  For example, on the model I've been testing with, the initial data in the LOB was about 10 GB.  After 6 iterations of cut-downs we managed to fill up the 64 GB tablespace, and the ONLY change we made to the models to trigger the cut-downs was a simple renaming of one item in 1 d-list.  That's a consumption rate of (64 - 10) / 6, i.e. around 9-10 GB per cut-down!  And we run cut-down GTPs every day, because we re-load actuals data into our forecast models daily to make sure our users have the latest and greatest actuals.

We are now running comparative tests between a version of the model WITH cut-downs and one WITHOUT, to see if end-users actually get any measurable benefit from them.  If they don't, we're simply gonna turn cut-downs off and eliminate 2 problems at once: the bad performance on GTPs and the massive data growth in our system.

The remaining problem is how to shrink the LOB data back down after disabling the cut-downs.  I'll probably have to export/import all my models again into a new tablespace, then drop the old one once all the models are migrated.
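
(If anyone knows better, two in-place alternatives I'm told 10g supports are sketched below.  I haven't verified either against our setup, and the schema/column names are placeholders, not the real Planning ones:)

-- option 1: shrink the LOB segment in place (requires an ASSM tablespace)
alter table planning_app.some_table modify lob (model_blob) (shrink space);

-- option 2: rebuild the LOB into a fresh tablespace, then drop the old one
alter table planning_app.some_table move lob (model_blob)
  store as (tablespace planning_lob_new);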

Why do successive cut-downs cause the LOB tablespace to grow over time? 
Why isn't old cut-down data cleared out by later cut-downs?

SomeClown

Quote from: corey_hallisey on 23 Mar 2011 03:43:53 PM
Why do successive cut-downs cause the LOB tablespace to grow over time? 
Why isn't old cut-down data cleared out by later cut-downs?

That sounds like the old Oracle bug that got cleared up somewhere in the Oracle 9.2.x releases (Oracle didn't properly reclaim LOB space).  Don't know if that's what's going on here (a DBA parameter somewhere?).  You may want to go back to the DBAs and see if that space is getting reclaimed.

You would be able to see if it's the app vs Oracle by running rowcounts on the cutdown table (something like nodesetguidmodel).  If the count grows with each successive cutdown, then it's an app (Cognos) issue.  If the rowcount doesn't change, then it looks like Oracle isn't reclaiming the space.
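
As a sketch, run something like this once before and once after a cutdown (the table name is from memory, so check what actually exists in your application schema):

-- a growing count points at the app; a flat count with growing space
-- usage points at Oracle not reclaiming LOB space
select count(*) row_count
from   application_schema.nodesetguidmodel;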

ykud

Corey, you can run the EP Model Size Reader tool (located in the \bin\ folder) to see the XML sizes for the global model and the cut-down ones. It'll show you how effective your cut-down is.

You can estimate whether your cut-down processes actually increase the space used by running a script like

select a.tablespace_name,
       a.bytes bytes_used,
       b.bytes bytes_free,
       b.largest,
       round(((a.bytes - b.bytes) / a.bytes) * 100, 2) percent_used
from   (select tablespace_name,
               sum(bytes) bytes
        from   dba_data_files
        group  by tablespace_name) a,
       (select tablespace_name,
               sum(bytes) bytes,
               max(bytes) largest
        from   dba_free_space
        group  by tablespace_name) b
where  a.tablespace_name = b.tablespace_name
order  by ((a.bytes - b.bytes) / a.bytes) desc

for your LOB tablespace, before and after cut-downs. If the space used doesn't increase, just turn off autoextend on the LOB tablespace's datafiles, and your Oracle speed should stay constant across subsequent cut-downs.
Extending tablespaces is a very resource-demanding operation, so your slowdown is no surprise in this case.
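
A rough sketch of both halves of that, assuming DBA access -- the tablespace name and datafile path are placeholders, not the real ones:

-- which datafiles in the LOB tablespace can autoextend, and how big are they?
select file_name, autoextensible, round(bytes / 1024 / 1024) size_mb
from   dba_data_files
where  tablespace_name = 'PLANNING_LOB';

-- switch autoextend off for a given datafile
alter database datafile '/u01/oradata/prod/planning_lob01.dbf' autoextend off;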

corey_hallisey

Thanks ykud for the great info on how to measure size of the models.  And thanks to SomeClown for the hint on the rowcounts.  Never thought of that.

Cognos Support has managed to reproduce the problem on their end, but I'm still waiting to hear exactly what the root cause is.   Not sure if they're going to call this a 'software bug' or something else.


Rutulian

Hi All,

This one's definitely going in my 'list of useful threads'!  SomeClown's post sounds like an issue I remember from the early Oracle 9 days: because of the heavy use Contributor makes of XML BLOBs, there are some very specific versions needed on the client side as well as the server.  It might well be that your DBA can manually clean up BLOBs which are flagged for deletion but still occupying space.

A few other things have come up in this thread that are interesting.  The following statements are half-remembered once-truths, and are offered as possible workings rather than definite explanations!

I'd definitely say that cutdowns have their place for one reason: you can get big models working by chopping them up so that no node has the whole thing.  I'm not sure how well that works if you just have A-Tables; I know the client does some resolution of access blocks against a model definition and then loads data, but I don't know for sure whether it opens the whole thing and then cuts it down (which would mean size is capped by the underlying model) or opens the access block first and then the model according to that.

It's true that modern desktops are ever more powerful, but locally I'm seeing a big drive to Citrix-on-virtual desktops in the ever-repeating cycle of thin clients and fat servers, now that most business users only need Office and a web browser to get things done.  Analyst has caused some... discussions. Resource-hungry client apps can really skew the profile, so there are definitely situations where keeping the heavy lifting as server-side as possible is relevant.

If everything's working as it should, I think Contributor stores 3 sets of cutdowns - one for Production and one for each of the previous 2 GTPs - so that if offline clients return with those versions, the data can be reconciled instead of being lost.  Every GTP adds a model to the model history, but apart from the most recent 2 GTPs, only the core model with no cutdowns applied is kept.

Because of this, growth over the first couple of GTPs is not unexpected, but after the 3rd GTP (populated with production-like data) you should settle at a relatively stable BLOB size - e.g. with a ~10 GB cutdown set you'd expect roughly three sets' worth on top of the model history, not fresh growth on every run.

Hope everything's all fixed and blissfully error-free,
Rutulian

corey_hallisey

Thanks Rutulian.

At the end of the day, we did some testing with our end-users WITHOUT cut-downs, and saw no degradation in end-user performance after we turned them off and GTP'd twice.

And since cut-downs are gone entirely in Cognos Planning 10, it made sense for us to turn them off.

Cognos Support gave me some steps to release the LOB space after we turned off the cut-downs.

1.  Turn off cut-downs. 
2.  Run GTP twice.
3.  In the database, on the schema where you turned off the cut-down, run the following SQL:
Truncate table application_schema_name.NODESETMODEL drop storage; 

That released the space for us, and GTPs are back to normal performance now (because they're not doing any cut-downs anymore).  My system isn't 1 TB in size anymore either. YAY!
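
For the record, a sanity check to watch the LOB segments shrink after the truncate (the owner below is a placeholder for the application schema):

select segment_name, segment_type, round(bytes / 1024 / 1024) size_mb
from   dba_segments
where  owner = 'APPLICATION_SCHEMA'
and    segment_type in ('LOBSEGMENT', 'LOBINDEX')
order  by bytes desc;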

This issue is still with Cognos Development - they're investigating if the Oracle 9 bug has re-appeared in 10.

Thanks everyone for their help and contributions!  In the end it was like:

Patient: It hurts when I use cut-downs.
Doctor:  Then don't use them!

ovo

Back to the question of cut-down.

Cut Down always happens; the Cut-Down Models setting simply dictates whether the cutdown happens server-side, with the models stored in the database for run-time use.  If Cut-Down is off, the client downloads the full model definition and runs the cutdown itself, client-side.  The setting for review items is a halfway house, so the work is shared.

Cut Down reduces the data going over the network, and when the feature was introduced, server machines were more capable of performing these memory-intensive operations than client desktops.  Nowadays networks are faster and client machines typically have fast processors and lots of RAM, so the cost of storing the model definitions in the database and the time spent running the jobs can outweigh the client-side performance improvement.

BTW, only the last three copies of the cut-down models should be stored (in order to support rollback and reconciliation against old models when a user has been offline).

In version 10.1, Cut Down is gone!  It is replaced by the concept of access blocks, which ensures that all apps are always optimised for run-time performance without any server-side processing or storage costs.  Try it - the results I have seen so far have been fantastic.

JaromirSeps

Hi,
maybe too late, but I'm adding my experience with cut down processes.

We had them turned ON in all of our applications, and we have now gone to OFF everywhere. We measured the performance both ways and there was almost no difference.

However, we still use Oracle 9 + Planning 8.1.

Jaromir