Java Garbage collection causing report server slowdowns

Started by jmwhitehead, 10 Sep 2009 06:51:34 AM

jmwhitehead

Our current Cognos installation is on two Sun Solaris zones running on an M4000 server. The first zone has the gateway and the application tier installed, and the second has the content manager and another application tier installed.

In general the CPU usage of the Cognos Java process on either zone is below 1%, except roughly every hour when garbage collection is occurring. Then the CPU usage rises to around 5% and the report service becomes slow for end users who are logging in or navigating around Cognos Connection. When the garbage collection finishes, the portal is a lot quicker and the CPU usage returns to its sub-1% level.

Has anyone else noticed this type of behaviour?

I've followed the instructions in the Cognos Java tuning doc to turn on GC logging, which is how I identified the issue, but it doesn't offer any advice on how to speed up the GC process or make it happen less frequently.

I did reduce the memory allocated from 1152 MB to 768 MB, but apart from making the GC more frequent it didn't alleviate the service slowdowns. I'm now thinking I should go the other way and make it a lot bigger, perhaps 2048 MB, to see if that reduces the frequency of the GC even if it doesn't remove the slowdowns when it happens.

Also in the GC logs I get a lot of entries like this :-

141161.002: [GC 141161.002: [ParNew (promotion failed): 195665K->195665K(196480K), 0.0049551 secs]141161.007: [CMS (concurrent mode failure): 549611K->550081K(589824K), 1.1084267 secs] 745261K->550081K(786304K), 1.1136433 secs]

when the system is experiencing its higher CPU usage and slower portal performance.
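
To line these entries up with the slow periods, a small script can pull the uptime stamp and total pause out of every collection that reports a failure. This is only a minimal sketch: it assumes each entry sits on a single line in the same layout as the one above and that the leading number is seconds since JVM start. The name `failed_collections` is made up for illustration.

```python
import re

# Leading uptime stamp, e.g. "141161.002: [GC"
STAMP = re.compile(r"^(\d+\.\d+): \[")
# Every "<seconds> secs" figure; the last one on the line is the total pause
SECS = re.compile(r"(\d+\.\d+) secs")
FAILURES = ("promotion failed", "concurrent mode failure")


def failed_collections(lines):
    """Yield (uptime_seconds, pause_seconds) for entries reporting a failure."""
    for line in lines:
        if not any(marker in line for marker in FAILURES):
            continue
        stamp = STAMP.match(line)
        pauses = SECS.findall(line)
        if stamp and pauses:
            yield float(stamp.group(1)), float(pauses[-1])


# The entry quoted in this post, joined into one line
entry = (
    "141161.002: [GC 141161.002: [ParNew (promotion failed): "
    "195665K->195665K(196480K), 0.0049551 secs]141161.007: "
    "[CMS (concurrent mode failure): 549611K->550081K(589824K), 1.1084267 secs] "
    "745261K->550081K(786304K), 1.1136433 secs]"
)
for uptime, pause in failed_collections([entry]):
    print(f"{uptime / 3600:.1f} h after JVM start: paused {pause:.2f} s")
```

Feeding it the whole log file (for example `failed_collections(open(path))`, with `path` pointing at wherever the GC log is written) gives a list of pauses that can be compared against the times users report slowdowns.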

Jonathan

kolonell

Quote
In general the CPU usage of the Cognos Java process on either zone is below 1%, except roughly every hour when garbage collection is occurring. Then the CPU usage rises to around 5% and the report service becomes slow for end users who are logging in or navigating around Cognos Connection. When the garbage collection finishes, the portal is a lot quicker and the CPU usage returns to its sub-1% level.
That looks like a full garbage collection is occurring. A full collection almost freezes all other threads in the Java process, hence the freeze in performance.

Quote
141161.002: [GC 141161.002: [ParNew (promotion failed): 195665K->195665K(196480K), 0.0049551 secs]
141161.007: [CMS (concurrent mode failure): 549611K->550081K(589824K), 1.1084267 secs]
745261K->550081K(786304K), 1.1136433 secs]

The "promotion failed" entry means the collector can't move objects from the new generation to the tenured generation (lack of space, fragmentation, ...). When that occurs, the collection becomes a full collection.
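
The figures in the quoted entry show how little room was left. Below is a rough sketch of the arithmetic, assuming the usual `before->after(capacity)` layout of these log fields and that the three fields are, in order, the new generation, the tenured generation, and the whole heap. The helper name `occupancy` is only for illustration.

```python
import re

# "<before>K-><after>K(<capacity>K)" as printed in the GC log
FIELD = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")


def occupancy(entry):
    """Return a (before, after, capacity) tuple in KB for each field."""
    return [tuple(int(n) for n in m.groups()) for m in FIELD.finditer(entry)]


# The entry quoted above, joined into one line
entry = (
    "141161.002: [GC 141161.002: [ParNew (promotion failed): "
    "195665K->195665K(196480K), 0.0049551 secs]141161.007: "
    "[CMS (concurrent mode failure): 549611K->550081K(589824K), 1.1084267 secs] "
    "745261K->550081K(786304K), 1.1136433 secs]"
)
labels = ("new (ParNew)", "tenured (CMS)", "whole heap")
for label, (before, after, capacity) in zip(labels, occupancy(entry)):
    print(f"{label}: {before / capacity:.1%} full before, "
          f"{after / capacity:.1%} full after")
```

On this entry the tenured generation is still about 93% full after the full collection, which suggests the live data barely fits in the current heap and is consistent with trying a larger one.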

Try increasing the max heap (-Xmx) to 1500M, and the initial heap (-Xms) as well.
How much memory do the zones have for themselves (not shared)?

jmwhitehead

Hi Kolonell,

Thanks for your reply. I had thought I might increase the heap size to something >= 1.5 GB. As it's on a production machine I can only tweak it during off hours, so I might try that this evening. One question: I know I can change the -Xmx via cogconfig, but there is no parameter entry in my bootstrap_solaris.xml file for the -Xms setting. Is this something I should be adding? I notice that it is a parameter in the cbs_cnfgtest_solaris.xml file.

Both zones have 4 GB allocated each.

I have recently become aware of a tool called jvisualvm, and using it I can see that the poor performance almost certainly occurs during periods of heavy garbage collection activity.

kolonell

You can just add that as a <param></param> switch. If it is in the cfgtest_solaris.xml file, then you can just copy the option from there and set it to 1500 (or something similar).

Quote
I have recently become aware of a tool called jvisualvm, and using it I can see that the poor performance almost certainly occurs during periods of heavy garbage collection activity.

If that is the case, solving this could be very tricky. What is the memory usage of the Java process when the slowdown occurs?

jmwhitehead

I've tweaked the memory parameter. I did set it to 2048 originally, but that seemed to be causing the application tier Java process to crash and restart, so I've reduced it to 1536 to see if it stays alive at that setting.

When I first encountered this issue I opened an SR with IBM support, and they sent me a link to a doc that showed me how to set up GC logging. One of the parameters it says to set is the -Xingc switch. I didn't know what this was, but I've since done quite a bit of reading on Java GC, and it seems to me that on a multi-processor system I shouldn't be setting this. Or am I wrong? (I could very well be, as I'm a relative novice at JVM tuning.)

kolonell

Quote
one of the parameters it says to set is the -Xingc switch
Never heard of that switch before (and neither has Google, apparently). Can you forward the link? One can never know enough ;-)

jmwhitehead

This is the link to the doc on the IBM support site :-

http://download.boulder.ibm.com/ibmdl/pub/software/dw/dm/cognos/performance/cognos_specific/java_garbage_collection.pdf

And here's a Sun doc about tuning GC :-

http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.htm

Since yesterday I've taken the -Xincgc param out of the bootstrap files, so the JVMs are now using the throughput GC instead of the concurrent low-pause collector that the parameter activates. With that change and the VM memory size increased to 1536 MB, everything seems to be going okay at the moment.
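
One way to confirm which collector a JVM is actually using after a change like this is to look at the generation tags in its GC log. The sketch below assumes detailed GC logging is enabled, as in the entry posted earlier in the thread, and relies on the tags HotSpot prints for each collector (`ParNew`/`CMS`, `PSYoungGen`/`PSOldGen`/`ParOldGen`, `DefNew`/`Tenured`). The names `COLLECTOR_TAGS` and `collectors_seen` are made up for illustration.

```python
# Generation tags printed in detailed HotSpot GC logs, grouped by collector
COLLECTOR_TAGS = {
    "concurrent low-pause (CMS)": ("ParNew", "CMS"),
    "throughput (parallel)": ("PSYoungGen", "PSOldGen", "ParOldGen"),
    "serial": ("DefNew", "Tenured"),
}


def collectors_seen(lines):
    """Return the set of collector names whose tags appear in the log lines."""
    found = set()
    for line in lines:
        for name, tags in COLLECTOR_TAGS.items():
            # Tags open a bracketed section, e.g. "[ParNew (promotion failed): ..."
            if any(f"[{tag}" in line for tag in tags):
                found.add(name)
    return found


# The entry posted earlier in the thread, logged while -Xincgc was still set
entry = (
    "141161.002: [GC 141161.002: [ParNew (promotion failed): "
    "195665K->195665K(196480K), 0.0049551 secs]141161.007: "
    "[CMS (concurrent mode failure): 549611K->550081K(589824K), 1.1084267 secs] "
    "745261K->550081K(786304K), 1.1136433 secs]"
)
print(collectors_seen([entry]))
```

Run against a log written after the parameter was removed, the same check should report the throughput collector instead if the change took effect.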