Resolved: Cognos ENV hangs often because of bibustkservermain.exe

Started by SGD, 09 Feb 2015 05:19:48 PM

SGD

Hello All,

I have tried to explain the issue in as much detail as possible. Please share your comments on the points below.

What was the Root Cause?

1. We have customized portal tabs in our Cognos PROD ENV. Those portal tabs and reports suddenly became very slow, although they had been working fine until around 14:00.
2. Pages kept loading indefinitely and showed nothing on the screen, not even an error message, only the rotating circle.
3. We found no specific trigger for the problem, but we noticed two instances of 'bibustkservermain.exe' running on the Cognos App Server: one was using 1 GB of memory and the other 813 MB, with CPU usage at 100%.
4. Because CPU and RAM were maxed out, no other processes could run and the Cognos services hung completely.
5. Surprisingly, the Cognos Contributor and Cognos Controller applications kept running as expected during this issue.
6. Many of our business users were impacted, and we had to restart the servers to fix the issue.
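
A minimal sketch of how the check in point 3 could be scripted, assuming the output of the Windows command `tasklist /FI "IMAGENAME eq bibustkservermain.exe" /FO CSV /NH` has been captured as text. The function names and the 800 MB threshold are illustrative assumptions, and the memory column format can vary with the server's locale.

```python
import csv
import io


def parse_tasklist_csv(output):
    """Return (pid, mem_kb) tuples parsed from `tasklist /FO CSV /NH` text."""
    rows = []
    for fields in csv.reader(io.StringIO(output)):
        if len(fields) < 5:
            continue  # blank line or an "INFO: No tasks..." message
        pid = int(fields[1])
        # The memory column looks like "1,048,576 K"; keep only the digits.
        mem_kb = int("".join(ch for ch in fields[4] if ch.isdigit()))
        rows.append((pid, mem_kb))
    return rows


def heavy_processes(rows, threshold_mb=800):
    """Keep processes whose memory use is at or above threshold_mb (assumed value)."""
    return [(pid, kb) for pid, kb in rows if kb >= threshold_mb * 1024]


# Hypothetical sample resembling the two instances described above.
SAMPLE = (
    '"bibustkservermain.exe","4321","Services","0","1,048,576 K"\n'
    '"bibustkservermain.exe","5120","Services","0","832,512 K"\n'
)

if __name__ == "__main__":
    for pid, kb in heavy_processes(parse_tasklist_csv(SAMPLE)):
        print(pid, kb // 1024, "MB")
```

The PIDs in the sample are made up; on a real server the text would come from running the command on the Cognos App Server.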

Action plan:

The following instructions were provided to restart the Cognos PROD servers and fix this issue:
1. First, we stopped the servers in the following order:
    a.   Cognos Planning Terminal server
    b.   Cognos Gateway
    c.   Controller Gateway & App Server
    d.   Cognos BI App Server
    e.   Cognos Content Store Database
2. Then, we started the servers in the reverse order:
    a.   Cognos Content Store Database
    b.   Cognos BI App Server
    c.   Controller Gateway & App Server
    d.   Cognos Gateway
    e.   Cognos Planning Terminal server
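
The restart sequence above can be sketched as a small runbook helper: the start order is simply the stop order reversed, so only one list needs to be maintained. The names `STOP_ORDER` and `start_order` are illustrative; the sketch only computes the order and does not stop or start any service.

```python
# Stop order exactly as listed in the action plan above.
STOP_ORDER = [
    "Cognos Planning Terminal server",
    "Cognos Gateway",
    "Controller Gateway & App Server",
    "Cognos BI App Server",
    "Cognos Content Store Database",
]


def start_order(stop_order):
    """Return the start sequence: the stop sequence reversed."""
    return list(reversed(stop_order))


if __name__ == "__main__":
    for step, server in enumerate(start_order(STOP_ORDER), start=1):
        print(step, server)
```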

Number of Cognos users:

1. Cognos report users:  Approx 1500
2. Cognos Planning users: Approx 300
3. Cognos Controller users: Approx 200

Existing Configuration of Cognos PROD ENV:

1. Cognos Planning Terminal server
    a.   CPU – 1
    b.   RAM – 8GB
    c.   OS – Win 2008 R2 SP1 Standard Edition
    d.   C: – 40GB
    e.   D: –  60GB
    f.   Virtual – Yes

2. Cognos Gateway
    a.   CPU – 2
    b.   RAM – 8GB
    c.   OS – Win 2008 R2 SP1 Standard Edition
    d.   C: – 60GB
    e.   D: –  60GB
    f.   Virtual – Yes

3. Controller Gateway & App Server
    a.   CPU – 24 (4 x 6)
    b.   RAM – 64GB
    c.   OS – Win 2008 R2 SP1 Standard Edition
    d.   C: – 40GB
    e.   D: –  100GB
    f.   Virtual – No

4. Cognos BI App Server
    a.   CPU – 8
    b.   RAM – 32GB
    c.   OS – Win 2008 R2 SP1 Standard Edition
    d.   C: – 80GB
    e.   D: –  60GB
    f.   Virtual – Yes

5. Cognos Content Store Database
    a.   CPU – 1
    b.   SQL Server – MS SQL 2008 R2

Cognos Application Details:

1. IBM Cognos BI v10.2.1
2. IBM Cognos Planning v10.1.1 FP2
3. IBM Cognos Controller 10.1.1 FP2
    a. Currently using web Client but moving to Local client soon
4. Cognos Go! Office

Questions:

1. What could be the possible reason(s) that this issue happens almost once every month?
2. What action can be taken to resolve this issue permanently?
3. Is there any workaround to fix this issue without restarting the Cognos servers?
4. Is our server configuration sufficient to handle this many users?
5. If not, what changes are required from a server configuration and infrastructure point of view?
Regards,
S.G.D.

SomeClown

1 - Sounds like someone is running a poorly written report or query
2 - Find the source of the issue
3 - Not that I'm aware of
4 - Seems underpowered to me (more #4s, #5 should be a bit bigger)
5 - Where is Planning Server running?  Didn't see it in the list.  If #4, then it should be on its own server (same with Content Manager - own server)

MFGF

Hi,

bibustkservermain.exe processes are spawned by your Cognos BI java.exe process when running CQM reports. CQM is 32-bit and is written (I believe) in C# (or could even be C++) and runs outside of the main Java servlet container for the application server. My guess is that someone has written the "report from hell" using a CQM source, and it is consuming vast quantities of machine resources when it runs. Is it possible that this report has a monthly schedule?

If you can identify which report is the cause (and therefore its package) you could in theory add governors to the package to limit run time/rows retrieved so it gets automatically stopped.

You can also simply kill the offending bibustkservermain.exe process if you don't want to restart your entire environment. This will result in the offending report failing, but others should continue.
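
A minimal sketch of scripting that kill step, assuming the PID of the offending process is already known (for example from Task Manager). It builds the standard Windows `taskkill /PID <pid> /F` command; the function names and the `dry_run` default are illustrative assumptions, and nothing is executed unless `dry_run=False` is passed on a Windows host.

```python
import subprocess
import sys


def taskkill_command(pid):
    """Build the Windows command that force-terminates one process by PID."""
    if not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["taskkill", "/PID", str(pid), "/F"]


def kill_process(pid, dry_run=True):
    """Return the command; run it only on Windows and only when dry_run is False."""
    cmd = taskkill_command(pid)
    if dry_run or sys.platform != "win32":
        return cmd
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # 4321 is a made-up PID used purely for illustration.
    print(" ".join(kill_process(4321)))
```

Defaulting to a dry run is a deliberate choice here: on a production server it is safer to print the command and confirm the PID before terminating anything.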

Cheers!

MF.
Meep!

SGD

Quote: 1 - Sounds like someone is running a poorly written report or query
Yes, MFGF is saying the same thing. Now I need to find such report(s).

Quote: 2 - Find the source of the issue
But I am not sure how to find such problematic report(s).  :-\

Quote: 3 - Not that I'm aware of
It works as expected after restarting all the Cognos servers in the given order. So, if I kill the 'bibustkservermain.exe' process (as suggested by MFGF) then will it impact Cognos Planning and Controller users at that point in time?

Quote: 4 - Seems underpowered to me (more #4s, #5 should be a bit bigger)
If the Cognos App Server and Content Store DB server seem underpowered, what configuration would you recommend for them? What about the other Cognos servers, do those look fine?

Quote: 5 - Where is Planning Server running?  Didn't see it in the list.  If #4, then it should be on its own server (same with Content Manager - own server)
The Planning server is the same as 'Controller Gateway & App Server' in the list above.
Regards,
S.G.D.

MFGF

Quote from: SGD on 10 Feb 2015 05:31:49 AM
if I kill the 'bibustkservermain.exe' process (as suggested by MFGF) then will it impact Cognos Planning and Controller users at that point in time?

It shouldn't impact them, no. It's a BI process you are killing.

Cheers!

MF.
Meep!

SGD


Thanks MFGF for the prompt replies.

Quote: bibustkservermain.exe processes are spawned by your Cognos BI java.exe process when running CQM reports. CQM is 32-bit and is written (I believe) in C# (or could even be C++) and runs outside of the main Java servlet container for the application server. My guess is that someone has written the "report from hell" using a CQM source, and it is consuming vast quantities of machine resources when it runs. Is it possible that this report has a monthly schedule?

What do you mean by 'using a CQM source'? How can I identify such reports?
Regards,
S.G.D.

MFGF

Quote from: SGD on 10 Feb 2015 06:16:36 AM
Thanks MFGF for the prompt replies.

What do you mean by 'using a CQM source'? How can I identify such reports?

Any packages that are not published using Dynamic Query Mode (and any based on data sources that are not JDBC) will be using Compatible Query Mode.
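
As a rough sketch, that rule of thumb can be applied to a package inventory to shortlist the packages worth checking first. The inventory format and package names below are hypothetical; Cognos does not export a list in this shape, so the flags would have to be collected by hand or by other means.

```python
def uses_cqm(published_with_dqm, data_source_is_jdbc):
    """Rule of thumb from the reply: CQM unless published in DQM on a JDBC source."""
    return (not published_with_dqm) or (not data_source_is_jdbc)


def cqm_packages(packages):
    """Filter (name, published_with_dqm, data_source_is_jdbc) tuples down to CQM names."""
    return [name for name, dqm, jdbc in packages if uses_cqm(dqm, jdbc)]


# Hypothetical inventory; names and flags are invented for illustration.
INVENTORY = [
    ("Sales (DQM)", True, True),
    ("Finance (legacy)", False, False),
    ("HR (native client)", False, True),
]

if __name__ == "__main__":
    print(cqm_packages(INVENTORY))
```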

Cheers!

MF.
Meep!

SGD

Hi MFGF,

Also, could you please answer the below?

Quote: 4. Is our server configuration sufficient to handle this many users?

Sorry for being so demanding.  :)
Regards,
S.G.D.

MFGF

Quote from: SGD on 10 Feb 2015 07:40:00 AM
Hi MFGF,

Also, could you please answer the below?

Sorry for being so demanding.  :)

It's a bit like asking "is this amount of food sufficient for the people coming to our party"? :)

It depends what the users are doing, their roles, the complexity of the reports, the amount of data, the concurrency of the users etc etc. There is no one-size-fits-all answer, sadly. IBM offer a server sizing service where they sit with you and help you fill in a large questionnaire - they can then generate a recommended minimum server spec for you that would handle the required workload.

MF.
Meep!

SGD

Quote from: MFGF on 10 Feb 2015 05:52:03 AM
It shouldn't impact them, no. It's a BI process you are killing.

Cheers!

MF.

We had the same problem again on 17th Feb. As suggested by MFGF, I killed 3 instances of 'bibustkservermain.exe' on the Cognos App server and the issue was resolved.

I think this is a workaround and not a permanent fix for the issue.

I am afraid I have still not been able to identify the problematic report or the exact root cause. I have checked the Windows logs, events and scheduler on the Cognos App Windows server for anything specific that runs on those dates and could cause problems for Cognos, but did not find anything.
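
One way to narrow the search is to correlate the incident times with report run history, a minimal sketch of which follows. It assumes run records are available as (report path, start time, runtime in seconds), for example exported from the Cognos audit database if audit logging is enabled. The record layout, function names, paths and timestamps are all illustrative assumptions, not data from this environment.

```python
from collections import Counter
from datetime import datetime, timedelta


def reports_running_at(runs, incident, window_minutes=60):
    """Return paths of runs that started by the incident time and were
    still active at some point in the window leading up to it."""
    window_start = incident - timedelta(minutes=window_minutes)
    hits = []
    for path, start, runtime_s in runs:
        end = start + timedelta(seconds=runtime_s)
        if start <= incident and end >= window_start:
            hits.append(path)
    return hits


def recurring_suspects(runs, incidents, window_minutes=60):
    """Return reports that were active around every one of the incidents."""
    counts = Counter()
    for incident in incidents:
        counts.update(set(reports_running_at(runs, incident, window_minutes)))
    return [path for path, n in counts.items() if n == len(incidents)]


# Hypothetical run history and incident times, invented for illustration.
RUNS = [
    ("/content/A", datetime(2015, 2, 9, 13, 30), 3600),
    ("/content/B", datetime(2015, 2, 9, 9, 0), 60),
    ("/content/A", datetime(2015, 2, 17, 13, 45), 1800),
    ("/content/C", datetime(2015, 2, 17, 13, 50), 30),
]
INCIDENTS = [datetime(2015, 2, 9, 14, 0), datetime(2015, 2, 17, 14, 0)]

if __name__ == "__main__":
    print(recurring_suspects(RUNS, INCIDENTS))
```

A report that shows up around every incident is only a suspect, not proof; it still has to be confirmed by running it in isolation or by checking its package and query.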
Regards,
S.G.D.

SGD

As suggested by IBM, we have increased the dispatcher memory allocation from '768' to '1536' in Cognos Configuration.

In the left pane we followed the path (Local Configuration -> Environment -> IBM Cognos services -> IBM Cognos BI 10.2 - 32) and updated the 'Maximum memory in MB' resource property.

Since applying this change we have not faced this issue again, and our Cognos PROD ENV has become quite settled and smooth.  :)
Regards,
S.G.D.

BMbabu

I understand that your BI server was running at 100% memory when it was set to 768 MB; that is why it performed so poorly while running BI reports.

As per your post above, your Cognos BI server has 32 GB of RAM, but you are still at only 1536 MB. As per IBM standards you should increase the memory from '768' to '8192'. Please increase it; your issue should be resolved and you should not face it again in future.

Regards
Madhu B

sdf

I had a similar issue before and did pretty much the same thing, increasing the allocation size.
But that did not solve our issue. The server continued to show 100% CPU usage (only during restart).
So I did an experiment: since we are not using our Cognos metric server, I decided not to include that service when starting Cognos.
And it worked like magic, though I am not particularly sure why.