DM-JOB-0120 logic error in JobStream

Started by rveeja, 27 Aug 2013 12:21:27 PM

rveeja

Hi,

We have a job wrapper (a job stream) that wraps 12 jobs, each of which is itself a job stream.
Each job stream within the wrapper populates its own table without any transformation.
Within the wrapper, we run the first 7 jobs in parallel, then the 8th job on its own, and then the remaining 4 jobs in parallel.

Yesterday, the 2nd and 3rd jobs failed to start, reporting "process died", and as a result the job wrapper failed:
DM-JOB-0120 logic error in JobStream 'JOB_WRAPPER_JW' has caused a node to become unrunnable, terminating.

I can't find a solution or any notes on the internet that shed light on this problem.
What could have caused these failures?

It would be great if someone could point us in a specific direction for investigating this issue.


FYI:
This job wrapper has been running every night for the past year without this type of failure until now.
We have made no changes to the database or to the ETL scripts within the last 6 months.



Additional FYI:

Our data source: Oracle (Dedicated Server).
Our ETL engine runs on Windows 2003 (virtual machine).

I have also attached the segment of the log file where I got this error.

Thanks
Rajeev.






[PROGRESS   - 00:51:58] JobStream JOB_WRAPPER_JW; starting
[PROGRESS   - 00:52:08] JobStream Node 31 'JOB_TABLE_01'; executing (pid 3236)
[PROGRESS   - 00:52:13] JobStream Node 28 'JOB_TABLE_02'; executing (pid 4504)
[PROGRESS   - 00:52:18] JobStream Node 25 'JOB_TABLE_03'; executing (pid 5296)
[PROGRESS   - 00:52:18] JobStream Node 22 'JOB_TABLE_04'; executing (pid 4360)
[PROGRESS   - 00:52:18] JobStream Node 33 'JOB_TABLE_05'; executing (pid 2748)
[PROGRESS   - 00:52:19] JobStream Node 21 'JOB_TABLE_06'; executing (pid 532)
[PROGRESS   - 00:52:30] JobStream Node 29 'JOB_TABLE_07'; executing (pid 3316)
[PROGRESS   - 00:53:25] JobStream Node 28 'JOB_TABLE_02'; failed [process died - status 255]
[VARIABLE   - 00:53:25] RESULT = FALSE
[PROGRESS   - 00:53:25] JobStream Node 25 'JOB_TABLE_03'; failed [process died - status 255]
[VARIABLE   - 00:53:25] RESULT = FALSE
[DETAIL     - 00:54:20] JobStream Node 31 'JOB_TABLE_01'; component 'JOB_TABLE_01', RunId 192, AuditId 34613
[DETAIL     - 00:54:24] JobStream Node 33 'JOB_TABLE_05'; component 'JOB_TABLE_05', RunId 192, AuditId 34614
[DETAIL     - 00:54:27] JobStream Node 29 'JOB_TABLE_07'; component 'JOB_TABLE_07', RunId 192, AuditId 34615
[DETAIL     - 00:54:30] JobStream Node 21 'JOB_TABLE_06'; component 'JOB_TABLE_06', RunId 192, AuditId 34616
[DETAIL     - 00:54:33] JobStream Node 22 'JOB_TABLE_04'; component 'JOB_TABLE_04', RunId 192, AuditId 34617
[PROGRESS   - 00:56:18] JobStream Node 33 'JOB_TABLE_05'; succeeded
[VARIABLE   - 00:56:18] RESULT = TRUE
[VARIABLE   - 00:56:18] DM_COMPONENT_AUDIT_ID = 34614
[PROGRESS   - 00:56:19] JobStream Node 29 'JOB_TABLE_07'; succeeded
[VARIABLE   - 00:56:19] RESULT = TRUE
[VARIABLE   - 00:56:19] DM_COMPONENT_AUDIT_ID = 34615
[PROGRESS   - 00:56:20] JobStream Node 31 'JOB_TABLE_01'; succeeded
[VARIABLE   - 00:56:20] RESULT = TRUE
[VARIABLE   - 00:56:20] DM_COMPONENT_AUDIT_ID = 34613
[PROGRESS   - 00:56:26] JobStream Node 22 'JOB_TABLE_04'; succeeded
[VARIABLE   - 00:56:26] RESULT = TRUE
[VARIABLE   - 00:56:26] DM_COMPONENT_AUDIT_ID = 34617
[PROGRESS   - 00:56:32] JobStream Node 21 'JOB_TABLE_06'; succeeded
[VARIABLE   - 00:56:33] RESULT = TRUE
[VARIABLE   - 00:56:33] DM_COMPONENT_AUDIT_ID = 34616

DM-JOB-0120 logic error in JobStream 'JOB_WRAPPER_JW' has caused a node to become unrunnable, terminating.

[PROGRESS   - 00:56:33] JobStream 'JOB_WRAPPER_JW' Failed

jobstream -- failed (27-Aug-2013 00:56:33)


MFGF

Hi,

You have included the log entry from the "Job Wrapper" jobstream, but other than seeing that nodes have failed, it's not telling you much. Have you looked at the log files generated by the individual jobstreams called from within the Job Wrapper (specifically Job_Table_02 and Job_Table_03)? These jobstreams will in turn likely have nodes (such as builds) which are probably generating their own individual log files too. Take a look through the logs to see what the relevant builds for these jobstreams are reporting also.

Good luck!!

MF.
Meep!

rveeja

Hi,

I searched for log files for jobs 2 and 3 (JOB_TABLE_02 and JOB_TABLE_03), but there is no trace of any.
From this, I am guessing that jobs 2 and 3 never even started.

However, the other jobs (1, 4, 5, 6 and 7) do have log files, and they all look good.
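
The wrapper log excerpt is consistent with that guess: JOB_TABLE_02 and JOB_TABLE_03 each have an "executing" line and a "failed" line, but no DETAIL line carrying an AuditId. A minimal Python sketch for pulling this out of a wrapper log, assuming the line layout matches the excerpt posted earlier in this thread. The function names and the regular expression are illustrative only and are not part of Data Manager.

```python
import re

# Matches node lines in the layout seen in the posted excerpt, e.g.
# [PROGRESS   - 00:53:25] JobStream Node 28 'JOB_TABLE_02'; failed [process died - status 255]
# This pattern is an assumption based on that excerpt only.
NODE_LINE = re.compile(
    r"\[(?P<kind>\w+)\s*-\s*(?P<time>[\d:]+)\]\s+JobStream Node (?P<node>\d+) "
    r"'(?P<name>[^']+)'; (?P<event>.*)"
)


def summarize(log_text):
    """Return {job name: set of event keywords seen} for a wrapper log."""
    seen = {}
    for line in log_text.splitlines():
        m = NODE_LINE.match(line.strip())
        if not m:
            continue  # wrapper-level lines, VARIABLE lines, blank lines
        events = seen.setdefault(m.group("name"), set())
        event = m.group("event")
        if m.group("kind") == "DETAIL":
            events.add("audited")  # the node reported a RunId/AuditId
        else:
            for keyword in ("executing", "failed", "succeeded"):
                if event.startswith(keyword):
                    events.add(keyword)
    return seen


def died_before_audit(log_text):
    """Jobs that failed without ever reporting a DETAIL (AuditId) line."""
    return sorted(
        name
        for name, events in summarize(log_text).items()
        if "failed" in events and "audited" not in events
    )


# A few lines copied from the excerpt in the first post.
SAMPLE = """
[PROGRESS   - 00:52:08] JobStream Node 31 'JOB_TABLE_01'; executing (pid 3236)
[PROGRESS   - 00:52:13] JobStream Node 28 'JOB_TABLE_02'; executing (pid 4504)
[PROGRESS   - 00:53:25] JobStream Node 28 'JOB_TABLE_02'; failed [process died - status 255]
[DETAIL     - 00:54:20] JobStream Node 31 'JOB_TABLE_01'; component 'JOB_TABLE_01', RunId 192, AuditId 34613
[PROGRESS   - 00:56:20] JobStream Node 31 'JOB_TABLE_01'; succeeded
"""

print(died_before_audit(SAMPLE))  # ['JOB_TABLE_02']
```

A job that shows up here died before it registered an audit record, which would also explain why it left no log file of its own.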

Thanks
Rajeev

MFGF

Hmmm. Sounds like the server gave up on trying to run those jobstreams. How many cores does your server have? Data Manager is not multi-threaded, so each jobstream (which is a separate .exe file) runs on an individual core. If you're trying to run 12 jobstreams concurrently, is your server powerful enough to handle this?
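
As a rough sanity check, the peak number of jobstreams running at once can be compared with the core count of the machine. The sketch below takes the one-process-per-core point above at face value and uses the stage sizes from the first post (7 in parallel, then 1, then 4), so the peak is 7 rather than 12. The function names are hypothetical, and `os.cpu_count()` reports logical cores on whatever machine the script runs on, which may differ from what the VM really has available.

```python
import os


def peak_concurrency(stage_sizes):
    """Largest number of jobs that run at the same time across all stages."""
    return max(stage_sizes)


def oversubscribed(stage_sizes, cores):
    """True if some stage starts more single-threaded jobs than there are cores."""
    return peak_concurrency(stage_sizes) > cores


# Stage sizes as described in the first post: 7 parallel, then 1, then 4 parallel.
STAGES = [7, 1, 4]

cores = os.cpu_count() or 1  # cpu_count() can return None on some platforms
print("peak concurrent jobs:", peak_concurrency(STAGES))
print("logical cores here:", cores)
print("oversubscribed:", oversubscribed(STAGES, cores))
```

This only covers CPU. Memory and storage pressure would need to be checked separately.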

MF.
Meep!

rveeja

Hi,

We raised a ticket with IBM, and they also said it could be a resource issue (memory, CPU, storage...) on the machine where the ETL engine runs.

We have increased the resources on our ETL VMs, and so far the issue has not recurred.

Anyhow, thanks for your time and help.

Thanks
Rajeev

MFGF

Be aware that the Data Manager engine uses PVU-based licensing. Adding extra cores to your VM server might have big licensing implications for you.

Cheers!

MF.
Meep!