Tivoli Workload Scheduler and Job Cancellation

Started by eknight, 06 Dec 2012 03:01:23 AM

eknight

Hi there,

I've noticed that our system seems to miss rows every now and again. It seems like it may be linked to job cancellations. Does anyone understand exactly how the cancellation mechanism in TWS works?

If a DS job is currently executing a DS build and I cancel the job in TWS, is there any possibility that only the running build is killed but the job execution continues after the build to the next procedure node? This seems unlikely, but it would explain some of the problems I'm having.

Thanks,

MFGF

It's possible. Builds and jobstreams run as two separate executables. If TWS is just killing the databuild.exe process and not the rundsjob.exe process, the jobstream will continue. The way to check is to modify the jobstream so that a condition node tests whether the build failed, and only proceed to the procedure node if the build succeeded.
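
Another way to see which of the two executables survives a cancellation is to list the running processes on the server right after cancelling in TWS. A rough sketch in Python, assuming a Windows host where the standard `tasklist` command is available. The helper names `parse_image_names` and `surviving_processes` are made up for illustration; the executable names are the ones mentioned above.

```python
import csv
import io
import subprocess
from typing import Iterable, Set


def parse_image_names(tasklist_csv: str) -> Set[str]:
    """Return lower-cased image names from `tasklist /FO CSV /NH` output."""
    reader = csv.reader(io.StringIO(tasklist_csv))
    # The first CSV column of tasklist output is the image name.
    return {row[0].lower() for row in reader if row}


def surviving_processes(
    watched: Iterable[str] = ("databuild.exe", "rundsjob.exe"),
) -> Set[str]:
    """Run tasklist (Windows only) and report which watched images are still alive."""
    output = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_image_names(output) & {name.lower() for name in watched}
```

If `surviving_processes()` returns only `rundsjob.exe` just after the cancel, that would suggest the build was killed while the jobstream itself kept running.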

Cheers!

MF.
Meep!

eknight

Yeah ok thanks, that's kinda what I was thinking. Now I just need to find some Tivoli experts  :D

MFGF

Until you manage to find them, just add a condition node after the build node, and code the expression as return $RESULT; - link the True output to your procedure node and the False output to a procedure node which writes a message to the log and doesn't link on anywhere else. It's good practice to do this anyway, to trap any build failures and terminate the processing leg. Oh, don't forget to set the 'Action on Failure' setting of your build node to 'Continue' so it moves along to perform the next check. I imagine it must be set to this anyway...
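
The control flow of that pattern can be sketched as a toy model in Python. This is not how the jobstream engine is implemented; `RunLog`, `run_leg` and the callables passed in are hypothetical stand-ins for the build node, the condition node on $RESULT, and the two procedure nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunLog:
    """Collects log messages so the outcome of each leg can be inspected."""
    lines: List[str] = field(default_factory=list)

    def write(self, message: str) -> None:
        self.lines.append(message)


def run_leg(build: Callable[[], bool], procedure: Callable[[], None], log: RunLog) -> bool:
    """Toy model: build node set to 'Continue' -> condition node -> procedure node."""
    try:
        result = build()  # stands in for $RESULT of the build node
    except Exception as exc:  # e.g. the build process was killed from outside
        log.write(f"build raised: {exc}")
        result = False

    if result:  # condition node, True output
        log.write("build succeeded, running procedure node")
        procedure()
        return True

    # Condition node, False output: write a message and link nowhere else.
    log.write("build failed, procedure node skipped")
    return False
```

For example, calling `run_leg` with a build callable that raises an exception leaves the procedure uncalled and records both the error and the skip in `log.lines`, which is the clearer logging this pattern is meant to give.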

Cheers!

MF.
Meep!

eknight

At the moment it seems like all our builds have their 'Action on failure' set to 'TERMINATE'. Do you know what the behaviour of that is?

I think I'm using a fairly old version of DS: version 7.1.778.0, in case that makes a difference.

MFGF

That is the default setting, and it means that processing should stop in the current flow if an error occurs.

I don't recall any issues with this in either DecisionStream or Data Manager. Do you have parallel flows running in your jobstream? If so, TERMINATE will allow these to continue when the current flow stops because of the error. Setting the value to ABORT should stop all flows in the jobstream.
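
To make the difference concrete, here is a small simulation in Python of the behaviour described above. It is only a model under stated assumptions: `run_jobstream` is a hypothetical function, each node is a `(name, succeeds)` pair, and the parallel flows are stepped round-robin, which is a simplification rather than how the product actually schedules them.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, bool]  # (node name, whether the node succeeds)


def run_jobstream(flows: Dict[str, List[Node]], action_on_failure: str) -> List[str]:
    """Return the names of the nodes that were executed, in order."""
    if action_on_failure not in ("TERMINATE", "ABORT"):
        raise ValueError(f"unsupported setting: {action_on_failure}")

    executed: List[str] = []
    pending = {name: list(nodes) for name, nodes in flows.items()}

    while any(pending.values()):
        for flow_name in list(pending):
            if not pending[flow_name]:
                continue  # this flow has finished or was stopped
            node, succeeded = pending[flow_name].pop(0)
            executed.append(node)
            if succeeded:
                continue
            if action_on_failure == "ABORT":
                return executed  # every flow in the jobstream stops
            pending[flow_name] = []  # TERMINATE: only the current flow stops

    return executed
```

With one flow whose build fails and a second flow that is healthy, TERMINATE still runs the second flow to the end, while ABORT stops at the failed build. In both cases the node that follows the failed build in the same flow is never executed.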

Cheers!

MF.
Meep!

eknight

So if TWS kills databuild.exe instead of rundsjob.exe but the build has an 'Action on failure' set to 'TERMINATE', is there a chance that a procedure node directly after the build node is executed? The Job is a linear chain of builds + procedure nodes.

Personally, I kind of like your suggestion of setting 'Action on failure' to 'CONTINUE' followed by a condition node which checks $RESULT. At least then we'd have clearer logging.

My guess is that it's not possible that the procedure node receives execution time which would mean my problem is in another castle (|:(    <-- Sad Toad.

MFGF

Quote from: eknight on 11 Dec 2012 08:09:16 AM
So if TWS kills databuild.exe instead of rundsjob.exe but the build has an 'Action on failure' set to 'TERMINATE', is there a chance that a procedure node directly after the build node is executed? The Job is a linear chain of builds + procedure nodes.

No - with Action on Failure set to Terminate, the jobstream should end processing when the build fails if there are no other parallel legs of processing. What does the log of the jobstream show? If you enable Detail logging, it should list each node and its success or failure.

Cheers!

MF.
Meep!