
DecisionStream vs DataStage

Started by Sandgroper, 26 Jul 2005 09:44:12 PM


Sandgroper

Anecdotal evidence suggests that DecisionStream can accomplish in a single click things that take loads of coding in DataStage. An example of this would be surrogate key creation.
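To see why hand-coding this is tedious, here is a toy Python sketch of what surrogate-key assignment involves when a tool doesn't do it for you (the table and column names are made up for illustration; this is not code from either product):

```python
def assign_surrogates(rows, key_map, next_key=1):
    """Map each natural key to a stable integer surrogate key.

    key_map carries state across loads, so a natural key seen
    before keeps the surrogate it was originally assigned.
    """
    out = []
    for row in rows:
        natural = row["customer_code"]          # hypothetical natural key column
        if natural not in key_map:
            key_map[natural] = next_key         # new key: mint a surrogate
            next_key += 1
        out.append({**row, "customer_sk": key_map[natural]})
    return out, next_key

key_map = {}
rows, next_key = assign_surrogates(
    [{"customer_code": "C001"}, {"customer_code": "C002"}, {"customer_code": "C001"}],
    key_map,
)
# The repeated C001 reuses its original surrogate; C002 gets a fresh one.
```

And this toy version ignores everything a real load has to handle: persisting the key map, late-arriving rows, concurrent loads, and so on.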

rmcleod

Thoughts of a DecisionStream/Datastage developer

Key differences
DataStage Enterprise edition is in a different league on the volume and throughput front
DataStage server edition is a bit better on the volume and throughput front than DecisionStream
DecisionStream's inbuilt dimension and surrogate functionality is a major advantage. DataStage has little inbuilt on that front

For small volumes DecisionStream is much friendlier to develop on, much less tedious.


Scalability
DataStage Enterprise edition has inbuilt parallel processing capability, which is needed if you have big volumes of data. DecisionStream has no equivalent and just does not "scale up" on that front. This is a major advantage for DataStage if you have big volumes of data.

Inbuilt dimension handling
DataStage has no equivalent of DecisionStream's inbuilt dimension and surrogate handling.
In DataStage, SCDs (slowly changing dimensions) have to be hand-coded, which is slow and error-prone.

I don't know if there is anything in DataStage to help with deleted dimension data. I think there is a stage that can do column-by-column compares of before and after data, which may also tell you if a row has disappeared.
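The before/after compare described above amounts to classifying each natural key as inserted, changed, unchanged, or deleted. A minimal Python sketch of that idea (keys and column names are illustrative):

```python
def diff_dimension(before, after):
    """before/after: dicts mapping natural key -> row (dict of column values).

    Returns keys that are new, changed, or have disappeared.
    """
    inserted = [k for k in after if k not in before]
    deleted = [k for k in before if k not in after]
    changed = [k for k in after
               if k in before and before[k] != after[k]]   # column-by-column compare
    return inserted, changed, deleted

before = {"C001": {"city": "Perth"}, "C002": {"city": "Sydney"}}
after = {"C001": {"city": "Darwin"}, "C003": {"city": "Hobart"}}
ins, chg, dele = diff_dimension(before, after)
# C003 is new, C001 changed city, C002 has disappeared.
```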

Ease of use
On the ease-of-use front DataStage has advantages in that you can have the equivalent of many DecisionStream builds in a single DataStage job. You can just keep adding stages to get the result you want, instead of having to string several builds together in a job in DecisionStream.

Lookups in DataStage are a bit more flexible because you can have several columns as the key on the lookup. In DecisionStream you have to have a concatenated string in a single column as your key.

The scripting language for subroutines/functions is much better in DataStage. DecisionStream has no "subroutine" functionality in its scripting language.

Environment requirements

Others
The Server edition engine is an old proprietary database called UniVerse, which uses hash files and is very esoteric. For example, all of DecisionStream's source code is in the catalog tables in its database, so it is easy to take a look at that stuff if you need to; in DataStage Server edition you are a very well-rounded character if you have made it to the point where you can find your way around the repository hash files. Enterprise edition is apparently all C code.

DecisionStream has better database stages; e.g. DataStage has no bulk-load stage for SQL Server.

DecisionStream has good logging straight into the catalog tables, giving a history of job start and end times. DataStage has no equivalent.

DataStage has a good GUI for running jobs and displaying logs, elapsed times for steps, and row counts down links. DecisionStream has no equivalent.

c6lapsteel

What do you consider 'small volumes'?

At what volume/throughput would you recommend switching from DecisionStream to DataStage?

Thanks,
c6

rmcleod

Over 10-20 million rows. DecisionStream only uses one process per job, no matter how many rows; DataStage can run a single job in parallel.
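The single-process-vs-parallel point comes down to partitioning. A toy Python sketch of the idea (round-robin split; the partition count and scheme are illustrative, not how DataStage actually partitions):

```python
def partition(rows, n):
    """Deal rows round-robin into n partitions of near-equal size."""
    return [rows[i::n] for i in range(n)]

# A serial engine processes all 100 rows in one pass; a parallel engine
# hands each of these partitions to its own worker process.
parts = partition(list(range(100)), 10)
sizes = [len(p) for p in parts]
```

With one process per job, total runtime grows linearly with row count; with partitioning, each worker only sees its own slice.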

COGNOiSe administrator

Do you mean to say that if I have 100M rows which need to be transformed and loaded, DataStage has the capability to split them up into 10 parallel 10M-row loads?