ETL Server Network Location

Started by Loren, 23 Feb 2006 08:25:39 AM

Loren

Hi all,
Currently I run an ETL process on an NT box between two *nix Oracle DBs on the same network segment and in the same server facility. The process takes about 12 hours each night to extract data from one server and load it into the other. We are talking about millions of records here.

Here is my problem: Upper Management wants to move this ETL server to a new facility across the country to save electricity. They consider it just an NT box, while the Oracle servers will remain. I am concerned that if the current configuration is compromised, we may see a direct performance hit.

I have been optimizing the ETL for 3 years to get it down to 8-10 hours. Now that it is where I want it, I am afraid that relocating the NT ETL server far away from the DBs will extend the ETL's run time, maybe even double it.

How can I measure and explain my case to these pencil pushers? They want facts and I can't tell them why or how the network separation would cause issues. How can I provide them proof?

CoginAustin

Your bottleneck would be bandwidth. Instead of using a gigabit line to your server, or whatever you're using now locally, you would be using a dedicated line between facilities. Depending on the size of your pipes, and I am sure it's not gigabit between facilities, you will see a severe performance hit. Network latency, bandwidth issues, and a host of other problems can occur.

I would measure exactly how much data you transmit to/from the ETL box to/from your DBs. Figure out the transfer time between the two locally and then figure out the transfer time between the two using your new pipes.
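
A rough way to put numbers on that comparison is sketched below. It uses a simple model where total time is the time to push the bytes through the pipe plus the time spent waiting on each request/response round trip. Every figure in it (row count, bytes per row, rows per fetch, round-trip times, link speeds) is a made-up placeholder, not something from this thread; substitute your own measurements, e.g. ping times to the new facility and the nightly volume from your ETL logs.

```python
def estimated_seconds(payload_bytes, link_mbps, round_trips, rtt_ms):
    """Rough lower bound: time on the wire plus time waiting on round trips."""
    transfer = payload_bytes * 8 / (link_mbps * 1_000_000)  # bits / (bits per second)
    waiting = round_trips * rtt_ms / 1000                   # one RTT per fetch
    return transfer + waiting

# Hypothetical nightly load: 5 million rows, ~200 bytes each,
# fetched 100 rows per round trip. All values are assumptions.
rows = 5_000_000
payload = rows * 200
trips = rows // 100

# Assumed links: local gigabit with 0.5 ms RTT vs. a T3 with 60 ms RTT.
local = estimated_seconds(payload, link_mbps=1000, round_trips=trips, rtt_ms=0.5)
remote = estimated_seconds(payload, link_mbps=44.736, round_trips=trips, rtt_ms=60)

print(f"local:  {local / 60:.1f} min")
print(f"remote: {remote / 60:.1f} min")
```

The model ignores protocol overhead, contention, and any overlap between fetching and processing, so treat the result as a starting point for a conversation, not a prediction.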

I would guess there will be a severe time penalty for the move from its current home.

Loren

Quote from: CoginAustin on 23 Feb 2006 08:43:50 AM
I would measure exactly how much data you transmit to/from the ETL box to/from your DBs. Figure out the transfer time between the two locally and then figure out the transfer time between the two using your new pipes.
Thx for the response.
I reported the amount of data to them, and they say "no big deal, that's not much". This is where I get lost in my persuasion. The amount isn't the issue; the question is why the network makes a difference when moving that amount.

Is data being pushed and pulled at a constant rate? I guess my question, and theirs, is what exactly is the network doing that makes such a difference in overall speed? How would milliseconds add up if there is aggregation going on while other data is making the transfer in the meantime?

CoginAustin

It makes a difference from a bandwidth point of view.

For each 10 MB of data, a gigabit (1000 Mbps) line will take less than 1 second to move it from point A to point B.
That same 10 MB of data will take about 52 seconds at T1 speed (1.544 Mbps), or approaching 2 seconds at T3 speed (44.736 Mbps).
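
The arithmetic behind those figures can be reproduced with a few lines. This is only a sketch: it assumes 1 MB = 1,000,000 bytes, that the link runs at its full nominal rate, and that there is no protocol overhead or competing traffic, so real transfers will be slower.

```python
# Nominal line rates in megabits per second.
LINKS_MBPS = {"Gigabit Ethernet": 1000.0, "T3": 44.736, "T1": 1.544}

def transfer_seconds(megabytes, mbps):
    """Seconds to move `megabytes` of data at `mbps` (8 bits per byte)."""
    return megabytes * 8 / mbps

for name, mbps in LINKS_MBPS.items():
    print(f"{name:>16}: {transfer_seconds(10, mbps):6.2f} s per 10 MB")
```

Multiply the per-10 MB figure by your nightly volume to see how the gap scales.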

Depending on the amount of data and the difference in transfer speeds, the transfer time difference can be absolutely huge. However, if the data is being pulled in the middle of the night and no one needs that data until morning, the transfer time plus build time may be meaningless, and it wouldn't matter whether the server is here or there.

How much data are you talking about?