If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Cognos over Redshift

Started by adam_mc, 27 Sep 2017 04:07:41 PM

Previous topic - Next topic

adam_mc

I have set up JDBC connection over Redshift in a DQM model.

I have a large order table that I am testing results and I am getting the following scenarios (with number of rows set to 25):

1.     If I do a SELECT * FROM <table_name> with no WHERE clause, the test never seems to return results.

2.     If I do a SELECT * FROM <table_name> with a date based WHERE clause, the test returns the 25 rows very quickly.
        This is true even if I select a very large date range (>= '2014-01-01').

I'd prefer not to have WHERE clauses in my Physical Layer, but will do so if this is the only way to keep performance solid.

Has anyone else experienced this?
If so, how did you resolve?
Did you leave the WHERE clause or did you leave without and let a Date relationship to this table with a filter in the report resolve the situation?

Is this in any way related to Redshift being a columnar database?
And, if so, are there any special considerations for columnar databases?

Any thoughts would be gratefully appreciated.
Thanks in advance,
Adam. 

bdbits

How many rows are there in the table? Your first example is going to return the entire table - all columns and rows. Why is there such a large date gap between the 25 rows you want and earlier data? Do you never care about the earlier data? Or am I missing something...

In my experience, a where clause in FM at the database layer would be somewhat unusual. This would normally happen in a report, but if it is ALWAYS going to be > some date, perhaps it could be justified.


adam_mc

Both select statements are effectively returning all the data in the table.
The date in the WHERE clause is below the minimum date for that column.

The 25 rows is just the default number of rows on a Test within FM.
It's just the speed of the test I am concerned about.

I agree adding a WHERE clause to my FM model is not what I typically do (or would want to do), but this is a performance related issue only.

robblob

I can't speak to Redshift, but I have a columnar database that is EXTREMELY large, but runs flawlessly (sub-second) when executing SELECT *. 

adam_mc

The other problem I have having is that I can't do a SELECT * as the table on Redshift contains Boolean columns.
These need to be cast as integers to prevent the FM model test from failing (even though it validates).
This means instead of a SELECT *, I need to explicitly define all the columns.

Thanks again for your feedback.

the6campbells

#5
Assume you have a simple database query subject (i.e. select * from [datasource].t) and have not defined any filters on the query subject.

In 11.0.6 or earlier, when you test the query subject you are essentially saying to an RDBMS to perform "select * from ...."

If you had selected one or more data items of the query subject and then tested, it would project the columns you selected "select c1, c2 from ..."

In both cases, the RDBMS is still performing a non-selective query which implies a full table scan. If the object is a complex single/multi-table view then the cost can be increased due to the view definition etc.

You will see an option in FM on a query subject to limit how many rows are presented in the grid BUT prior to 11.0.7 that is not propagated to the query.

As of 11.0.7, if an RDBMS is known to support a "LIMIT-like construct in a query statement, it will be added to the statement as you test.

If an RDBMS is known to NOT have a "LIMIT"-like construct, a JDBC method on the JDBC Statement interface will be called passing the value. Some vendor's drivers may add a "LIMIT" like clause into the statement (i.e. Netezza), some may do something internally in their wire protocol (i.e. Redshift) and perhaps some may simply ignore it by design or perhaps have a defect.

Data modules (since 11.0.0) would call the JDBC method.

The reason this was done stems from some database vendors handling an early result set close in a non-optimal manner.

Imagine, you issued a query that produces a result set with 1M rows. Your application reads 25 rows and closes the result set. Many database vendors will shut things down quickly etc. In other cases, the vendors driver may internally continue to read the remaining (1M - 25) rows. Hence, the total time to close the result set can be slow due to the vendor.

Netezza users can obtain a patch from IBM Netezza support for 7.2 which addresses their prior behaviour.
Vertica customers using 7.2. SP3 or higher can use their MARS feature 

https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/ConnectingToVertica/MARSAndRBS/MultipleActiveResultSets.htm?Highlight=MARS