
Data Set Report is running slow

Started by ravvoggu, 19 Aug 2020 02:13:37 AM


ravvoggu

Hi All, Good Morning!
We are working on a data set POC: we created the data sets and loaded the data from the source data module. The main data set has around 16,500,000 records and took around 1 hour to load. We then created a final data module containing only the data sets, for building the reports. The report has two filters: the first is a time period filter that limits the data to two years, and the second is a category filter that selects a particular category. The report takes around 6 minutes the first time it is run and is very fast (around 5 seconds) on subsequent runs. After every Cognos server restart, the first run again takes around 6 minutes, and subsequent runs are fast again. Could you please let us know if this is expected behavior? Any suggestions?

Thanks,
Ravindra

MFGF

Quote from: ravvoggu on 19 Aug 2020 02:13:37 AM

Hi,

You didn't specify which version of Cognos Analytics you are using. The 11.1.x versions work a little differently from 11.0.x in that they use a new Spark-based compute service for data set query execution, whereas 11.0.x uses DQM directly for this.
Whichever version you are on, the way a data set works is this (there's a rough sketch of this lifecycle below the list):
- When created, the data is encoded to Parquet format, compressed, encrypted, and stored in the content store database.
- The first time you need to use the data set, Dynamic Query planning will need to retrieve the data from content manager and create a local copy in local storage. After planning, the query is then either handed off to the compute service to be executed (in 11.1.x) or executed directly by DQM (in 11.0.x).
- On subsequent uses of the data, Dynamic Query will perform a form of query re-use where it can, to avoid re-executing previously executed requests (on a per-user-session basis).
- By default, local copies of Parquet files not accessed for ten minutes are purged, and will need to be re-retrieved from content manager if needed again.
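
To make the first-run vs. subsequent-run behaviour you're seeing a bit more concrete, here is a purely illustrative Python sketch of that lifecycle. It is not Cognos code; the class, the execute_locally() stand-in, and the FakeContentManager are all invented for illustration, and the ten-minute purge is just the default mentioned above.

```python
# Illustrative sketch only - NOT Cognos internals. Models the behaviour above:
# slow first use (fetch Parquet from the content store), fast re-use within a
# session, and purging of local copies that sit idle for ten minutes.
import time

LOCAL_COPY_TTL_SECONDS = 10 * 60  # default idle purge threshold mentioned above


def execute_locally(parquet_bytes, query):
    # Stand-in for local execution by DQM (11.0.x) or the compute service (11.1.x).
    return f"result of {query!r} over {len(parquet_bytes)} bytes"


class DataSetCache:
    def __init__(self, content_manager):
        self.content_manager = content_manager  # slow, authoritative store
        self.local_copies = {}      # data_set_id -> [parquet_bytes, last_access_time]
        self.session_results = {}   # (session_id, query) -> cached result

    def run_query(self, session_id, data_set_id, query):
        # Per-session query re-use: an identical request in the same session
        # returns the previous result without re-executing anything.
        key = (session_id, query)
        if key in self.session_results:
            return self.session_results[key]

        self._purge_idle_copies()

        # First use (or after a purge / server restart): retrieve the Parquet
        # data from the content store and keep a local copy. This is the slow path.
        if data_set_id not in self.local_copies:
            parquet_bytes = self.content_manager.fetch(data_set_id)
            self.local_copies[data_set_id] = [parquet_bytes, time.monotonic()]

        entry = self.local_copies[data_set_id]
        entry[1] = time.monotonic()  # refresh last-access time
        result = execute_locally(entry[0], query)
        self.session_results[key] = result
        return result

    def _purge_idle_copies(self):
        now = time.monotonic()
        for ds_id in list(self.local_copies):
            if now - self.local_copies[ds_id][1] > LOCAL_COPY_TTL_SECONDS:
                del self.local_copies[ds_id]


class FakeContentManager:
    def fetch(self, data_set_id):
        time.sleep(0.5)  # simulate the expensive retrieval from the content store
        return b"\x00" * 1024


cache = DataSetCache(FakeContentManager())
print(cache.run_query("session-1", "sales_ds", "category = 'Camping'"))  # slow first run
print(cache.run_query("session-1", "sales_ds", "category = 'Camping'"))  # fast re-use
```

That pattern is why your first run after a restart is around 6 minutes (nothing is cached anywhere) and subsequent runs drop to seconds.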

Some other considerations in terms of performance are:
- DQM local processing and the Spark compute process may need to spill/cache objects to disk. Ideally, the storage device can service I/O requests quickly and avoid long queues of pending I/O requests that would add latency.
- Sort your data sets on creation if you plan to filter commonly on a particular item. Parquet reads will attempt to bypass blocks of data where a filter is looking for values that a block is known not to contain. When a data set is created, the rows are written in a non-deterministic order; if the data set includes a sort specification, the rows are written in order, which improves the density of rows in a page with a given value (see the small Parquet example after this list).
- Avoid storing lots of columns that are rarely or never used. Those columns add overhead to creating the file, increase the space each file requires, and affect what is held in a row group. If you have excess columns which may be used occasionally, consider splitting the data into separate files.
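
To illustrate the sorting point, here is a small hypothetical pyarrow example (nothing to do with Cognos itself, and the column names are invented). It writes the same data unsorted and sorted, then prints each row group's min/max statistics for the filtered column; in the sorted file each row group covers a narrow range of values, which is what lets a reader skip blocks that cannot contain the filter value.

```python
# Hypothetical pyarrow example (not Cognos) showing why sorting helps Parquet
# block skipping: sorted files have tight per-row-group min/max statistics, so
# a filter such as category = 'C' can skip row groups that cannot match.
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq

rng = np.random.default_rng(0)
table = pa.table({
    "category": rng.choice(["A", "B", "C", "D"], size=1_000_000),
    "amount": rng.random(1_000_000),
})

# Unsorted: every row group contains every category, so none can be skipped.
pq.write_table(table, "unsorted.parquet", row_group_size=100_000)

# Sorted: each row group covers a narrow range of category values.
pq.write_table(table.sort_by("category"), "sorted.parquet", row_group_size=100_000)

for path in ("unsorted.parquet", "sorted.parquet"):
    meta = pq.ParquetFile(path).metadata
    ranges = [
        (meta.row_group(i).column(0).statistics.min,
         meta.row_group(i).column(0).statistics.max)
        for i in range(meta.num_row_groups)
    ]
    print(path, ranges)  # the sorted file shows narrow (min, max) per row group
```

Any reader that honours those statistics only has to decompress the row groups whose ranges overlap the filter, which is the "bypass blocks" behaviour described above.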

A special mention to the6campbells for much of this info :)

Cheers!

MF.


Meep!