
How to correctly and efficiently aggregate a data set in Report Studio

Started by deadsea, 06 Nov 2014 11:09:11 AM


deadsea

Hello Folks:

I am trying to resolve performance issues with a legacy report that has two queries unioned together to produce the required results against a purely relational model. These two queries contain a detail filter (applied before aggregation) on a fact table attribute, of the form: Date = Maximum(Date for Account). The reason this filter is used is that Account contains data that is identical for all months of a fiscal year. To make this more painful, the [Date] column is present in the fact table as an attribute instead of using a SKey to link to the date dimension.
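
In SQL terms, the filter amounts to something like the following correlated subquery (illustrative table/column names, not our actual model):

    -- Hypothetical shape of Date = Maximum(Date for Account):
    SELECT f.Account, f.[Date], f.TotalForecast
    FROM FactTable f
    WHERE f.[Date] = (SELECT MAX(f2.[Date])
                      FROM FactTable f2
                      WHERE f2.Account = f.Account);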

For now, I have created a non-clustered index on the fact table, with the Date field as the leading index key and all other fact columns as secondary key columns, but thus far no improvement. Without this filter the performance is sub-second and acceptable; however, with this filter, the performance drops to about a 15-20 min run time. Furthermore, this issue was being masked by using query caching in the RS query properties, which does not fit the requirements at the moment.
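
For reference, the index I created looks roughly like this (names are placeholders, and carrying the measures as INCLUDE columns rather than key columns is one variation):

    -- Non-clustered index with Date as the leading key;
    -- the remaining fact columns are carried along for coverage:
    CREATE NONCLUSTERED INDEX IX_Fact_Date
    ON FactTable ([Date])
    INCLUDE (Account, TotalForecast, TotalFTEUsageForecast);

Thinking about it, since the max is computed per account, a key of (Account, [Date]) might suit the correlated lookup better than leading with [Date].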

With the fact table sitting at 60 million records, having this attribute in the fact table and then using it in a filter is an obvious blunder. Usually I would resolve this in DataStage at the ETL stage, but for now I need to resolve it at the Cognos layer.

Any words of advice or suggestions are appreciated.

Thanks.

deadsea

Any pointers on how folks have resolved this issue in the past? Any other functions that can be used to create a "wrapper" that will utilize the indexing and speed things up a bit?

bdbits

Have you looked at the generated SQL with and without the filter? It seems odd to me that adding a filter reduces performance, unless it is being processed locally on the Cognos server (especially with 60M rows). This would be the very first thing I would check. You could also run a database trace, or take your report-generated SQL and run it in a query tool to see if the same thing is happening.
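
If it helps, a lightweight server-side trace on SQL Server can be done with Extended Events; a minimal sketch (the session name and the app-name filter are just guesses at what Cognos sends):

    -- Capture completed batches from Cognos connections to a file target:
    CREATE EVENT SESSION cognos_trace ON SERVER
    ADD EVENT sqlserver.sql_batch_completed (
        ACTION (sqlserver.client_app_name, sqlserver.sql_text)
        WHERE (sqlserver.like_i_sql_unicode_string(sqlserver.client_app_name, N'%Cognos%')))
    ADD TARGET package0.event_file (SET filename = N'cognos_trace.xel');
    ALTER EVENT SESSION cognos_trace ON SERVER STATE = START;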

deadsea

Hello:

I am trying to figure out how I can perform row-based aggregation and then apply column-based roll-ups on the fact data that I have attached in this post:



Currently, the only way to get the correct and desired aggregation (when grouping by Fiscal Year) is to take the maximum fact value from a set of dates, in this case taking the max value from Month #1 to Month #12, which happen to be identical; using max also takes care of the 0.00 values you see below. Typically, to get the desired aggregation and roll-up for a Fiscal Year, I would have used Average as my aggregate and then Total as my rollup aggregate. However, in this case it results in the incorrect values seen below:

1) For the Total FTE Usage Forecast column:

(0.25 * 12)/24 = 0.125 instead of the desired result of 0.25 (which we can get via Maximum([Total FTE Usage Forecast] for [Date]))

2) For the Total Forecast column:

(11111.11 * 12)/24 = 5555.55 instead of the desired result of 11111.11 (which we can get via Maximum([Total Forecast] for [Date]))

Due to the amount of data involved and the Maximum function triggering local processing, it is not feasible to use the Maximum approach, hence I am looking for advice/suggestions on a more efficient way to go about this.

Any guidance or advice you may have is appreciated. Also, this is a relational FM model (with the DW being a dimensional design) over MS SQL Server 2012.

Thanks.

deadsea


  • Yes, I did look at the generated SQL, and it is the Maximum function that causes the local processing (XMAX is shown in conjunction with the filter column in question).
  • In addition, the SQL has numerous nested selects instead of just one or two, along with the inner joins that are enforced via the star schema setup in FM. The SQL without this filter looks decent, with no nested selects, using the inner joins specified at the semantic layer.
  • We have executed the query directly in the DB and the result is the same as what's seen in RS/Portal, i.e. it runs for a very long time (long = 5-8 hours).

At this point it's safe to assume that this report needs to be redesigned (hopefully without requiring ETL changes). I have created another thread to see how I can redesign the report instead of trying to figure this issue out. I will update if I do find anything regarding solving this issue.

Our main issue is that we are migrating to SAP, and I don't want to spend time fixing this in DataStage at the moment, since I will have to redo (and refactor this horrendous fact table design) in SAP BW.

Lynn

Your database designer and your FM modeler have not done you any favors.

Are these two separate fact subject areas put together in one table, lacking any keys to uniquely define the rows?

Do you have any option to go back to the database/FM modelers to revisit reporting requirements and model accordingly?

deadsea

Thanks for your response Lynn.

Unfortunately, this is a legacy report and fact table that we will migrate over to SAP BW, at which point I will ETL it properly; for now, though, the obvious holes you pointed out are there.

Although we don't have the capacity to change ETL code, I have convinced my superiors to allow FM changes (which I will have to make). Is there something specific I can do in FM to resolve this?

Our only issue is that there are numerous report dependencies on this table, and resolving them is again out of scope, so I was hoping for a more localized change to this report at the RS level.

Thank you again.

bdbits

One other thing, if you *can* construct a reasonably well-running SQL statement in your database query tools, you can use straight-up SQL as the source of the query data. Normally I would not recommend this but have found it necessary on rare occasions. It kind of defeats the purpose of using Cognos in the first place, but sometimes you just have to get something working.
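
For example, for the max-date filter, a single pass with a window function (which SQL Server 2012 supports) would keep everything on the database side; the names here are illustrative:

    -- Keep only each account's latest-date rows in one scan of the fact table:
    SELECT Account, [Date], TotalForecast, TotalFTEUsageForecast
    FROM (
        SELECT f.*,
               MAX(f.[Date]) OVER (PARTITION BY f.Account) AS MaxDate
        FROM FactTable f
    ) t
    WHERE [Date] = MaxDate;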

Lynn

Could you add a calculated data item in the query subject to flag the rows with the max date? Not really sure what your key structure is, but perhaps identifying those rows in the FM will force the processing to happen on the database side. Then your report would just need to filter for those records.

This would be an added item and therefore wouldn't impact anything existing.

I'm kind of guessing here since I can't see the whole thing, but perhaps it will spark some ideas to investigate further.
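
In SQL terms, I'm picturing something like this for the flag (purely illustrative, since I don't know your actual columns):

    -- Flag each row carrying its account's maximum date;
    -- the report would then just filter on IsMaxDateRow = 1:
    SELECT f.Account, f.[Date], f.TotalForecast,
           CASE WHEN f.[Date] = MAX(f.[Date]) OVER (PARTITION BY f.Account)
                THEN 1 ELSE 0 END AS IsMaxDateRow
    FROM FactTable f;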

deadsea

Thanks Lynn. I will try to see if I can do something in the FM model.


deadsea

Hi MODS:

Could you please link/merge this thread with http://www.cognoise.com/index.php/topic,26463.0.html ? I started them separately to keep them less convoluted, although they are linked to one another.

Thanks.

deadsea

Quote from: deadsea on 14 Nov 2014 02:52:46 PM
Hi MODS:

Could you please link/merge this thread with http://www.cognoise.com/index.php/topic,26463.0.html ? I started them separately to keep them less convoluted, although they are linked to one another.

Thanks.

Thanks! That is also on the table. I was hoping there was some other way :)

deadsea

UPDATE: So I created an indicator field, but no cigar... the generated SQL is still the same, and performance is the same. I suppose that even if I create an indicator and filter on it, the SQL generation is still done based on the FM model plus the filters set in the report, so I should not have expected a huge improvement. That's what I think, at least.

MFGF

Quote from: deadsea on 14 Nov 2014 02:52:46 PM
Hi MODS:

Could you please link/merge this thread with http://www.cognoise.com/index.php/topic,26463.0.html ? I started them separately to keep them less convoluted, although they are linked to one another.

Thanks.

Done!
Meep!