If you are unable to create a new account, please email support@bspsoftware.com

 

What is best practice to have history, but not terra of data

Started by erwink, 09 May 2016 09:20:52 AM

Previous topic - Next topic

erwink

Hi there

I would be interested in some hints / best practice for the following use case

We use Cognos for our project planning. People can change their data as needed, an look at the result every morning, after the ETL job
Up to now it was enough, so I used TRUNCATE to load data.

Now management would like to be able to have history, snapshot once a month. Easiest way would be to use APPEND. But that would generate a huge amount of data, and 90% waste.

I ended up with several solution, but not sure what is best or state of the art
1) create a 2nd table with the monthly data. Negative: create two FM package, duplicate each report
2) create a 2nd database with the monthly table so i can use the multiple connection feature.
3) mark the data as DoNotDelete once a month, adn inplement a SQL job to delete all other Data
4)...

Thank you for whatever hint you can provide

erwin

bdbits

Is your data source transactional, a data warehouse, a cube?

What do they want to do with the historical data? Will there be different reports (with trending perhaps), or they want to run the same reports with an option to select a certain time period?

erwink

Hi,
thank you for your answer..or rather questions  :)

You're right. I try to be short and concise...but at the end there is missing information

I've a DRM, build using Datamanager. Part of the data put together is for budget planning.
Project and Portfolio Manager have to deliver their needs per month for next BY till a given date. At this date the data will be frozen.
Only in case of budget adjustment will a new snapshot be made

Need with prio top down
- freeze the data at a given time and report on this data set - same reports than for daily data [plan per month]
    also same report, but for a selectable snapshot
- diff between daily and last snapshot
- ( diff between 2 snapshots )
- (( trending ))

hope it's a bit clearer

bdbits

If it is an option, I think the best approach is to rethink the data warehouse a bit. I would look at adding a new attribute, call it "Reporting Period". This would be the month to which the facts apply, or put another way the month the numbers become "frozen". Then in your reports, you add a new parameter to select which of these months are the numbers to be included. This will also allow you to compare periods efficiently and however you like within a single report.

Does this make sense? Is this a viable option?

erwink

It is an option since i do also all Datamanagrement stuff

This would request and Update/Insert .. what would overwrite every day till the date is in the new period, where it would insert
Sound good, but does not work in my case because deletion of plan data are not marked as deleted in the source, but just gone.

A solution would be to run a SQL Job to delete the current period and use always append. End up similar to my variant 3)

Is that what you thought about?

erwin

bdbits

It does sound pretty similar. Just to clarify what I was saying, I saw the process as something like this.

* For the sake of discussion, assume you want to report updated numbers each day throughout a period.
* For current numbers, "reporting period" would be null to indicate that it is "current" data.
* When the period is over and you want to save current numbers for historical purposes, remove all data where "reporting period" is null, populate the numbers to be saved and set "reporting period". Since it is no longer null, this is now "historical".
* You can keep the historical data indefinitely, if you want, for trend analysis.
* If you want to get rid of data for a particular reporting period, it is a simple SQL delete based on the "reporting period".


erwink

Thank you bdbits

workflow is fine. need to find the best way to run the single steps.

last question: is there another way than a SQL Job in Datamanager to <remove all data where "reporting period" is null>

bdbits

I am not very fluent in Datamanager. As far as I know that is the way to do it.

MFGF

Quote from: erwink on 11 May 2016 01:16:16 AM
Thank you bdbits

workflow is fine. need to find the best way to run the single steps.

last question: is there another way than a SQL Job in Datamanager to <remove all data where "reporting period" is null>

There are lots of tools you could use for this - Data Manager is just one. I'm guessing Data Manager is your ETL tool? Is there a reason you want to use something different?

MF.
Meep!

erwink

Hi MF

Yes Datamanager is my  ETL tool.

I was asking for another way in the tool than with a SQL element in a Job.  E.g integrated in the Fact Build

But I'm just setting it up and looks quite easy..so probably no need for another solution

Thank you

MFGF

Quote from: erwink on 12 May 2016 09:13:33 AM
Hi MF

Yes Datamanager is my  ETL tool.

I was asking for another way in the tool than with a SQL element in a Job.  E.g integrated in the Fact Build

But I'm just setting it up and looks quite easy..so probably no need for another solution

Thank you

Have you looked at using UPDATE/INSERT behaviour as the refresh type in your fact build? You'd need to make sure you have the appropriate key definitions set to target specific rows to update, but that isn't hard.

Cheers!

MF.
Meep!

erwink

Hi MF

Yes, sure I looked for UPDATE/INSERT . It was actually my first setup...till I found out that deleted entries in the source DB will not be recognized as such and will stay forever. So I changed to TRUNCATE what worked fine till now

thx erwin

Lynn

If your database supports a MERGE statement you can use that to manage this type of insert/update/delete logic.

This article explains it in the realm of DB2 but I know other vendors have this implemented also. It is an ANSI standard.

https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/merge?lang=en