If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Continuous Dates

Started by rwaugh, 05 Dec 2019 10:37:22 AM

Previous topic - Next topic

rwaugh

Hello,

I am trying to create a query (or more than one that I can JOIN) to get a list of workers and their start/end dates. The problem is that the table I that is available has multiple entries for the same worker but different start/end dates. Basically the system logs various changes and for some reason also creates a "change" on the worker history table. For example the table data would have:




Case IDWorker NameStart DateEnd Date



123456John Doe2019-05-062019-05-29
123456John Doe2019-05-30
987654Jane Doe2019-07-01
587459Jim Doe2019-04-052019-06-25

What I would like to return is the following:




Case IDWorker NameStart DateEnd Date


123456John Doe2019-05-06
987654Jane Doe2019-07-01
587459Jim Doe2019-04-052019-06-25

The problem is that, first, I am unsure how to do this. Second is that each worker could potentially have multiple entries (for this example John Doe has two entries, but I've seen as many as 10 in the data I've looked at so far) due to how the system logs things - which unfortunately is well beyond my control.

Any ideas?

Thanks!

dougp

I suspect that start date and end date do not mean what you think they mean.  This looks like a type 2 slowly-changing dimension.  They are probably the start and end date for the row, not the employee.  (and there are probably lots of columns you are not showing, some that differ between the two rows -- hence the ending of one row and starting of the next)  And the first start date is not when they started (or started working on that case), it's when they first showed up in the system (or first showed up in the system as working on that case).

Please define what "start date" means to you.

Andrei I

This is a classic date interval packing problem.
There are lots of solutions published on the net.
You can google and choose one suitable for your DB setup
https://www.itprotoday.com/sql-server/new-solution-packing-intervals-problem

rwaugh

Thanks for the reply DougP. The first entry for each worker is definitely when they started working on the case. The issue is, as you mention, there are other columns in the table that could also change - instead of updating the record with the updated data it writes a new table entry with the changed column and a new "start" date for the worker - these start dates do NOT correspond to the date the worker started working on the case. I am trying to find a way to ignore the "noise". The end date would also be incorrect for the worker right up until the worker actually changes. So there could be 10 entries for John Doe, case is transferred to Jane Doe and now there would be a last entry for John and a first entry for Jane for the same case.

Thanks,

Quote from: dougp on 05 Dec 2019 11:04:10 AM
I suspect that start date and end date do not mean what you think they mean.  This looks like a type 2 slowly-changing dimension.  They are probably the start and end date for the row, not the employee.  (and there are probably lots of columns you are not showing, some that differ between the two rows -- hence the ending of one row and starting of the next)  And the first start date is not when they started (or started working on that case), it's when they first showed up in the system (or first showed up in the system as working on that case).

Please define what "start date" means to you.

bus_pass_man

#4
Hi Rwaugh.

What you are describing is a type II SCD, which doug pointed out. 

As Andrei pointed out the problem fits into a well known pattern.  It is always helpful if you can identify such patterns as then someone has probably thought up a solution to the problem.

The link gives you a SQL approach but you would probably want to use expressions.

These might be helpful.

For the start date.
Minimum (start date for case ID )

The end date is a bit trickier.  Try something like this.

If (nullif (coalesce   (cast (end date, date), cast (2999-12-31 as date) ), cast(2999-12-31 as date )) is null)
then
(nullif (coalesce   (cast (end date, date), cast (2999-12-31 as date) ), cast(2999-12-31 as date )) )
else
(maximum ( end date for case ID))

I was trying to get null rather than the 2999 nonsense as the help says it will substitute nulls but it didn't return nulls.     nullif takes any case like that and forces the expression to return a null.

Hope that helps.


rwaugh

Thanks so much bus_pass_man! So far, from what I've tested so far, this worked perfectly!

Quote from: bus_pass_man on 05 Dec 2019 12:32:05 PM
Hi Rwaugh.

What you are describing is a type II SCD, which doug pointed out. 

As Andrei pointed out the problem fits into a well known pattern.  It is always helpful if you can identify such patterns as then someone has probably thought up a solution to the problem.

The link gives you a SQL approach but you would probably want to use expressions.

These might be helpful.

For the start date.
Minimum (start date for case ID )

The end date is a bit trickier.  Try something like this.

nullif(
maximum ( coalesce   (end date, cast (2999-12-31 as date) ) for case ID ),
cast (2999-12-31 as date))

I was trying to get null rather than the 2999 nonsense as the help says it will substitute nulls but it didn't return nulls.   If you test out the sub-expression of maximum you will see that for any case where there's a null the 2999 date will be returned.  nullif takes any case like that and forces the expression to return a null.

Hope that helps.

bus_pass_man

I made a wee error so I modified the expression to make it actually produce correct results.