Continuous Dates

rwaugh · 05 Dec 2019 10:37:22 AM

Hello,

I am trying to create a query (or more than one that I can JOIN) to get a list of workers and their start/end dates. The problem is that the table I that is available has multiple entries for the same worker but different start/end dates. Basically the system logs various changes and for some reason also creates a "change" on the worker history table. For example the table data would have:

Case ID	Worker Name	Start Date	End Date

123456	John Doe	2019-05-06	2019-05-29
123456	John Doe	2019-05-30
987654	Jane Doe	2019-07-01
587459	Jim Doe	2019-04-05	2019-06-25

What I would like to return is the following:

Case ID	Worker Name	Start Date	End Date

123456	John Doe	2019-05-06
987654	Jane Doe	2019-07-01
587459	Jim Doe	2019-04-05	2019-06-25

The problem is that, first, I am unsure how to do this. Second is that each worker could potentially have multiple entries (for this example John Doe has two entries, but I've seen as many as 10 in the data I've looked at so far) due to how the system logs things - which unfortunately is well beyond my control.

Any ideas?

Thanks!

dougp · 05 Dec 2019 11:04:10 AM

I suspect that start date and end date do not mean what you think they mean. This looks like a type 2 slowly-changing dimension. They are probably the start and end date for the row, not the employee. (and there are probably lots of columns you are not showing, some that differ between the two rows -- hence the ending of one row and starting of the next) And the first start date is not when they started (or started working on that case), it's when they first showed up in the system (or first showed up in the system as working on that case).

Please define what "start date" means to you.

Andrei I · 05 Dec 2019 11:20:18 AM

This is a classic date interval packing problem.
There are lots of solutions published on the net.
You can google and choose one suitable for your DB setup
https://www.itprotoday.com/sql-server/new-solution-packing-intervals-problem

rwaugh · 05 Dec 2019 11:30:12 AM

Thanks for the reply DougP. The first entry for each worker is definitely when they started working on the case. The issue is, as you mention, there are other columns in the table that could also change - instead of updating the record with the updated data it writes a new table entry with the changed column and a new "start" date for the worker - these start dates do NOT correspond to the date the worker started working on the case. I am trying to find a way to ignore the "noise". The end date would also be incorrect for the worker right up until the worker actually changes. So there could be 10 entries for John Doe, case is transferred to Jane Doe and now there would be a last entry for John and a first entry for Jane for the same case.

Thanks,

Quote from: dougp on 05 Dec 2019 11:04:10 AM
I suspect that start date and end date do not mean what you think they mean. This looks like a type 2 slowly-changing dimension. They are probably the start and end date for the row, not the employee. (and there are probably lots of columns you are not showing, some that differ between the two rows -- hence the ending of one row and starting of the next) And the first start date is not when they started (or started working on that case), it's when they first showed up in the system (or first showed up in the system as working on that case).

Please define what "start date" means to you.

bus_pass_man · 05 Dec 2019 12:32:05 PM

Hi Rwaugh.

What you are describing is a type II SCD, which doug pointed out.

As Andrei pointed out the problem fits into a well known pattern. It is always helpful if you can identify such patterns as then someone has probably thought up a solution to the problem.

The link gives you a SQL approach but you would probably want to use expressions.

These might be helpful.

For the start date.
Minimum (start date for case ID )

The end date is a bit trickier. Try something like this.

If (nullif (coalesce (cast (end date, date), cast (2999-12-31 as date) ), cast(2999-12-31 as date )) is null)
then
(nullif (coalesce (cast (end date, date), cast (2999-12-31 as date) ), cast(2999-12-31 as date )) )
else
(maximum ( end date for case ID))

I was trying to get null rather than the 2999 nonsense as the help says it will substitute nulls but it didn't return nulls. nullif takes any case like that and forces the expression to return a null.

Hope that helps.

rwaugh · 05 Dec 2019 12:54:44 PM

Thanks so much bus_pass_man! So far, from what I've tested so far, this worked perfectly!

Quote from: bus_pass_man on 05 Dec 2019 12:32:05 PM
Hi Rwaugh.

What you are describing is a type II SCD, which doug pointed out.

As Andrei pointed out the problem fits into a well known pattern. It is always helpful if you can identify such patterns as then someone has probably thought up a solution to the problem.

The link gives you a SQL approach but you would probably want to use expressions.

These might be helpful.

For the start date.
Minimum (start date for case ID )

The end date is a bit trickier. Try something like this.

nullif(
maximum ( coalesce (end date, cast (2999-12-31 as date) ) for case ID ),
cast (2999-12-31 as date))

I was trying to get null rather than the 2999 nonsense as the help says it will substitute nulls but it didn't return nulls. If you test out the sub-expression of maximum you will see that for any case where there's a null the 2999 date will be returned. nullif takes any case like that and forces the expression to return a null.

Hope that helps.

bus_pass_man · 05 Dec 2019 01:19:41 PM

I made a wee error so I modified the expression to make it actually produce correct results.

BSP Software	Resources	About Us
MetaManager	BSP Software Training	BSP Software
Integrated Control Suite	YouTube Channel	Micro Strategies Inc
Security Migration		IBM Cognos
Integrated Management Suite

If you are unable to create a new account, please email support@bspsoftware.com

COGNOiSe.com - The IBM Cognos Community

News:

Continuous Dates

rwaugh

dougp

Andrei I

rwaugh

bus_pass_man

rwaugh

bus_pass_man