Hi everyone,
Long time referencer, first time poster. I've only been using Cognos since last June so please excuse my naiveté when it comes to the ins and out of Cognos and SQL in general.
Below is a raw data sample from my database. The first column is project IDs, and you'll see that they are formatted in 4 levels:
XXXXXXX , XXXXXXX.XXX , XXXXXXX.XXX.XXXX , and XXXXXXX.XXX.XXXX.XXXX
Many of these projects only have start and end dates at the first, second, and third level. The employees are connected to the projects at the 4th level. I need to be able to list the start and end date provided for the closest level.
For example: If an employee is connected to a 4th level, an the closest start date is at the second level, I need to populate start/end info on the fourth level line with the 2nd level info while keeping the 4th level PID.
Example 1:
This is how the data is pulled initially:
Proj ID Project Name Start End
1GA0064 XXXXXXXXXXX 12/23/14 02/28/19
1GA0064.004 XXXXXXXXXXX 12/23/14 02/28/19
1GA0064.004.0001 XXXXXXXXXXX 12/23/14 12/30/17
1GA0064.004.0001.0000 XXXXXXXXXXX
1GA0064.004.0001.0113 XXXXXXXXXXX
1GA0064.004.0001.0114 XXXXXXXXXXX
1GA0064.004.0001.0115 XXXXXXXXXXX
This is what I need it to look like:
Proj ID Project Name Start End
1GA0064 XXXXXXXXXXX 12/23/14 02/28/19
1GA0064.004 XXXXXXXXXXX 12/23/14 02/28/19
1GA0064.004.0001 XXXXXXXXXXX 12/23/14 12/30/17
1GA0064.004.0001.0000 XXXXXXXXXXX 12/23/14 12/30/17
1GA0064.004.0001.0113 XXXXXXXXXXX 12/23/14 12/30/17
1GA0064.004.0001.0114 XXXXXXXXXXX 12/23/14 12/30/17
1GA0064.004.0001.0115 XXXXXXXXXXX 12/23/14 12/30/17
Example 2:
Same premise, just showing that I need it to roll to each exact third level instead of the first.
This is how the data is pulled initially:
Proj ID Project Name Start End
1GA0068.008.0003 XXXXXXXXXXX 06/29/18 06/28/20
1GA0068.008.0003.0200 XXXXXXXXXXX
1GA0068.008.0003.0203 XXXXXXXXXXX
1GA0068.008.0003.0204 XXXXXXXXXXX
1GA0068.008.0003.0205 XXXXXXXXXXX
1GA0068.008.0003.0207 XXXXXXXXXXX
1GA0068.008.0004 XXXXXXXXXXX 06/29/18 01/31/21
1GA0068.008.0004.0200 XXXXXXXXXXX
1GA0068.008.0004.0203 XXXXXXXXXXX
1GA0068.008.0004.0204 XXXXXXXXXXX
1GA0068.008.0004.0205 XXXXXXXXXXX
1GA0068.008.0004.0207 XXXXXXXXXXX
This is what I need it to look like:
Proj ID Project Name Start End
1GA0068.008.0003 XXXXXXXXXXX 06/29/18 06/28/20
1GA0068.008.0003.0200 XXXXXXXXXXX 06/29/18 06/28/20
1GA0068.008.0003.0203 XXXXXXXXXXX 06/29/18 06/28/20
1GA0068.008.0003.0204 XXXXXXXXXXX 06/29/18 06/28/20
1GA0068.008.0003.0205 XXXXXXXXXXX 06/29/18 06/28/20
1GA0068.008.0003.0207 XXXXXXXXXXX 06/29/18 06/28/20
1GA0068.008.0004 XXXXXXXXXXX 06/29/18 01/31/21
1GA0068.008.0004.0200 XXXXXXXXXXX 06/29/18 01/31/21
1GA0068.008.0004.0203 XXXXXXXXXXX 06/29/18 01/31/21
1GA0068.008.0004.0204 XXXXXXXXXXX 06/29/18 01/31/21
1GA0068.008.0004.0205 XXXXXXXXXXX 06/29/18 01/31/21
1GA0068.008.0004.0207 XXXXXXXXXXX 06/29/18 01/31/21
I've been trying to wrap my head around how to do this for a while now. Any help or suggestions would be greatly appreciated! If this is too confusing or not enough information, please let me know!
I don't know what you mean by 'pulled initially' so I need to confirm my understanding about the nature of the metadata.
These project IDs are in the same column in the data base. Is that a correct understanding?
1GA0064
1GA0064.004
1GA0064.004.0001
1GA0064.004.0001.0000
1GA0064.004.0001.0113
1GA0064.004.0001.0114
1GA0064.004.0001.0115
If so, where are the (presumably) start and end date columns? Are they in the same table or another table? If the latter, what is the relationship between the tables?
You want each 4th level subproject to have the start and end date values of its parent 3rd level project ID.
You might want to think about something like maximum (or minimum) (substring (projID,{whatever length is the string you deal with; you might need to use the regex to find the length if it varies}))
Something like this:
maximum ( End_ for (substring ( Proj_ID ,1,16 )))
That works of the ProjID elements are consistently the same length. Unfortunately the regex functions use the period as a wildcard character and I could not figure out how to escape it to use it as a literal.
That said, it is always better if this sort of thing could be avoided by incorporating the values in at ETL time, where the values just need to be retrieved from the db, rather than using an expression.
Another thing to think about is that it actually could be possible that the decision to not include the dates for the child projects was correct (I can't think of a reason why but you need to be open to that possibility). Assuming that isn't the case you might want to consider why it was decided to omit the child project's dates during ETL.
Yet another is that the child projects might have time durations which, although presumably bounded by the parent project's dates, are not the same as the parent project's dates.
In short, the problem is not bounded by the technical matter of what functions can return this pattern.