Incorrect aggregation results

grifter102 · 25 Feb 2013 05:22:47 PM

I've been trying to model a fairly complex set of relational tables from the ground up, and ran into an issue right from the beginning. I'm starting with 3 tables, one that stores work orders, one that stores workers and one that maps them together. This package is based on work orders and not every work order has a worker associated with it, so I want to keep the joins to the mapping table and worker table optional.

The first table is at the work order item level with the key/determinant [WO_KEY, WO_ITEM]. The worker table has the key [SUPP, CAND_NUM]. The mapping table has a fairly complex key having to do with service orders and line items... but the basic layout is that it has one and only one row for every work order item [WO_KEY,WO_ITEM] and one worker key associated with that item (the same worker key can show up on multiple work order items).

The tables' relationships that I've set are as follows:

WO_ITEM 1..1 --- 0..1 WO_MAP 1..n --- 0..1 WORKER

WO_ITEM has data like the total quantity and commit allowed, WORKER has data like the worker straight-time rate and overtime rate. The quantity/commit are set to aggregate using SUM, the rates are set to aggregate using average. To test out the relationships and determinants, I set up a simple query which I ran with auto sum enabled:

WO_KEY	WORKER_ST_RATE
101	$50.00

I limited it to just the one WO_KEY for speed reasons and it worked. I then added another WO_KEY to the report, this line had a corresponding ST_RATE of $100. Here's what returned:

WO_KEY	WORKER_ST_RATE
101	$75.00
102	$75.00

The ST_RATEs are averaged once for the report, then they're assigned to each WO_KEY that I include. The generated Cognos SQL looks like this:

select
WO_ITEM.PO_KEY,
XAVG(WORKER.WORKER.ST_RATE at WORKER.SUPP, WORKER.CAND_NUM)
from
WO_ITEM left outer join WO_MAP on ... left outer join WORKER on ...
where WO_ITEM.WO_KEY in (101,102)

and the Native SQL is this:

select
WO_ITEM.PO_KEY,
WORKER.SUPP,
WORKER.CAND_NUM,
WORKER.WORKER.ST_RATE
from
WO_ITEM left outer join WO_MAP on ... left outer join WORKER on ...
where WO_ITEM.WO_KEY in (101,102)

So obviously something odd is happening behind the scenes.

If I set the report to include the key fields of the WORKER table (SUPP and CAND_NUM), then it returns this:

WO_KEY	SUPP	CAND_NUM	WORKER_ST_RATE
101	ABC	001	$50.00
102	XYZ	001	$100.00

In this case the query returns the correct results. If I go to the query options and set Generate SQL as View, then the report always returns the correct information, regardless of worker key inclusion. The problem is that when the user is in QS they can't set their query to be "as view".

I've tried a few things and had some success... but I would like to know 1. why this is happening, 2. if it is expected due to some error of mine 3. if there is an accepted workaround (if it's not an error) or 4. what I should be doing (if it is my own error). Here's what I've tried:

1. I've tried all manner of determinants on the WO_MAP table. I've tried the work order item key as unique and the worker key as group by, I've tried adding all the items from both keys to one unique determinant... I've removed the determinants completely... nothing seemed to work

2. I created a new business layer object combining the WO_MAP and WORKER tables and created a business layer join to this table from the WO_ITEM table (and had FM replicate the underlying relationships). I set that business layer object to "as view" and when I recreated the test query subject using the ST_RATE from that new business subject it worked with no problems

3. I created a new database subject via SQL joining WO_MAP and WORKER together, set that "as view" and joined it to WO_ITEM, then recreated the test query subject using ST_RATE from the new DB subject, and that worked well, too

So I have 2 valid options that I can make work... but I'd really like to understand what is going on behind the scenes here. I'm also worried about performance. This is just the beginning of the modelling and the mapping table also contains keys for 3 other levels (supplier, request item, request header)... each of them with their own measures. Unfortunately the users want to be able to bring in measures from any of these levels and have them calculate properly... so I'm worried that creating 3 other joined subjects working "as view"s will completely kill performance or cause some other type of problems.

Before I start to go down the "as view" road I wanted to know if anyone had any "best practice" ideas for me to try... or could tell me exactly what is happening here so I understand things a bit better. Any help is greatly appreciated.

Thanks in advance.

blom0344 · 26 Feb 2013 05:36:14 AM

I would try to do the following (as a test):

Use an SQL query subject to merge/join WO_ITEM and WO_MAP using a dummy value 'x' for missing keys. Define a union subject on the WORKER table adding a row that represents the dummy value 'x'.
(x a value that does not exist as a worker key)

You now have 2 Subjects that relate as: N:1 (fact <--> Dimension) which matches an ideal submodel. Try whether this solves the incorrect aggregation

grifter102 · 27 Feb 2013 07:47:18 AM

I tried that and it is still aggregating improperly.

Just to be sure I followed the right instructions, I set up one SQL subject like:

Code Select

select
wo_key,
wo_item,
coalesce(supp,'X') as supp,
coalesce(cand_num, 'X') as cand_num
from
wo_item
left outer join
wo_map
on wo_key = wo_key and wo_item = wo_item

and the other as:

Code Select

select
supp,
cand_num,
worker_st_rate
from
worker

union all

select
'X' as supp,
'X' as cand_num,
0 as worker_st_Rate
from
dummy

then I set up the relationship as WO_and_MAP 1..n ---- 1..1 WORKER_with_X.

When I created the test table with WO_KEY and WORKER_ST_RATE and auto summed it gave me the same incorrect averaged result for both WO_KEYs ($75 and $75 instead of $50 and $100).

blom0344 · 27 Feb 2013 02:31:04 PM

I would expect the wo_key to be an FK related to the Worker table. Does supp and cand_num uniquely define a worker?

grifter102 · 28 Feb 2013 12:09:33 PM

The WO table I'm using is at the item level, so WO_KEY is the key for the WO, WO_Item_Num is also needed for the full key to the table. Yes, Supp and Cand_Num uniquely identify a worker.

blom0344 · 01 Mar 2013 05:56:57 AM

1. Did you use determinants anywere in the model? (If yes, then where and how)
2. Do you get proper results when setting the SQL generate option of the Query Subject to : 'As view' ?

grifter102 · 03 Mar 2013 10:27:53 PM

1. Yes, WO_ITEM has a single unique determinant on WO_KEY and WO_ITEM. WORKER has a unique determinant on SUPP_ID and CAND_NUM. I've tried one unique determinant on WO_MAP using WO_KEY, WO_Item, SUPP_ID and CAND_NUM, and then again with just WO_KEY and WO_Item. Both had the same results.

2. Yes, I mentioned that in my original post. If I set the test subject to As View then it works. Or if I create a joined copy of WO_MAP and WORKER via SQL and set that to "As View", it also works.

blom0344 · 04 Mar 2013 01:34:19 AM

Lose the determinants - and then test - as these are required for multifact , multigrain scenarios. The cardinality alone should tell Cognos enough..

In your original post you do indeed mention QS. But if you publish with the correct settings in the model, then the generate as view should work by default

grifter102 · 04 Mar 2013 11:26:06 AM

This is just the beginning of a larger model that I'm building. WO_MAP also needs to link to a Supplier table which is at a different grain (and a couple other tables as well). My main concern here is that if I create a separate query subject joining WO_MAP and WORKER and setting it to "As View", then repeat that process for the other levels then it will severely impact performance. What I'm really after is an explanation as to why Cognos is doing this and if there is a better way to fix the problem than creating separate view subjects.

If there is no other way to get this to work, then I'll go down that route. I know that Cognos doesn't really excel at multifact multigrain tables... I was hoping there was some sort of best practice when it comes to creating frameworks around that type of data.

blom0344 · 05 Mar 2013 04:04:44 AM

I do not think that Cognos is weak with multigrain/multifact. Can't think of a Vendor doing better in that area. But in best practice you would use only fact and dimension tables in multiple starschema's, not the more or less relational situation you are in. You are modeling directly against an OLTP schema aren't you?

grifter102 · 05 Mar 2013 11:00:37 AM

Yes, it's OLTP/relational data. It's a constraint that I have to deal with... we don't have the DBA bandwidth to convert it to OLAP so I have to make up for that using Cognos. Seems I run into an issue like this at every turn.

So what's my best course of action here?

blom0344 · 06 Mar 2013 09:34:16 AM

From a distance I cannot tell why this is happening. I would need the model and data to check things out. Another question is whether defining 'as view' instead as 'minimized' would automatically mean worse performance. In your case all joins will be needed, so it would then be a matter of more i.o. because intermediate sets will be larger. This is something you could test before making a decision which way to persevere

grifter102 · 06 Mar 2013 12:55:39 PM

Okay, thanks very much for your advice. I did try creating separate sections 'As View' and performance did degrade severely... it's okay with one of the tables, but after adding 2 more it started to crawl.

Maybe I can ask a more general question... how would you try to model this? I simplified my example above for brevity... but the basic stuff is all correct. I have one main fact table, Work_Order_Item, that is supposed to be the base pool to work with. I then want to join in information from the sourcing tool that went out to find the individual worker to work that job. There are 4 different levels, the request, the request item, the supplier and the worker. Each level has one table with attributes and facts. The 5th level to the request is the actual purchase, that's the WO_MAP table, which, in english, says "I hired this candidate and he's now associated with items 1, 2 and 3 of Work_Order ABC". So the WO_Map links all the other 4 levels of the request to the Work_Order_Item. Since this report is based on Work_Order_Items, and not all work orders went through the requisition system, I need to make sure I'm pulling ALL work orders and the associated req details (left outer join).

My first pass had:

/-- 0..1 Request Header
Work_Order_Item 1..1 ---- 0..1 WO_MAP 1..n --|-- 0..1 Request Item
|-- 0..1 Supplier
\-- 0..1 Candidate

But that resulted in the issues in my OP. I guess I'm just not sure how to go about modelling this set of tables in a way that Cognos likes and that won't kill performance (all the tables above have 700k - 2.5m rows of data, so pulling them all into memory joined to WO_MAP as views drags it to a halt).

I know it's an extremely open-ended question... but any direction that could be provided would help.

blom0344 · 06 Mar 2013 01:50:07 PM

But what did you test? Is the goal to pull ALL data from ALL tables involved? In best practice terms, if 2 query subjects have 1:1 cardinality, then it is better to merge them into 1 new subject. So no matter what, Work_order_item and WO_map should constitute the fact object. I built models on OLTP systems where these type of problems where dealt with by using database views. For the cognos model the use of the view would be identical compared to the situation where a true facttable would exist as result of the merge of Work_order_item and WO_map.

If you are interested, then you can drop me a PM and we can perhaps arrange to exchange the model to make matters more clear

grifter102 · 07 Mar 2013 04:06:04 PM

So far I've tested the following:

1. The model as described in my previous post, WO_ITEM joined to WO_MAP and WO_MAP joined to all the requisition tables. All are set to minimized. With or without determinants on the requisition tables the auto summed totals are incorrect.
2. WO_ITEM joined to a combined WO_MAP and WORKER SQL subject set to "As View". Totals are correct, performance is impacted.
3. WO_ITEM joined to several combined tables (WO_MAP+WORKER, WO_MAP+SUPPLIER, WO_MAP+REQITEM and WO_MAP+REQ), the query wouldn't return data.
4. WO_ITEM and WO_MAP joined in the same SQL query subject joined to WORKER. With or without determinant on WORKER and WO_ITEM+MAP set to minimized or as view returns incorrectly aggregated totals.

I would try to have a database view created joining WO_ITEM and WO_MAP, but when I joined them via FM it didn't work. The best option for me may be to try and have views created for each of the requisition tables, though our DB team won't be able to respond to that request for quite some time (I might be able to get someone to create one quickly... but 4+ would have to go through the normal prioritization process and it'd take forever).

BSP Software	Resources	About Us
MetaManager	BSP Software Training	BSP Software
Integrated Control Suite	YouTube Channel	Micro Strategies Inc
Security Migration		IBM Cognos
Integrated Management Suite

If you are unable to create a new account, please email support@bspsoftware.com

COGNOiSe.com - The IBM Cognos Community

News:

Incorrect aggregation results

grifter102

blom0344

grifter102

blom0344

grifter102

blom0344

grifter102

blom0344

grifter102

blom0344

grifter102

blom0344

grifter102

blom0344

grifter102