If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Aggregate Functions causing multiples in SQL?

Started by norrisar, 03 Dec 2013 12:41:29 PM

Previous topic - Next topic

norrisar

Hello all,

Hopefully I am able to explain this enough to understand.

I am trying to update some of my reports to point to a SQL database instead of an RDB. It is an identical database otherwise (all data matches perfectly). In the report, there are some fields that have the aggregate function set to Total. In the RDB report, everything runs fine and I get correct output.

For example, the following columns have the following numbers for one specific account:
Weight = 56,601
Tons = 28
Cost = 1024.81
Pieces = 10

Now, running the same report with the same aggregations in SQL, I get the following output for the same account:
Weight = 169,803
Tons = 85
Cost = 3074.43
Pieces = 30

You can see these numbers are a multiple of 3. This happens for every account, but not always 3. Sometimes it's just doubled, sometimes quadrupled, and sometimes it actually does match. Now, if I run the SQL report with all aggregate functions set to None, I get something like the following (4 rows for the account):
Weight = 0, 0, 12221, 44380
Tons = 0, 0, 6, 22
Cost = 210.51, 210.51, 120.76, 483.03
Pieces = 1, 4, 1, 4

If I manually add the totals, it matches the numbers I'm expecting. I'm at a loss for what would possibly be causing these multiplications. Has anyone ever experienced anything like this before? I'm not a very technical person so I don't quite understand anything happening in the background.

Thanks in advance.

calson33

It sounds like one of your joins is bad. Any time you get exact multiples like that, it generally points to either a data problem or a join problem. Since the data seems to add up correctly, then I would look at the joins next.

CognosPaul

Try setting "Auto group and summarize" to no. Then check that account again.

norrisar

I had to ask someone else to check the joins since I do not have access to that information. Hopefully they will be able to get to it today or tomorrow.

For now, I tried changing the Auto Group & Summarize to No and it made the report run for 2 hours (normally takes ~5 min)

For that account, the records were (12 of them):
Weight = 12221, 12221, 12221, 0, 0, 0, 44380, 44380, 44380, 0, 0, 0
Tons = 6, 6, 6, 0, 0, 0, 22, 22, 22, 0, 0, 0
Cost = 120.76, 120.76, 120.76, 210.51, 210.51, 210.51, 483.03, 483.03, 483.03, 210.51, 210.51, 210.51
Pieces = 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 4

So clearly that would be why the totals are 3x what it should be, but I'm confused why it's duplicating the same record so many times when there are no columns that would differentiate them.

navissar

Hi Norrisar,
Values multiplied almost always point at a join done in a wrong granularity level. More so when the multiplication isn't always by the same factor.
Consider the following example: You have a fact table which has some measures for a product category:

product category                 sales $                Sales Quantity
Keyboard                              1210                            5
Mouse                                   3500                            25
Screens                                 6000                            3

Your dimension is product category and product:
product category                     product
keyboard                                    clicker
keyboard                                    super clicker
mouse                                        super pointer plus
mouse                                        mega point&click
mouse                                        mickey
screens                                      great shower of pixels deluxe

Now, consider what happens when you join product category from dim and fact: Since some product categories appear more than once in dim, the join will return several rows from the fact for each dim value, which will cause data multiplication.

product category (from dim)                        sales$
keyboard                                                       1210
keyboard                                                       1210
mouse                                                           3500
mouse                                                           3500
mouse                                                           3500
screens                                                          6000
when this is aggregated, all hell breaks loose.
That's what I'd check first.