Most Frequent Value (Mode) as Aggregation?

mixim · 09 Oct 2014 01:37:34 PM

Hello Guys!

Ill just get right to it!

Problem
While you can create aggregation items (which dynamically adapt to a set grouping of a list) where you can select aggregation functions such as: Minimum, Maximum, Count, Total, Average, Median etc etc...

There seems to be no Mode , which would display the most frequent value as aggregation.
Are there any work-arounds for this with just an item-calculation, avoiding seperate queries and joins?

Example Application
I have a "base" query which shows these three items/columns in a list:

Country	City	Size
USA	L.A.	Big
USA	N.Y.	Big
Poland	Wloclawek	Small
Poland	Warszawa	Big
Poland	Rzeszow	Small

Using the same "base query", I would like to create an additional two column list using Country and Size.
It will be grouped on Country. And Size item/column should aggregate per group with Mode (most common value):

Country	Size
USA	Big
Poland	Small

Any ideas how to solve this in Report Studio? Or would i need to rely on a seperate query (would apperciate input for this if that is the case) Any input for would be much appreciated =)

mixim · 09 Oct 2014 02:23:01 PM

I was able to find someone who has tried before online:

QuoteCreate a query calculation or data item with the following:
maximum (if (running count (<expression> [for <expression>])=maximum(running count (<expression> [for expression]))) then (<expression>) else (null)))

First maximum is for taking out the null value from the result.

Any help ? Because I don't really understand the syntax. I.e. What should be in all the "<expression>" , also i think there should be a dash in "running-count".
Maybe there is a more elegant solution to this? Hope this doesn't derail someones ideas !

Lynn · 10 Oct 2014 06:29:15 AM

Take a look at the attached report spec against the GO sample data. I assume you have a relational package although you didn't specify.

I used product line as your country, product type as your city, and product color as your size. So we want to find the most frequent color (size) for each product line (country) across all the product types (cities).

The list on the left is the base data and includes the two query items I used to determine the mode. I'm using a count function with a specific scope defined in the 'for' clause to count the colors. Then I can use that same expression with a different scope to figure out the maximum per line (country). I don't see a need for running-count in this case, but obviously test with your data to see if the approach will work for you.

The list on the right shows the most frequent color for each product line. It uses a query reference pointing to the base query but has a filter added to specify returning only the maximum color for each line. This will result in separate queries being fired against your database but you won't need redundant code in your report.

Code Select


count( [Product type] for [Product color], [Product line] ) 
#/* This counts the number of cities (type) for each country (line) and size (color) combination */#

You can then wrap that expression within a maximum function and specify a scope for product line to get the highest frequency size (color) for each country.

Code Select


maximum ( [ColorCount] for [Product line] )
#/* Find the highest count per product line */#

You will see in the filtered query how these are used to return just the appropriate information. Note that in the case of a tie it will return multiple rows per country (product line). You would need to decide some method for tie-breaking and adapt accordingly or else leave both values in the result set. This may be a less frequent occurrence with your data but it happens twice in the attached example which is a relatively small data set. Good luck!

mixim · 13 Oct 2014 03:00:40 PM

Quote from: Lynn on 10 Oct 2014 06:29:15 AM
Take a look at the attached report spec against the GO sample data.

Hi Lynn, thank you so much for your input!

This was great for me to see the break down of the for-clause. Thanks for the effort for creating the report. (Even though I do not have access to sample data package, I still was able to check the XML out.)

1. So I defined "grouping" based on the "for". How will this then work in conjunction when I have additional columns other than the mentioned (10 more). Which I would like to aggregate and group using the aggregation settings for items. Is it possible to mix the two approaches?

2. As for your question about relational/dimensional model. Yes It actually is dimensional. And unfortunately I get a cross-join error applying your suggested technique.

Lynn · 13 Oct 2014 03:44:55 PM

Quote from: mixim on 13 Oct 2014 03:00:40 PM
Hi Lynn, thank you so much for your input!

This was great for me to see the break down of the for-clause. Thanks for the effort for creating the report. (Even though I do not have access to sample data package, I still was able to check the XML out.)

1. So I defined "grouping" based on the "for". How will this then work in conjunction when I have additional columns other than the mentioned (10 more). Which I would like to aggregate and group using the aggregation settings for items. Is it possible to mix the two approaches?

You should be able to have other data items that simply aggregate according to the settings you define. I think you'd just want to take care that it all makes sense together and doesn't change the context of your explicit aggregates.

Quote from: mixim on 13 Oct 2014 03:00:40 PM
2. As for your question about relational model. Hmm. The data comes from a starschema based model (Facts/Dimensions). Dont know if this is related, but when trying your approach out It tries to generate a cross-join Which is not allowed in this framework =(

An easy way to tell if your package is relational or dimensional is to open it in Report Studio and look to see if you have a section above the report body for Page Layers and Context Filter. If not then it is relational. If so then it is dimensional and you'd also be able to see a meta data tree and a member tree in the source pane. Both relational and dimensional sources can come from a star schema based model so that isn't necessarily telling you anything.

There isn't enough information to help you solve the cross join problem but it shouldn't be anything inherent in the technique I provided. You could talk with the person who did the FM modeling to see if they can help you understand why this is happening.

mixim · 13 Oct 2014 04:06:59 PM

Quote from: Lynn on 13 Oct 2014 03:44:55 PM
You should be able to have other data items that simply aggregate according to the settings you define. I think you'd just want to take care that it all makes sense together and doesn't change the context of your explicit aggregates.

OK, thanks!

Quote from: Lynn on 13 Oct 2014 03:44:55 PM
An easy way to tell if your package is relational or dimensional is to[...]If so then it is dimensional and you'd also be able to see a meta data tree and a member tree in the source pane.

Yep, it is Dimensional. Although i'm not quite seeing how this would affect this approach.

Quote from: Lynn on 13 Oct 2014 03:44:55 PM
There isn't enough information to help you solve the cross join problem...[...] but it shouldn't be anything inherent in the technique I provided.

Alright, then its unrelated. Thanks for the pointer.

Now a bit back on track:

Quotecount( [Product type] for [Product color], [Product line] )
#/* This counts the number of cities (type) for each country (line) and size (color) combination */#

This item, when I put it in the base query and added it to a ungrouped table, it actually returned a repeated total number of rows for me. If I understood the logic, and please correct me if im wrong, it should had actually counted number of unique(?) values for everything after "for" (which act like sort of a grouping)?

Also, the Item you created in the base-query "MaxColorCount", is this used by anything? Can't see it beeing used anywhere.

Lynn · 13 Oct 2014 04:23:11 PM

Oh dear. My solution was specifically for a relational package. It would be different for dimensional. I will look at it when I have time. Maybe someone else will chime in also.

mixim · 13 Oct 2014 04:34:19 PM

Quote from: Lynn on 13 Oct 2014 04:23:11 PM
Oh dear...

Maybe all hope is not lost! (Or do you know for sure it wouldnt work in dimensional?)
Right now, i have placed the items "ColorCount" and "MaxColorCount" in a table that is grouped by "Country".

ColorCount displays:
1
2
8
51

I guess this means it identified 4 different Colours and counted the occurence of each.

MaxColorCount displays:
51
51
51
51

I guess this looked at all of the ColorCount values and selected the highest one.
So what is missing for me now is to display the name of this Colour, instead of its count?

Anybody with a suggestion of how to do that, without the reference-query technique. I experimented with a Join instead of reference query but was not really successful. Could this maybe be done with just a nested expression on an item?

Bonus question:

BSP Software	Resources	About Us
MetaManager	BSP Software Training	BSP Software
Integrated Control Suite	YouTube Channel	Micro Strategies Inc
Security Migration		IBM Cognos
Integrated Management Suite

If you are unable to create a new account, please email support@bspsoftware.com

COGNOiSe.com - The IBM Cognos Community

News:

Most Frequent Value (Mode) as Aggregation?

mixim

mixim

Lynn

mixim

Lynn

mixim

Lynn

mixim