If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

How do I filter the result of a full outer join? [now resolved]

Started by jackg_tor, 12 Mar 2015 04:29:12 PM

Previous topic - Next topic

jackg_tor

Hi,

I'm trying to filter a list created by a full outer join but I can't get it to work and, even more strangely, the generated SQL seems to be what I want but the displayed list is not.  Here are the details, with some simplifications such as unrelated columns omitted.

"System" table              "mad_errors" table              "ip_info" table
-------------------              ------------------------              ------------------
id                                   system_id                            system_id
name                             error                                    ipaddr
client

Each system has 0..n mad_errors and 0..n IP addresses.
Each mad_error has exactly one system.
Each ip_info has exactly one system.

My test report is just a simple list containing columns "system-name", "ipaddr" and "error" filtered by client and ipaddr.  The generated query consists of 2 clauses followed by a non-native full outer join, something like this:

with FS1 as
   (
   select
      system.name,
      ip_info.ipaddr
   from
      system LEFT OUTER JOIN ip_info ON system.id = ip_info.system_id
   where
      client = 'xx'
      [and ip_info.ipaddr like '10.102.%']          # see below for what I mean by this
    ),
with FS2 as
   (
   select
      system.name as system_name,
      mad_error.error
   from
      system LEFT OUTER JOIN mad_error ON system.id = mad_error.system_id
   where
      client = 'xx'
   ),
SELECT
    COALESCE(
        FS1.System_Name,
        FS2.System_Name) AS System_Name,
    FS1.ipaddr
    FS2.error
FROM
    FS1
        FULL OUTER JOIN FS2
        ON . . .

Originally, I had all the filters in the detailed filter section so FS1 was being filtered by IP address but FS2 was not and, of course, the full outer join combines both sets of rows.  However, when I realized that, I moved that filter to the summary filters section and that seemed to produced the correct SQL.  Here's the exact outer join that Cognos says it's using:

SELECT
    COALESCE(
        FS1.System_Name,
        FS2.System_Name) AS System_Name,
    FS1.IP_Address AS IP_Address,
    FS2.error AS error
FROM
    FS1
        FULL OUTER JOIN FS2
        ON
            (FS1.System_Name IS NOT DISTINCT FROM FS2.System_Name) AND
            FS1.sc1 = FS2.sc1
WHERE
    FS1.IP_Address LIKE '10.102.%'

I've attached the whole generated SQL in case that helps.

The output of this query shows every system that matches the client filter.  The ipaddr column is populated when it starts with 10.102 (matches that filter) and empty when it doesn't (since that row was not present in the FS1 sub-select and comes only from the outer join with FS2).

I tried converting the system <--> ipaddr relationship to 1..n so that it would use an inner join instead of left outer join for FS1 and that did change the generated query but not the displayed result.

Can anyone help me understand what's going on here and/or suggest a work-around to get the output I want?

We're running Cognos 10.2.1 with a Postgresql db.

Thanks,

Jack Goldstein

bdbits

Just a thought, but did you run the report or just click on it in Cognos Connection with a report default action set to "view most recent report"? Maybe you are seeing a cached report.

jackg_tor

Thanks for the quick response but that's not it.  I was running from within Report Studio and other changes I make are showing up.

MFGF

Quote from: jackg_tor on 12 Mar 2015 04:29:12 PM
My test report is just a simple list containing columns "system-name", "ipaddr" and "error" filtered by client and ipaddr.

Hi,

The key here is in understanding how Cognos will be treating these three tables. Firstly, the Cognos query engine is designed to query tables arranged in a star/snowflake schema and its query generation rules will deliver accurate, consistent results even when you query multiple fact tables at different levels of granularity. It identifies each table (query subject) as either a fact or a dimension, and this determination is based on the cardinalities of the relationships. In a star or snowflake schema, fact tables are always at the many ends of relationships and dimensions are not. Based on your cardinalities, System will be treated as a dimension table, mad_errors will be treated as a fact table and ip_info will be treated as a fact table. When Cognos encounters a query that spans two fact tables it automatically generates a stitch query (ie a query that includes a full outer join) - that's what you are seeing here in the Cognos SQL. If you look at the native SQL you will likely see two separate queries sent to interrogate the conformed dimension and the fact on each side. These are then stitched together locally on the Cognos server to emulate a full outer join. You would still see a stitch query here (with a full outer join), even if you changed the cardinalities to 1..n on both the mad_errors and ip_info tables, since Cognos still sees them as being fact tables (as they are at the many end of the relationships). The rules it uses to determine dimension or fact are:
If a table is at the many end of all relationships used in the query that link to it, it is a fact
If a table is at the one end of any relationship used in the query that links to it, it is a dimension

When you look at your filter logically, you're filtering on one attribute from the conformed dimension (Client in System table) and one attribute from one side of the full outer join (ipaddr from ip_info).

The filter on Client will apply to both sides of the full outer join since it's in the dimension that conforms the two. The filter on ipaddr will apply just to the ip_info side of the stitch, since ipaddr isn't something that is pertinent to the mad_errors table. From your description, that's what you are seeing?

It sounds to me like you are somehow trying to force the filter on ipaddr to also apply to the rows from mad_errors - is that the case? Logically it shouldn't (and doesn't) since it's on the "other" side of a full outer join. You might be able to fudge it, though. Each detail filter you apply has an "Application" timing option - either before auto aggregation or after auto aggregation. The default is before - and in this case the filter is usually passed to the database in the native query. If you change the timing to "After auto aggregation", this often moves the filter to the Cognos server and might mean in your case it is applied after the stitch has been performed.

The real answer, though, would be to have ipaddr as an attribute of mad_errors in the data, since that's what your request here is logically implying should be the case?

Cheers!

MF.
Meep!

jackg_tor

#4
Hi MF,

That was incredibly helpful.  Your explanation was very clear and the suggested work-around of changing to after-aggregation worked perfectly.  Thanks a lot for your help.

I don't think that changing the tables would be appropriate in this case because the primary object that we're concerned about is the "system" and the other tables are attributes of systems but with 0..n of each applying to each one.  I don't want to waste anyone's time with a question that isn't that important to me but if you or anyone else feels like carrying on the discussion, I am curious about how people usually restructure similar data to make a star schema or something close to that.

For anyone else who is reading this and wondering why the summary filter didn't work, it seems that I really misunderstood what that was for.  Here's an explanation from some guy named Anthony from businessintelligence.ittoolbox.com:

QuoteIn words, the summary filter would be used where you apply a summary aggregate to a list column rather then creating a new data item with the aggregate. So, say for example you have a report that shows employees in a department, and you applied an aggregate of count to show you how many people are in the department. The summary filter let's you use [employees] > 5 there to show you departments that have more then 5 employees rather then creating a data item as count([employees] for [Dept]) then having a detail filter as this data item > 5 So, just a way to avoid the need for a data item in some cases.

Edit:  I forgot to mention one more thing.  Even though it seems that using the summary filter should not have worked (and did not), the SQL it said was being used implied that it should have, so I think that there is a Cognos bug there.  However, I am not going to bother trying to get it fixed.

Jack Goldstein

jackg_tor

I spoke too soon when I said that the after-aggregation filter worked but I did get this working in the end, as follows:

In order to produce a query/table that it can filter, Cognos does a left outer join of the system table with the ip address table.  However, this would produce duplicate system rows and since the list it's displaying has a system name column but no IP address column, it tries (correctly) to ensure that there is only one row for each system.  It does that by somewhat arbitrarily combining all the IP addresses into a single address by using the 'MIN' function.  So, what I found is that the filter seemed to work well when filtering by the lower address but failed when filtering by the higher IP address.

I found a good proven-practices article that provides detailed explanations of a number of problems that can occur when querying on multiple fact tables and suggests a number of possible solutions.  See http://www.ibm.com/developerworks/data/library/cognos/reporting/advanced_report_design/page605.html.

The solution was to divide the main query into two:

1.  A system_info_via_ip_addr query that returns the system ids with the vulnerability info (that is linked to the ipaddr table) and filters by ip address.
2.  A system_info_via_system_id query that returns the system id with everything else (that is linked to the system table).

The two tables are explicitly inner-joined by system id.

Jack