If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Data Modelling Question

Started by S2000, 21 Mar 2011 03:49:57 AM

Previous topic - Next topic

S2000

Sorry that I put this into the Report Studio forum, it seemed to be the least inappropriate forum.

I have a tricky situation I can't quite work out.

The fact table contains Transactions.
The dim contains Customers.
Not all customers have a transaction, but all transactions have a customer.
There is a -1/? Dim record for the few that don't have a valid customer.

I want a report that shows all customers between 18 and 25 years old and their transactions, but also include the customers that don't have transactions.

I don't want to use an outer join.
I don't want to manufacture additional fact table rows as it could interfere with other existing reports.

I find myself a bit stuck, but I'm sure I can't be the first person who needed to show dims where there were not facts.
Is there an industry approved way of modelling this?

Thanks in advance.

Lynn

Are we talking about a relational model? If so.....

If the normal use is for an inner join between customer and fact but this particular report is a "one off" situation, you can create a report with one query for customers and a second query for the facts and then outer join in the report. Essentially by-passing the FM join. Of course this also assumes that you don't require the outer join for ad-hoc Query Studio use either.

HOWEVER, this is probably not going to perform terribly well. If you have huge volumes it may not be a viable solution. You would certainly want to filter your fact table AS MUCH as possible in the base query (e.g., by date, product, or whatever) to limit the quantity of data that must be joined locally.

Another option might be model an alias so that you have a customer query subject based on the inner joined dimension for use most of the time and a second customer query subject based on an outer join to the fact for use when that requirement is needed. This would allow the database to do more of the heavy lifting compared to a report approach. A little messy and not something you'd probably want to expose to most business users.

A third approach would be to model another fact table that has a record for every customer (perhaps a summary table of some sort) and implement a multi-fact query. A multi-fact query will coalesce around a conformed dimension (customer in your case). I believe the end result would be every customer meeting your filter criteria (e.g., between 18 and 25) and the facts from your "real" transaction table as well as facts from your "other" fact table that is present just to force the coalesce.

You will want to read up on multi-fact/multi-grain modeling to ensure you've modeled correctly for this. I don't have my Kimball book handy but there may be some additional thoughts on how to best architect data for this scenario.

jive

Hi S2000,

It's basically pretty hard to show join table where information is missing from one table without doing outer join. Normally this is the perfect used for an outer join. Like Lynn said if you have I high volume of data the solution I will give you may be more time consumming then using outer join. Make a query subject based on an union between one select having customer and total transaction and the second select with all your customers and -1 as transaction value. make a select from the union result, customer and the max transaction. this way all your customer with transactions will have a positive number an the other will be the one without transaction. like
select cus_no,Max(trans_value)
FROM
(
select t1 cus_no, sum(c1) 'trans_value'
from t1 , t2
where t1.c1 = t2.c1
group by t1.c1
union
select c1, -1 'trans_value'
from t1
)
GROUP BY cus_no

this will do the job but to be honnest an outer join will be far more appropriate.

Regards Jacques

Lynn

Ah yes! Of course Jacques!! A union is an excellent option.

S2000

Thanks for the comments.

An outer join is of course the simplest solution – but I'm loathe to build an outer join into a star schema.
The outer join becomes a moot point anyhow if one of the non-existant rows in the fact is used to join to another table.

I think my only alternatives are to build a new package with extra rows for the null claims – using either unions, outer joins or some other technique.
Or,
do the work in a report  with 2 query subjects and join them on an outer join.

Thanks again.

JGirl

#5
If it is only for one report, a union or even an outer join in the report should suffice.  But if this is a common reporting requirement bigger than just one report, a possible option to consider if your data model supports it is to split (or model) your customer table as both a fact and dimension...So your dim gives you all the text attributes, and your fact gives you not much more than a count of customers/claims/policies etc.

This is quite common scenario in the insurance industry as you usually want to look at ALL your policies (customers in your example), but also understand of those how many have claims (transactions in your example).  So you usually end up with a policy dimension and policy fact, and a claim dimension and a claim fact (which links to the policy dimension).  When you only want to look at claims and the policies for the claims, you use the claims fact, and when you want to look at claims and all policies regardless of claims you use a multi-fact query.

This avoids the need for an outer join in your star schema, and can work effectively, but it is important to make sure your report authors understand how the data model works, and how multi-fact queries behave otherwise they can be just as troublesome.