Semi additive Fact

ratnalein · 06 Dec 2012 09:02:48 AM

Hallo everyone,

I am facing a problem with semi additive fact:

Fact table:


Subject    NumberOfStudents

math        3
physic      4
music       2

Dimension table 1:

Code Select


SubjectID      SubjectType

math           technical
physic         technical
music          non-technical

Dimension table 2:

Code Select


SubjectTypeID      SchoolType

technical          School_A
non-technical      School_B

The idea for this model is to count the number of students in several granularities, for example:
1. Number of Students, which are studying math
2. ...
3. Number of Students, which are studying in School_A
4. ...

I model the tables in a snow flake scheme:

FactTable <--- DimensionTable1 <--- DimensionTable2

My problem:
Since the fact in the fact table is calculated per distinct (Student), I can not just aggregate using additive aggregattion the fact for the dimensional table 2 and upwards. Because, for example --> Ratna is studying math and physic, so the result of "List me the number of Students which is studying math" --> 1. This is still correct. But if I list the number of students "which is studying at School_A", then the result will be 2. :-\ This is false, because actually there is only 1 student studying in School_A (Ratna).

I hope, my problem can be understood. My question: is it possible to get this problem solved in the cognos framework manager? If yes, would you please give me some hints which adjustments must be done?

Thank you very much.

Regards,

Ratna :)

blom0344 · 06 Dec 2012 09:25:51 AM

The granularity of the fact is wrong in this case. It is a summary table causing problems.
[Fact table measures should contain only additive measures in the first place; this goes for summary tables too]
A fact table storing the student ID would make more sense , since performing a distinct count against the Id should work for all scenario's

ratnalein · 06 Dec 2012 09:47:55 AM

Hallo blom0344,

firstable, thank you for your quick reply. Mhh.. you mean, a fact table storing the student ID like this?

Code Select


Student   Subject 

Ratna      math
Ratna      physic

The problem is, I have billions of Students :(, so it will not be efficient to build the fact table like this. Suppose that the fact table MUST be as previous, ist there any best practice to deal with?

Thank you.

Regards,
Ratna

blom0344 · 06 Dec 2012 09:59:23 AM

Ratna, your current table stories summaries which are the problem . It is not a true fact table as the measures themselves are not additive. You are simply losing information the way you build the fact.

Obviously storing the student name instead of storing the student key will create another set of problems with students that have identical names. Are you afraid that a larger fact table will be inefficient? It is basically you an event table detailing which combinations of students and subjects exists. Surely this should not be a problem, cause both columns (especially subject) would make ideal indices

Bark · 06 Dec 2012 10:20:30 AM

I faced the same problem recently and there is no "easy" solution. I tried 3 approaches and both worked although with huge disadvantages.

1. A model with StudentID (as proposed before) and then count distincts for the measure.
2. A Transformer cube with a Category Count measure having a dimension called "student". Because the limitation of 65000 childs for a parent, I had to create dummy levels by dividing the ID by 100, 10000, 1000000 etc.
3. A cube with external roll up and create a view for all the possible scenarios (it gets unmanageable with more than 3 dimensions of 2/3 levels each) and although the generation won't be fast (loads of counts distincts) the cube will perform extreamly quicker thant the DB when reporting against it.

Let me know if any of this suits you and I will try to expand.

Regards,

Bark

cognostechie · 06 Dec 2012 01:04:25 PM

A category count cube is ideal for semi-additive and non-additive measures.

The only problem is Transformer has an upper limit on 2 million categories so if it reaches that then you will have to shoot for an external rollup cube which is also a supported feature.

This can also be done in FM by building a Fact table like this:

Student Subject Subject Type School
----------------------------------------------------
Ratna Maths Technical A
Ratna Physics Technical B
John Music Non-Technical A
Joe Biology Technical A

Do a count distinct on the Student. I think this is what blom suggested too. Having detailed data also gives you flexibility to make summarized reports if required whereas having summarized data puts limitations on making only summarized reports.

blom0344 · 06 Dec 2012 03:32:22 PM

Kinda curious..

Ratna, how many Students (Ids) are involved? :o

ratnalein · 06 Dec 2012 03:56:20 PM

Hallo Bark, bloom0344, and cognostechie,

thank you everyone for the suggestions. Regarding the suggestion from cognostechie:

Code Select


Student   Subject   Subject Type   School
----------------------------------------------------
Ratna        Maths       Technical            A
Ratna        Physics     Technical            B
John          Music        Non-Technical     A
Joe            Biology     Technical             A

Do a count distinct on the Student.

I have a question regarding "do a count distinct on the Student". Do u you mean, in the framework manager Student should be set as Fact and should get the count distinct as aggregate function?

Regards,

Ratna

@bloom0344, I have around 650 millions students/persons.

cognostechie · 06 Dec 2012 04:50:32 PM

Ratna -

The data I used was only for explaining it. You will need to use some kind of ID like Student ID which is numeric. If it is alphanumeric then create another column in the database table and convert it to numeric. Mark that as a Fact in the database layer and set the Regular Aggregate and Semi-aggregate property to Sum. In the Business Layer, keep it as a Fact but mark the Regular Aggregate property to Count Distinct and Semi-aggregate property to Sum.

I have the same situation and I converted the alpha numeric ID to numeric. So in my situation the ID was A1299087. I kept
it as is but created another column in the database table called ID_Numeric and used this expression:

convert(int,ltrim(replace(Customer_ID,'A','1')))

so this column will have values like 11299087 and then bring in this column in FM as a Fact.

are there 650 million students or 650 million rows of data?

ratnalein · 06 Dec 2012 05:01:06 PM

Hallo cognostechie,

okay, I got your point now.

There are 650 millions of rows, that is the problem. This is a storage problems, I dont know how may Gigs I will have to spend for all these rows. I will have to do compromisses. I will try. The problem is, the rows are getting more and more.. it is not just 650 millions.

Regards,

Ratna

cognostechie · 06 Dec 2012 06:00:31 PM

In that case, you can create multiple tables instead of one which will require less storage

Table: Student_Dim (Dimension)

Student_ID Name
---------------------------------
1001 Ratna
1002 John

Table: Student_Fact (Fact Table)

Student_ID Subject_ID School_ID
---------------------------------------------------
1001 1 1
1001 2 1
1001 3 2
1002 2 1

Table: Subject (Dimension)

Subject_ID Subject Subject_Type
------------------------------------------------------------------
1 Maths Technical
2 Physics Technical
3 Music Non-Technical

Table: School (Dimension)

School_ID School
-------------------------------------
1 School A
2 School B

This would reduce the size of the Fact table because the subject description, subject type and school name will not repeat in the Fact table. In FM, you can join the Fact to all the Dimensions. The count distinct on Student Id will work perfectly with the joins.

blom0344 · 07 Dec 2012 02:55:03 AM

As Cognostechie describes facttables should store identifiers , not names. It will also make joins a lot faster especially with proper indexing. But I still do not get the 650 million rows. What are you storing in the fact? Just the relationship between student and subject (class) or are you keeping track of changes or storing daily snapshots?

650.000.000 seems like a pretty large number..

ratnalein · 07 Dec 2012 03:23:54 AM

Hallo blom0344,

what do you think about this constellation:

Table: Student_Fact (Fact Table)

Student_ID Subject_ID
-------------------------------
1001 1
1001 2
1001 3
1002 2

Table: Subject (Dimension)

Subject_ID Subject School_Type
-------------------------------------------
1 math technical
2 physic technical
3 music non-technical

Table: School_Type (Dimension)

School_Type_ID School_Type
------------------------------------------
1 technical
2 non-technical

This is pretty much my end idea. I will just declare Student_ID in my fact table as count distinct in the Framework Manager :-*. So, that must be okay?

Thank you.

Regards,
Ratna

PS. Yes, it is a large number, because It deals with students all over the world

blom0344 · 07 Dec 2012 03:38:23 AM

Once more :o You need to give some more details. Is this a real-world case or are you modelling a study example? The way you descibe the fact it would store a single point-in-time relationship between entities. With these huge numbers involved every day you will lose students through mortality, people dropping out, but also large numbers will start. Courses will be dropped, some of them perhaps only locally.

From the little info you offer my guess this is an accumulating snapshot fact table, hence the amount of records. If the snapshot is on a daily basis, then I can understand your problems with the fast growing fact table.. (even with a limited amount of students)

So, what are we dealing with?

1. Hoax
2. Study example
3. A real world situation

cognostechie · 08 Dec 2012 05:45:38 PM

.. and me spending that much time on it ! >:(

ratnalein · 09 Dec 2012 08:47:20 AM

Hallo blom0344 and cognostechie,

sorry for the late reply.

I have decided to realize 2 variants:
1. the variant with Student_Id as measure in the fact table, as suggested from cognostechie. I will set the usage of Student_Id with count distinct. It will show me how long a report will take.
2. the variant offering the reports author just to aggreate base on the dimension School_Type. This variant will hopefully not end in the performance issue. This might be the wisest solution.

Actually the case is already pretty much detailed. In order to simplify the business logic, I have changed the subject into "Student, Subject, School". The real business logic deals with single persons to, which get flags. "A person studies = A person gets flag". And surely it is a real world situation. At the same time, I am still learning dealing with various fact table variations. And I havent had this issue (distinct Student_ID). That is why I asked the forum.

I hope, I havent done anything wrong?

Regarding the fast growing table, it is from nature just logical, because a person flags not structured in separate columns. That is why the table is growing. :-*

Thank you very much cognostechie and blom0344.

Regards,
Ratna

blom0344 · 09 Dec 2012 12:21:59 PM

Quote from: ratnalein on 09 Dec 2012 08:47:20 AM

Regarding the fast growing table, it is from nature just logical, because a person flags not structured in separate columns. That is why the table is growing. :-*

What is this supposed to mean?

ratnalein · 10 Dec 2012 04:32:08 AM

Hallo blom0344,

Code Select


Regarding the fast growing table, it is from nature just logical, because a person flags not structured in separate columns.  That is why the table is growing.

I meant:

Person_ID Person Spending_Type Spending_Date

0001 Ratna Money 01.01.2012
0001 Ratna Money 12.09.2012
...
...

Regards,
Ratna

BSP Software	Resources	About Us
MetaManager	BSP Software Training	BSP Software
Integrated Control Suite	YouTube Channel	Micro Strategies Inc
Security Migration		IBM Cognos
Integrated Management Suite

If you are unable to create a new account, please email support@bspsoftware.com

COGNOiSe.com - The IBM Cognos Community

News:

Semi additive Fact

ratnalein

blom0344

ratnalein

blom0344

Bark

cognostechie

blom0344

ratnalein

cognostechie

ratnalein

cognostechie

blom0344

ratnalein

blom0344

cognostechie

ratnalein

blom0344

ratnalein