If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Semi additive Fact

Started by ratnalein, 06 Dec 2012 09:02:48 AM

Previous topic - Next topic

ratnalein

Hallo everyone,

I am facing a problem with semi additive fact:

Fact table:


Subject    NumberOfStudents

math        3
physic      4
music       2



Dimension table 1:

SubjectID      SubjectType

math           technical
physic         technical
music          non-technical


Dimension table 2:

SubjectTypeID      SchoolType

technical          School_A
non-technical      School_B


The idea for this model is to count the number of students in several granularities, for example:
1. Number of Students, which are studying math
2. ...
3. Number of Students, which are studying in School_A
4. ... 

I model the tables in a snow flake scheme:

FactTable <--- DimensionTable1 <--- DimensionTable2

My problem:
Since the fact in the fact table is calculated per distinct (Student), I can not just aggregate using additive aggregattion the fact for the dimensional table 2 and upwards.  Because, for example --> Ratna is studying math and physic, so the result of "List me the number of Students which is studying math" --> 1.  This is still correct.  But if I list the number of students "which is studying at School_A", then the result will be 2.   :-\  This is false, because actually there is only 1 student studying in School_A (Ratna).

I hope, my problem can be understood.  My question: is it possible to get this problem solved in the cognos framework manager?  If yes, would you please give me some hints which adjustments must be done? 

Thank you very much.

Regards,

Ratna :)

blom0344

The granularity of the fact is wrong in this case. It is a summary table causing problems.
[Fact table measures should contain only additive measures in the first place; this goes for summary tables too]
A fact table storing the student ID would make more sense , since performing a distinct count against the Id should work for all scenario's

ratnalein

Hallo blom0344,

firstable, thank you for your quick reply.  Mhh.. you mean, a fact table storing the student ID like this?


Student   Subject

Ratna      math
Ratna      physic


The problem is, I have billions of Students  :(, so it will not be efficient to build the fact table like this.  Suppose that the fact table MUST be as previous, ist there any best practice to deal with?

Thank you.

Regards,
Ratna

blom0344

Ratna, your current table stories summaries which are  the problem . It is not a true fact table as the measures themselves are not additive. You are simply losing information the way you build the fact.

Obviously storing the student name instead of storing the student key will create another set of problems with students that have identical names. Are you afraid that a larger fact table will be inefficient? It is basically you an event table detailing which combinations of students and subjects exists. Surely this should not be a problem, cause both columns (especially subject) would make ideal indices

Bark

I faced the same problem recently and there is no "easy" solution. I tried 3 approaches and both worked although with huge disadvantages.

1. A model with StudentID (as proposed before) and then count distincts for the measure.
2. A Transformer cube with a Category Count measure having a dimension called "student". Because the limitation of 65000 childs for a parent, I had to create dummy levels by dividing the ID by 100, 10000, 1000000 etc.
3. A cube with external roll up and create a view for all the possible scenarios (it gets unmanageable with more than 3 dimensions of 2/3 levels each) and although the generation won't be fast (loads of counts distincts) the cube will perform extreamly quicker thant the DB when reporting against it.

Let me know if any of this suits you and I will try to expand.

Regards,

Bark

cognostechie

#5
A category count cube is ideal for semi-additive and non-additive measures.

The only problem is Transformer has an upper limit on 2 million categories so if it reaches that then you will have to shoot for an external rollup cube which is also a supported feature.

This can also be done in FM by building a Fact table like this:

Student   Subject   Subject Type   School
----------------------------------------------------
Ratna        Maths       Technical            A
Ratna        Physics     Technical            B
John          Music        Non-Technical     A
Joe            Biology     Technical             A

Do a count distinct on the Student.  I think this is what blom suggested too. Having detailed data also gives you flexibility to make summarized reports if required whereas having summarized data puts limitations on making only summarized reports.

blom0344

Kinda curious..

Ratna, how many Students (Ids) are involved?   :o

ratnalein

Hallo Bark, bloom0344, and cognostechie,

thank you everyone for the suggestions.  Regarding the suggestion from cognostechie:


Student   Subject   Subject Type   School
----------------------------------------------------
Ratna        Maths       Technical            A
Ratna        Physics     Technical            B
John          Music        Non-Technical     A
Joe            Biology     Technical             A

Do a count distinct on the Student.


I have a question regarding "do a count distinct on the Student".  Do u you mean, in the framework manager Student should be set as Fact and should get the count distinct as aggregate function?

Regards,

Ratna

@bloom0344, I have around 650 millions students/persons.

cognostechie

Ratna -

The data I used was only for explaining it. You will need to use some kind of ID like Student ID which is numeric. If it is alphanumeric then create another column in the database table and convert it to numeric. Mark that as a Fact in the database layer and set the Regular Aggregate and Semi-aggregate property to Sum. In the Business Layer, keep it as a Fact but mark the Regular Aggregate property to Count Distinct and Semi-aggregate property to Sum.

I have the same situation and I converted the alpha numeric ID to numeric. So in my situation the ID was A1299087. I kept
it as is but created another column in the database table called ID_Numeric and used this expression:

convert(int,ltrim(replace(Customer_ID,'A','1')))

so this column will have values like 11299087 and then bring in this column in FM as a Fact.

are there 650 million students or 650 million rows of data?

ratnalein

Hallo cognostechie,

okay, I got your point now. 

There are 650 millions of rows, that is the problem.  This is a storage problems, I dont know how may Gigs I will have to spend for all these rows.  I will have to do compromisses.  I will try.  The problem is, the rows are getting more and more.. it is not just 650 millions.

Regards,

Ratna

cognostechie

#10
In that case, you can create multiple tables instead of one which will  require less storage

Table:  Student_Dim (Dimension)

Student_ID           Name       
---------------------------------
1001                      Ratna
1002                      John         

Table: Student_Fact (Fact Table)

Student_ID     Subject_ID      School_ID
---------------------------------------------------
1001                  1                         1
1001                  2                         1
1001                  3                         2
1002                  2                         1

Table: Subject (Dimension)

Subject_ID              Subject            Subject_Type     
------------------------------------------------------------------
1                              Maths               Technical             
2                              Physics             Technical         
3                              Music                Non-Technical   

Table: School (Dimension)

School_ID                     School
-------------------------------------
1                                  School A
2                                  School B

This would reduce the size of the Fact table because the subject description, subject type and school name will not repeat in the Fact table. In FM, you can join the Fact to all the Dimensions. The count distinct on Student Id will work perfectly with the joins.

blom0344

As Cognostechie describes facttables should store identifiers , not names. It will also make joins a lot faster especially with proper indexing.  But I still do not get the 650 million rows. What are you storing in the fact? Just the relationship between student and subject (class) or are you keeping track of changes or storing daily snapshots?

650.000.000 seems like a pretty large number..

ratnalein

Hallo blom0344,

what do you think about this constellation:

Table: Student_Fact (Fact Table)

Student_ID    Subject_ID
-------------------------------
1001              1
1001              2
1001              3
1002              2


Table: Subject (Dimension)

Subject_ID  Subject   School_Type
-------------------------------------------
1                 math       technical
2                 physic     technical
3                 music      non-technical


Table: School_Type (Dimension)

School_Type_ID        School_Type
------------------------------------------
1                               technical
2                               non-technical


This is pretty much my end idea.  I will just declare Student_ID in my fact table as count distinct in the Framework Manager   :-*.  So, that must be okay? 

Thank you.

Regards,
Ratna

PS.  Yes, it is a large number, because It deals with students all over the world

blom0344

Once more   :o You need to give some more details.  Is this a real-world case or are you modelling a study example?  The way you descibe the fact it would store a single point-in-time relationship between entities.  With these huge numbers involved every day you will lose students through mortality, people dropping out, but also large numbers will start. Courses will be dropped, some of them perhaps only locally.

From the little info you offer my guess this is an accumulating snapshot fact table, hence the amount of records. If the snapshot is on a daily basis, then I can understand your problems with the fast growing fact table.. (even with a limited amount of students)

So, what are we dealing with?

1. Hoax
2. Study example
3. A real world situation

cognostechie

.. and me spending that much time on it !  >:(

ratnalein

Hallo blom0344 and cognostechie,

sorry for the late reply. 

I have decided to realize 2 variants:
1. the variant with Student_Id as measure in the fact table, as suggested from cognostechie.  I will set the usage of Student_Id with count distinct.  It will show me how long a report will take.
2. the variant offering the reports author just to aggreate base on the dimension School_Type.  This variant will hopefully not end in the performance issue.  This might be the wisest solution.

Actually the case is already pretty much detailed.  In order to simplify the business logic, I have changed the subject into "Student, Subject, School".  The real business logic deals with single persons to, which get flags.  "A person studies = A person gets flag".  And surely it is a real world situation.  At the same time, I am still learning dealing with various fact table variations.  And I havent had this issue (distinct Student_ID).  That is why I asked the forum. 

I hope, I havent done anything wrong?

Regarding the fast growing table, it is from nature just logical, because a person flags not structured in separate columns.  That is why the table is growing.   :-*

Thank you very much cognostechie and blom0344.

Regards,
Ratna





blom0344

Quote from: ratnalein on 09 Dec 2012 08:47:20 AM

Regarding the fast growing table, it is from nature just logical, because a person flags not structured in separate columns.  That is why the table is growing.   :-*

What is this supposed to mean?

ratnalein

Hallo blom0344,


Regarding the fast growing table, it is from nature just logical, because a person flags not structured in separate columns.  That is why the table is growing. 


I meant:

Person_ID     Person      Spending_Type     Spending_Date

0001             Ratna        Money                    01.01.2012
0001             Ratna        Money                    12.09.2012
...
...


Regards,
Ratna