Relational VS OLAP

Started by cognos05, 24 Aug 2016 03:35:11 PM


cognos05

How does an OLAP cube get built, and how is it different from a relational model that hits SQL through FM to bring back the data?

I somehow feel my Cognos Transformer cubes are much faster at retrieving data.

Thanks,
Nithya

MFGF

Quote from: nithya1224 on 24 Aug 2016 03:35:11 PM
How does an OLAP cube get built, and how is it different from a relational model that hits SQL through FM to bring back the data?

I somehow feel my Cognos Transformer cubes are much faster at retrieving data.

Thanks,
Nithya

Hi,

"An olap cube"? Can you be more specific? It's rather like asking how powerful a car is in comparison with a motorbike and how it's different in terms of engine technology. Unless you're specific about what car and what motorbike, it's an impossible task for someone to be able to give you an answer...

MF.
Meep!

cognos05

I have about 5 million rows of fact data and 5 dimensions. When I build a cube using Cognos Transformer it takes around 40 minutes, and the source for all of this is flat files extracted by an ETL tool.

It's very convenient to look at the data through a cube, as there are more drilling/hierarchy capabilities.

But when the same thing is done via SQL tables, with FM as the source, the query has to work through 5 million records to retrieve whatever granular data is selected.

My question is: how does the OLAP engine in Cognos Transformer work, how does the relational (OLTP) query work when retrieving data, and where is the data stored in the cube?

Thanks,
Nithya

cognostechie

When a Transformer cube is built, Transformer connects to those flat files to retrieve the data, converts it into a dimensional format, and stores it in one or more files depending on how the cube is defined. When you run a report, the report goes to the cube, NOT to the SQL database, to fetch the data. The data stored in the cube (the cube files) is pre-aggregated for every scenario, based on all permutations and combinations of the dimension members, and that's the reason the reports are so fast. If your cube only holds summarized data and does not go down to the row level, it is still possible to drill through to row-level data by passing parameters to another report that runs from a Framework Manager model. Reports built on Transformer cubes can link to detail reports built on Framework Manager packages.
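To make the pre-aggregation idea concrete, here's a minimal Python sketch (toy data and made-up names, not Transformer's actual storage format). At build time every intersection of dimension members, including an "All" roll-up per dimension, is summed once; answering a report query is then a single lookup, no matter how many fact rows fed the build.

from itertools import combinations

# Toy fact rows: (product, region, year, sales) - illustrative only.
facts = [
    ("Bikes",   "Canada", 2016, 100.0),
    ("Bikes",   "USA",    2016, 250.0),
    ("Helmets", "Canada", 2015,  40.0),
]

DIMS = ("product", "region", "year")  # dimension names by position
ALL = "All"                           # roll-up member for a dimension

# The "cube build": aggregate every intersection once, up front.
cube = {}
for row in facts:
    *members, measure = row
    # For each subset of dimensions kept at member level, roll the rest up to "All".
    for r in range(len(DIMS) + 1):
        for kept in combinations(range(len(DIMS)), r):
            key = tuple(members[i] if i in kept else ALL for i in range(len(DIMS)))
            cube[key] = cube.get(key, 0.0) + measure

# A "report query" is now a single lookup, regardless of fact row count.
print(cube[("Bikes", ALL, ALL)])    # Bikes across all regions/years -> 350.0
print(cube[(ALL, "Canada", 2016)])  # all products, Canada, 2016 -> 100.0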

cognos05

Thanks for the information provided.

So whatever the source for Cognos Transformer, while building the cube it reads all the data from that source, converts it into a dimensional format, and creates a cube with pre-aggregated data stored at the different levels, saved in the cube's files. So when I run a report it automatically slices into the pre-aggregated data in the cube files.

Whereas with Framework Manager it has to hit SQL to bring back the data and iterate through x million rows based on the query items selected. So it's pretty much mandatory that we apply filters when reporting on FM so the data comes back faster, and that's the reason we don't have hierarchies in a plain FM model. But even if you create a DMR, it should be at a summarized level, right?


Thanks,
Nithya

MDXpressor

That's going to be really tough to manage in FM alone.  I've never loved the use of FM for DMR.  Here's my take as someone on both sides of the fence (I'm a Cognos consultant with a focus on authoring, but I also build FM & cubes).

Performance of a cube is not related to the number of fact records that serve as the base for the cube. Instead, a cube's performance is based on the number of possible intersections of dimensional members (Canada, USA, Ontario, New York State, Toronto, Los Angeles, ... would all be members of the Geography dimension). Which is to say, the more detail, the bigger and slower your cube will be. When building the cube, Transformer identifies each intersection, queries the source dataset (flat files, DB, ...) to determine the total, and then stores only the summarized value at that intersection. When you query the cube for a specific value, it goes to that intersection and retrieves a single value. Think about it: retrieving 1 record is going to be fast. You increase the size of the cube (and slow its performance) by increasing the number of intersections (adding new dimensions, or new members to existing dimensions). Whether you have 100 rows of fact data or 10 billion, it won't affect your cube size (much). It will, however, drastically affect your cube build time.

Before I begin this bit of arithmetic: this is a very simplified way to estimate your cube size. There is an actual algorithm for estimating Transformer cube size, but this will hopefully be a decent proxy, and it only serves to help you understand what you'd be undertaking if you were to try to replicate the functionality of a cube in some sort of materialized view within your database. Let's imagine we have 3 dimensions, each with 10 members, and a fact table with 1 measure. Our cube will contain 10 x 10 x 10 x 1 = 1000 summarizations, or in cube terms, intersections. There is very little chance you're going to take the time to work out each of those summarizations in your DB. So let me get onto my pulpit as an advocate for consumers.
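As a rough sanity check on that arithmetic, the simplified estimate (ignoring the "All" roll-up levels and sparsity that a real Transformer cube has to account for) is just the product of the dimension cardinalities times the number of measures. A tiny Python sketch with the hypothetical numbers above:

from math import prod

# Hypothetical figures matching the example: 3 dimensions of 10 members each, 1 measure.
members_per_dimension = [10, 10, 10]
measures = 1

# Simplified estimate of intersections to pre-compute and store.
intersections = prod(members_per_dimension) * measures
print(intersections)  # 1000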

As you are learning, FM doesn't pre-calculate anything. Every time your user needs a fact, they are going to be hammering your fact table. All fine if you've summarized the crap out of your database, but unlikely, given that you're probably cutting corners already if you're using DMR. And if you are thinking of going to the effort of summarizing, then I'd ask: why do that when you could just build a cube?

The time it takes to build a cube vs. the time it takes to query a cube times the number of queries per day is the balance that BI providers struggle to face. In what I've seen, there is an ugly trend where IT absolves itself of the cube build because it's slow, and sometimes laborious. Often, at first glance, it's also a budget-saver for IT. However, what goes unrecognized is that circumventing the cube build puts additional pressure on the DB, has not mitigated the hours to build the model, has not mitigated the hours to build the summarizations, and has probably put user adoption at risk due to slow response times. Users can't afford a 10-minute (or even 10-second) response time for data. And what about the poor report author responsible for creating your user content? If that person is in IT, then you haven't even shed hours when you account for how long even minor changes take if you are testing as you author. You're going to make them wait 10 seconds every time they adjust and test their queries (or even their formatting)? That makes me want to stab my eyes out with knitting needles.

I call it the 'burden of latency'. Who is going to own the time it takes to get data? Is IT going to knuckle under and build a cube (presumably at night), or is the user going to pay the toll every time they have to query their data? Don't get me wrong, Transformer has its limitations, but as a person who has sold, built, authored, and consumed with both Transformer cubes and DMR, I just don't see why you'd go to the effort of DMR. It has a very high risk of being a net loss to the company.

Rant over.  *Mic drop*
No, a proof is a proof. What kind of a proof? It's a proof. A proof is a proof, and when you have a good proof, it's because it's proven.

-Jean Chretien

cognostechie

In addition to what you mentioned, there are 2 more reasons why IT does not like to build cubes and advocates the use of a DMR:

1> There is a data transfer involved in building a cube, and most developers can't get the data in the cube accurate.
2> Most developers don't like the idea of a solution which would make life easy for the business, because it puts their own jobs in danger. Providing a cumbersome solution that is expensive to maintain and does not work without the intervention of IT ensures their jobs, and actually creates more jobs for developers, because a bad solution needs more people to maintain it. A cube would also reduce the need for reports from IT, as the business would be able to make their own reports easily.

bdbits

It is kind of an edge case, but another reason for using DMR could be wanting dimensional functionality against frequently refreshed or even near-realtime data. While not terribly common, I have seen near-realtime requirements in more than one case. In such cases, needing to rebuild a cube may not be workable.

Naturally, it all depends on requirements. I am a big fan of cubes when appropriate, but they are not ideal in every situation.  ;D

MDXpressor

Quote from: bdbits on 31 Aug 2016 10:27:53 AM
It is kind of an edge case, but another reason for using DMR could be wanting dimensional functionality against frequently refreshed or even near-realtime data. While not terribly common, I have seen near-realtime requirements in more than one case. In such cases, needing to rebuild a cube may not be workable.
Without having a technical discussion on the exact requirement, let's discuss the need for 'Real-Time' data for analysis, and why it's not usually relevant to a cube structure.

When I think of the scale of data that is received 'real-time' in comparison to that collected over the months and years it takes to build a statistically viable data set, the real-time data is negligible. So, in an application that collects data over the course of months and years, the real-time values are lost in a sea of data. As a balance to that, the older the data is, the less relevant it should be to your current business strategies. Often what worked 20 years ago is not relevant to today's decisions. I can see real-time being relevant immediately after an online product launch, especially around click-streams. I'm envisioning a bunch of developers huddled around a single monitor on the morning of their launch. However, I'm struggling to think of another good example; maybe for monitoring something as fluid as stock prices?

Near real-time, though, is a tough sell to me. How 'near' is 'near' enough? Incremental builds on your cubes should reduce the time to deployment enough to meet most 'near real-time' needs. Since some waiting is acceptable, it's just a negotiation over how much waiting you can tolerate in order to build a proper cube.

Quote from: bdbits on 31 Aug 2016 10:27:53 AM

Naturally, it all depends on requirements. I am a big fan of cubes when appropriate, but they are not ideal in every situation.  ;D

Everything else said, totally agree that situations exist where DMR could be a good option, but I hope we have clearly identified them as a RARE need.  With all due respect to DMR, of course, I think that lack of commitment to a good process creates immeasurable waste.  Too often it is the strategy employed.
No, a proof is a proof. What kind of a proof? It's a proof. A proof is a proof, and when you have a good proof, it's because it's proven.

-Jean Chretien

cognostechie

Completely agree!

Real-time and near real-time also depend on how often the DW is refreshed. If the DW can be refreshed multiple times a day, then why can't the cube be? With the newer versions, cube build times have come down substantially.

In most cases, what happened in the last hour is irrelevant to analytics, though in some situations it might matter.

My personal belief is that the justification for using a DMR, in most cases, is only that it is easier to build than a cube, combined with developers lacking the technical knowledge to build a cube.

bdbits

Agreed, it is rare and as I said, an edge case, but near-real-time data requirements do exist, mostly in a monitoring type of situation as mentioned.

First, you can do ETL in near-real-time. It's not cheap but you can do it. In such a case "refreshing" data is irrelevant. The data is there as soon as it is created in the source system.

A sample use case of which I have personal knowledge was for a heavy construction company. They built, among other things, roads of all kinds. There are a number of critical constraints and success factors that need to be tracked while a large-scale pour or paving is taking place, and it can involve multiple applications, facilities, and hundreds of pieces of equipment and associated material, much of which has time-constrained value. So they built in-house systems that consolidated data from multiple operational applications to provide a current and comprehensive high-level view of progress (think 'dashboard'). It was not using cubes or even Cognos (for the curious, it was on Oracle using XStreams), but it is an example of a need for analytical reporting/dashboarding with requirements for immediate data. You don't always need months or years of collected data for it to have value, though they did retain the data for post-project and long-term analysis when looking at future projects.

I was not personally involved but know well one of the two main people responsible for this system and based on what it cost the company, you can bet it was valuable to them.

Would I put a DMR package over this warehouse? Personally, I probably would not. But I can see the possibility that someone might do it for a similar system. As I said, edge case.

cognostechie

I am familiar with Oracle Streams. It is an alternative to CDC (change data capture). The Oracle DB stores every change (row additions, deletions, etc.) in log files, and those log files are used to load the data into another DB, which is thereby kept up to date with the changes. Another technique is online backup to another DB, which is instant as rows are added to the source system.

However, none of this will get the data into the DW, because even if the changes are replicated instantly, the ETL process still has to run to transform the data into a star schema and store it in the DWH. With that, a DMR becomes dependent on the refresh process too and won't be real-time. One case I saw was FM connected directly to the source system DB. That was completely real-time data, but that company was very small and using a very old source system. With a system of any real scale, there are licensing issues with connecting to the source system, and FM won't really work well against such a normalized structure.