If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Cubes - when to use

Started by cogcurious, 16 Jun 2011 02:45:14 AM

Previous topic - Next topic

cogcurious

Guys,

Could someone help me to understand how to decide if reports should be built against cubes or against relational.

Is it correct to build dimensional reports just to improve performance, given cube has all the data?

Is it common to build cube on a OLTP database? What are pros and cons? I would think cube build time would be large. Anything else.

cogcurious


bdbits

I can only speak from my limited experience. I would normally recommend a cube when users want to do ad-hoc analysis and data aggregation. While a cube is generally very fast, I have had cases where it was better to do things with a relational source - primarily for very detail-level data.

You do not really want to go after an OLTP database, particularly if it is highly normalized. Cognos works best with star schemas. ETL your data into a staging database and/or a data warehouse and model your cubes on that.

cogcurious

ETL is not an option as per the client. They want real time data.

What are the disadvantages of building cube against OLTP?
This might sound naive, but could you give me an example of detail and summary data?
And, what happens if we build cube against detail data

CognosPaul

#4
Find out what they mean by real-time data. Do they want their screens refreshed the instant an order is placed? Do they expect the data to be updated every 15 minutes? Half hour?

A thorough investigation on the users' actual needs will generally show what they ask for isn't what they need, or necessarily even what they really want. Find out what type of reports will be built, who will consume them, and when they will be consumed. You can't generate COB reports in the middle of the day.

Real time reports are useful for tracking call center agents (usually refreshed every 5-10 seconds), inventory stocks (refreshed rate is in minutes), and other scenarios where actionable data is needed. Those systems are never based on the OLTP database, but rather on a secondary dedicated database.

Even if they do need "real time data" you should still not build your reporting platform on the OLTP database. I have heard horror stories, time and time again, of companies losing thousands or millions of dollars because the OLTP database crashed because of too many reports. The best example I have is a port that was using Brio to handle their reports. At first they had a small proof of concept reporting system built on their operations db. But instead of building a proper data warehouse they continued building on that POC. Eventually their DB crashed, and their entire port was shut down for 5 hours until it was brought back online. Those 5 hours cost quite a lot of money, including fines. Your clients must understand the risks associated with working with live data.

I strongly recommend reading up on real time data warehousing. There are a number of techniques to catch data without touching the operations db. But even then, you should have a daily ETL incrementally updating your warehouse with a complete warehouse refresh done on a set schedule (1st or 2nd of every month after cob).

A cube should not be built on an OLTP, if only because the process is a resource hog and might interfere with operations.

As for your original question. My general rule is that most of the reports should be based on a cube. OLAP allows for a number of reporting techniques that would be difficult or impossible on a standard relational data source. You'll still need a warehouse for the occasions where you need detailed data.

A cube is good for displaying aggregated results. For example, if you wanted a report that showed the revenue growth per quarter against the previous quarter/ or quarter of the previous year, you would need either a fairly complex SQL statement, or a very simple MDX.

Another example: Showing inventory levels. Each day has a number, but you want to show the last value for the period (3d quarter inventory level should show Sept 30 levels) while aggregating the number of sales. Difficult in SQL, but you can define the measure to do this automatically in the cube.

Detailed data is where relational dbs shine. Let's say you wanted a list of all customers and their details. Customer ID, age, sex, marital status, address, last order date, first order date, avg order size, and twenty other attributes. A cube will choke on that type of query. Cubes will attempt to cross join each hierarchy and you can very easily end up with a query attempting to return over a billion (no exaggeration) rows. A relational db will return the results quickly and easily.

A proper reporting platform takes a lot of work, but the end result is always worth it. 

MFGF

I second Paul on this.  He speaks very wise words!
Meep!

Arpitagrawal9

There are three reasons for when to use an OLAP cube in addition to your relational database – performance, drill-down functionality and availability of software tools.

Online analytical processing (OLAP) is a technique for quickly analyzing a measure, e.g. profit margin, by multiple categories or dimensions, e.g. customer, region, fiscal period and product line.  Typically the end user software has capabilities to drag categories to rows and columns and aggregate the measure at each intersection of a row and column (often called a cross tab report).  This is similar to the familiar spreadsheet format.  This numeric format can usually also be represented in the form of a chart or graph.  The real power of OLAP is the ability to drill down on a category to see more details.  For example, you might drill down on a state to see details by city.

An OLAP cube is a technology that stores data in an optimized way to provide quick response to queries by dimension and measure.  Most cubes pre-aggregate the measures by the different levels of categories in the dimensions to enable the quick response time.  End user software will make querying a cube very easy, but developers, who may be accustomed to using SQL language, will need to learn a new language – MDX (Multi-Dimensional eXpressions).

The standard design for a relational database source for this analysis is called a star schema.  A fact table is related to multiple dimensions and this can be represented graphically in a form of a star.  A star schema design will support reporting and analysis by dimensions for measures in the cross tab and graphical formats without using an OLAP cube.  Why would you go to the time, expense, disk space, skill development and increased maintenance to also build a cube when a relational database will support this analysis?

There are three reasons for adding a cube to your solution:

Performance.  A cube's structure and pre-aggregation allows it to provide very fast responses to queries that would have required reading, grouping and summarizing millions of rows of relational star-schema data.  The drilling and slicing and dicing that an analyst would want to perform to explore the data would be immediate using a cube but could have lengthy pauses when using a relational data source.
Drill down functionality.  Many reporting software tools will automatically allow drilling up and down on dimensions with the data source is an OLAP cube.  Some tools, like IBM Cognos' Dimensionally Modeled Relational model will allow you to use their product on a relational source and drill down as if it were OLAP but you would not have the performance gains you would enjoy from a cube.
Availability of software tools.  Some client software reporting tools will only use an OLAP data source for reporting.  These tools are designed for multi-dimensional analysis and use MDX behind the scenes to query the data.
OLAP cube technology will cost more in terms of development, learning and project time but will return benefits in fast response time to analyze large amounts of data.  This capability can result in insights that drive actions and decisions that enable very large organizational productivity, cost saving or revenue increasing gains.

Size of data
The size of the source data and resulting cube is of paramount importance in your choice. Cognos Powercubes have an inherent limit of 2 Gb

cogcurious

Thank you all. A few more out of the above

1)
"Real time reports are useful for tracking call center agents (usually refreshed every 5-10 seconds), inventory stocks (refreshed rate is in minutes), and other scenarios where actionable data is needed.Those systems are never based on the OLTP database, but rather on a secondary dedicated database"

By Secondary dedicated database, do you mean ODS?


2) Is it ok to have some descriptive column in a dimensional report, say "Job title description".

3) Can a dimensional report run into thousands of rows? From what you guys have mentioned above summary/dimensional report has nothing to do with number of rows rather then the level of detail in a report in terms of say columns

CognosPaul

Real time monitoring is a little outside my area of expertise. From what (very) little I understand there is software that monitors event logs and inserts rows as the events happen.

A dimensional report works by slicing numeric values inside a crosstab (lists are treated almost like crosstabs, but with measures in the columns, graphs are treated exactly like crosstabs). The number returned is the intersection of the row member, column member, measure, and any slicers. The values displayed in the rows or columns does not necessarily have to be the caption of the member. You could display one or more of the attributes of that member instead, short name, member description, category code. If the description is a lot of text, you might want to just show the title and display the description in a tooltip.

Cubes can return as many rows as you have members in your hierarchy. The problem is when you try to nest many different levels. Cubes will create crossjoins between the hierarchies, easily resulting in billions of rows. I recommend reading Writing Efficient OLAP Queries.

cogcurious

Another, If i am reporting at the lowest levels(months,city) in the dimensions, would my report be slower than if the report is that on the highest level(year, country etc) in the cube?

CognosPaul

Yes, if only because of the additional time needed to pull the data from the cube and render it on the page.

cogcurious

Why does it need more time? Is it because of where the data is placed in the cube? Just wondering about the exact reason.

CognosPaul

The extra delay is a simple factor of increased data throughput. Variables could depend on how fast your network is, hard disk speed, and how many months/cities vs years/countries you have in your cube.

A simple report that shows 5 countries by 10 years might take 100 milliseconds to process plus 50 milliseconds to transfer from the cube to the Cognos server plus another half second to render. If you show the cities by months (500 cities * 120 months) the processing should still take only a few hundred milliseconds, but transferring the data to Cognos takes time, plus the time it takes Cognos to render all of the cells.

Practically the difference is minimal.