EAV and DMR?

Started by CognosPaul, 13 Feb 2012 05:43:35 AM

CognosPaul

Does anyone have any experience with modeling on an EAV database? Would a DMR model be sufficient to get around the need to pivot, and would this work with text data?

MFGF

I would have to guess not. Kalido seem to have carved out a niche in creating data warehouses to handle reporting off this kind of data. I would imagine pivoting and denormalisation will be required. Just a guess though, Paul. I haven't worked with this kind of data before.


Meep!

blom0344

Denormalization (flattening) would not be a problem with text items; I use this extensively in DMR models. Lining up the data before applying the denormalization is usually pretty ugly. To assign data to 'buckets' you need some form of index generated on the fly (like an OLAP rank function). SQL subjects are therefore built as inline views.
To identify dimensional hierarchies I use a little trick: I store the first member with the name/value of the attribute, like '(1) Region', as each customer may have their own set defined (keeping things generic).
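
Roughly what one of those inline-view SQL subjects looks like, with all names made up just to show the shape (rank the attributes into generic buckets on the fly, then flatten on the bucket number):

SELECT cust_key,
       MAX(CASE WHEN bucket = 1 THEN attr_value END) AS attr_1,
       MAX(CASE WHEN bucket = 2 THEN attr_value END) AS attr_2,
       MAX(CASE WHEN bucket = 3 THEN attr_value END) AS attr_3
FROM  (SELECT cust_key,
              attr_name,
              attr_value,
              DENSE_RANK() OVER (PARTITION BY cust_key
                                 ORDER BY attr_name) AS bucket
       FROM   customer_attributes) t
GROUP  BY cust_key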

CognosPaul

At the moment we have n number of technologies. Each technology shares certain common measures and also has its own set of measures. Each technology has its own table, but the report requirements mean that the tables are unioned. The reports then allow the user to apply a filter: [Technology] in ?Technology?. In terms of performance, things aren't so great: the queries do table scans against each table, even when that table isn't needed for the selected technologies.
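
To give an idea of the shape, the unioned query subject today looks roughly like this (names simplified, and only two of the n technologies shown):

SELECT 'TECH_A' AS technology, f.event_ts, f.shared_metric, f.tech_a_metric, NULL AS tech_b_metric
FROM   fact_tech_a f
UNION ALL
SELECT 'TECH_B', f.event_ts, f.shared_metric, NULL, f.tech_b_metric
FROM   fact_tech_b f
-- ... one branch per technology, with the [Technology] in ?Technology? filter applied on top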

The tables have records being inserted at a rate of about 5k/sec, depending on the customer's installation and user base. Historic data (older than 3 days) is deleted once a day. At a rough estimate, each fact table should have around 1.3b rows. About three times a year a new technology is added, necessitating modifications to the framework and reports. Existing customers who don't have the new technology fall out of sync, making support much more difficult.

The idea is to have an EAV table set up like:
technology, timestamp, junk key, metric key, metric value.

I was hoping that partitioning on technology with an index on timestamp and metric key would solve most of the performance problems (not a DBA, so I'm probably talking out of my ass). The metric value would be mostly additive values, but may occasionally be a non-measure like a phone number or IP address. How could those cases be handled? And would filters against those be efficient?
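
Something along these lines is what I'm picturing: a very rough Oracle-style sketch with made-up names, so treat the exact syntax and datatypes as guesses.

CREATE TABLE metric_fact (
  technology    VARCHAR2(30)  NOT NULL,
  event_ts      TIMESTAMP     NOT NULL,
  junk_key      NUMBER        NOT NULL,
  metric_key    NUMBER        NOT NULL,
  metric_value  VARCHAR2(100)          -- would have to be text (or split into a
                                       -- numeric plus a text column) to hold the
                                       -- non-measures, which is exactly the question
)
PARTITION BY LIST (technology) (
  PARTITION p_tech_a VALUES ('TECH_A'),
  PARTITION p_tech_b VALUES ('TECH_B')
);

CREATE INDEX metric_fact_ix ON metric_fact (event_ts, metric_key) LOCAL;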

The metric dim would be the source of the DMR and be something like:
technology, metric lvl1, lvl2, lvl3, key

I'm figuring the DMR would make pivoting easier by putting the metric level in the columns of a crosstab.
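
Loosely, the query behind such a crosstab would then just be a group-by on the metric dim rather than a pivot, something like (reusing the made-up names from the sketch above):

SELECT d.metric_lvl1,
       d.metric_lvl2,
       SUM(TO_NUMBER(f.metric_value)) AS metric_value   -- assuming only numeric rows get here
FROM   metric_fact f
JOIN   metric_dim d
  ON   d.metric_key = f.metric_key
WHERE  f.technology = 'TECH_A'
GROUP  BY d.metric_lvl1, d.metric_lvl2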

An EAV table would probably end up having an astronomical number of rows: 1.3b * 12 technologies * (avg n metrics per technology). Would this cause problems?
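
To put a very rough number on it: if the average were, say, 10 metrics per technology (a pure guess), that's 1.3b * 12 * 10, or roughly 156 billion rows.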

blom0344

Would you not have a massive concurrency issue with 5k rows inserted each second? On DB2 I used to work with bitmap indexes for low-cardinality cases. This worked pretty well for a large, static set of tables. However, in your case I would expect a large performance drop on inserts, as the index needs to keep up. Partitioning would then be the logical choice.
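
For reference, the sort of index I mean (Oracle syntax here with placeholder names; on DB2 the idea was similar):

CREATE BITMAP INDEX sales_fact_status_bix ON sales_fact (status_code);
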
Are analyses performed against aggregated results? I would also expect a heavy penalty when having to go through these massive tables to condense data.
What RDBMS are you working with?

CognosPaul

Apparently not; the DBAs are very confident in the database. They use all sorts of tricks to load the data; I'm not entirely sure what, since I work primarily with the BI team. Performance is one of the biggest problems. Most reports run within 30 seconds, but some major ones can take upwards of an hour.

It's Oracle, but they're migrating to Greenplum.

They have aggregate tables at per-second, 5-minute, 15-minute, hourly and daily levels.

cognostechie

I would love to know how this goes.

We did a project for a major food company that had about 10 billion rows of sales across every product line, so we divided the data into 20 different schemas, one for each product line. Each schema was brought into FM separately and then merged together in the Model Layer (prior to the Business Layer), with the option to work with prompts. When the report was run, the user could select the product line and it would go to the related schema. The DMR covered all product lines, but then we ran into performance problems, so we used a tool called LoadRunner, which balances out some of the load. The issue was still there, but relatively less.

I would have gone for a TM1 cube, with the option to use relational data for detailed reporting via drill-through at the lowest level. Not sure if you have that option.