EAV and DMR?

Started by CognosPaul, 13 Feb 2012 05:43:35 AM

CognosPaul

Does anyone have any experience with modeling on an EAV database? Would a DMR model be sufficient to get around the need to pivot, and would this work with text data?

MFGF

I would have to guess not. Kalido seem to have carved out a niche in creating data warehouses to handle reporting off this kind of data. I would imagine pivoting and denormalisation will be required. Just a guess though, Paul. I haven't worked with this kind of data before.


Meep!

blom0344

Denormalization (flattening) would not be a problem with text items; I use this extensively in DMR models. Lining up the data before applying the denormalization is usually pretty ugly. To assign data to 'buckets' you need some form of index generated on the fly (like an OLAP rank function). SQL subjects are therefore built as inline views.
To identify dimensional hierarchies I use a little trick: I store the first member with the name/value of the attribute, like '(1) Region', as each customer may have their own set defined (keeping things generic).
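
Roughly what one of those inline-view SQL subjects looks like, with all names made up just to show the shape (rank the attributes into generic buckets on the fly, then flatten on the bucket number):

SELECT cust_key,
       MAX(CASE WHEN bucket = 1 THEN attr_value END) AS attr_1,
       MAX(CASE WHEN bucket = 2 THEN attr_value END) AS attr_2,
       MAX(CASE WHEN bucket = 3 THEN attr_value END) AS attr_3
FROM  (SELECT cust_key,
              attr_name,
              attr_value,
              DENSE_RANK() OVER (PARTITION BY cust_key
                                 ORDER BY attr_name) AS bucket
       FROM   customer_attributes) t
GROUP  BY cust_key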

CognosPaul

At the moment we have n number of technologies. Each technology shares certain common measures and also has its own set of measures. Each technology has its own table, but the report requirements mean that the tables are unioned. The reports then allow the user to apply a filter: [Technology] in ?Technology?. In terms of performance, things aren't so great: the queries do table scans against each table, even when that table isn't needed for the selected technologies.
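
To give an idea of the shape, the unioned query subject today looks roughly like this (names simplified, and only two of the n technologies shown):

SELECT 'TECH_A' AS technology, f.event_ts, f.shared_metric, f.tech_a_metric, NULL AS tech_b_metric
FROM   fact_tech_a f
UNION ALL
SELECT 'TECH_B', f.event_ts, f.shared_metric, NULL, f.tech_b_metric
FROM   fact_tech_b f
-- ... one branch per technology, with the [Technology] in ?Technology? filter applied on top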

The tables have records being inserted at a rate of about 5k/sec, depending on the customer's installation and user base. Historic data (older than 3 days) is deleted once a day. At a rough estimate, each fact table should have around 1.3b rows. About three times a year a new technology is added, necessitating modifications to the framework and reports. Existing customers who don't have the new technology fall out of sync, making support much more difficult.

The idea is to have an EAV table set up like:
technology, timestamp, junk key, metric key, metric value.

I was hoping that partitioning on technology with an index on timestamp and metric key would solve most of the performance problems (not a DBA, so I'm probably talking out of my ass). The metric value would be mostly additive values, but may occasionally be a non-measure like a phone number or IP address. How could those cases be handled? And would filters against those be efficient?
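
Something along these lines is what I'm picturing: a very rough Oracle-style sketch with made-up names, so treat the exact syntax and datatypes as guesses.

CREATE TABLE metric_fact (
  technology    VARCHAR2(30)  NOT NULL,
  event_ts      TIMESTAMP     NOT NULL,
  junk_key      NUMBER        NOT NULL,
  metric_key    NUMBER        NOT NULL,
  metric_value  VARCHAR2(100)          -- would have to be text (or split into a
                                       -- numeric plus a text column) to hold the
                                       -- non-measures, which is exactly the question
)
PARTITION BY LIST (technology) (
  PARTITION p_tech_a VALUES ('TECH_A'),
  PARTITION p_tech_b VALUES ('TECH_B')
);

CREATE INDEX metric_fact_ix ON metric_fact (event_ts, metric_key) LOCAL;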

The metric dim would be the source of the DMR and be something like:
technology, metric lvl1, lvl2, lvl3, key

I'm figuring the DMR would make pivoting easier by putting the metric level in the columns of a crosstab.
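
Loosely, the query behind such a crosstab would then just be a group-by on the metric dim rather than a pivot, something like (reusing the made-up names from the sketch above):

SELECT d.metric_lvl1,
       d.metric_lvl2,
       SUM(TO_NUMBER(f.metric_value)) AS metric_value   -- assuming only numeric rows get here
FROM   metric_fact f
JOIN   metric_dim d
  ON   d.metric_key = f.metric_key
WHERE  f.technology = 'TECH_A'
GROUP  BY d.metric_lvl1, d.metric_lvl2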

An EAV table would probably end up having an astronomical number of rows: 1.3b * 12 technologies * (avg n metrics per technology). Would this cause problems?
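
To put a very rough number on it: if the average were, say, 10 metrics per technology (a pure guess), that's 1.3b * 12 * 10, or roughly 156 billion rows.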

blom0344

Would you not have a massive concurrency issue with 5k rows inserted each second? On DB2 I used to work with bitmap indexes for low-cardinality cases. This worked pretty well for a large, static set of tables. However, in your case I would expect a large performance drop on inserts, as the index needs to keep up. Partitioning would then be the logical choice.
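
For reference, the sort of index I mean (Oracle syntax here with placeholder names; on DB2 the idea was similar):

CREATE BITMAP INDEX sales_fact_status_bix ON sales_fact (status_code);
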
Are analyses performed against aggregated results? I would also expect a heavy penalty when having to go through these massive tables to condense data.
What RDBMS are you working with?

CognosPaul

Apparently not; the DBAs are very confident in the database. They use all sorts of tricks to load the data; I'm not entirely sure what, since I work primarily with the BI team. Performance is one of the biggest problems. Most reports run within 30 seconds, but some major ones can take upwards of an hour.

It's Oracle, but they're migrating to Greenplum.

They have aggregate tables at per-second, 5-minute, 15-minute, hourly and daily levels.

cognostechie

I would love to know how this goes.

We did a project for a major food company that had about 10 billion rows of sales across every product line, so we divided the data into 20 different schemas, one for each product line. Each schema was brought into FM separately and then merged together in the Model Layer (prior to the Business Layer), with the option to work with prompts. When the report was run, the user could select the product line and it would go to the related schema. The DMR covered all product lines, but then we ran into performance problems, so we used a tool called LoadRunner, which balances out some of the load. The issue was still there, but relatively less.

I would have gone for a TM1 cube, with the option to use relational data for detailed reporting via drill-through at the lowest level. Not sure if you have that option.