If you are unable to create a new account, please email support@bspsoftware.com

 

News:

MetaManager - Administrative Tools for IBM Cognos
Pricing starting at $2,100
Download Now    Learn More

Main Menu

Split CSV into multiple lines

Started by msawyer, 15 Nov 2013 12:05:52 PM

Previous topic - Next topic

msawyer

I am working on a report in Report Studio and need to split a field where the values are separated by a comma (ex: S,M,L,XL).  I need to split these values into their own separate lines.  I also have other fields in the report. 

Current data looks like this:

Style#     Color     Sizes
123456    123       S,M,L,XL

I need my data to look like this:

Style#     Color     Sizes
123456    123       S
123456    123       M
123456    123       L
123456    123       XL

bdbits

If your data is dimensional, or even in a star schema, the best answer is to have your ETL developer or package modeler split that into a dimension, or more likely a level in a dimension, for you. That is what it actually appears to be when I look at your example.

If it is relational, depending on the database you could probably use pass-through SQL and use a pivot table. A stored procedure could also be an option.

Cognos report expressions just don't really offer a lot in terms of parsing strings. There are functions to split strings into arrays, but I am not really sure how you could reference the individual elements of the array. If you could find a way to do that, you could create rows in the result set with a data item that represents the Nth array item, and filter out those rows that do not have an element in that position (they would be null or empty strings). In any case, if you are dealing with larger data sets the performance would likely not be very good with a lot of manipulation like this.

The best answer really is to split it out before it gets to Cognos, in my opinion.

TheCognosDave

There might be a simpler option ... what database are you reporting against ?

msawyer

I believe the database we are using is Teradata.

TheCognosDave

I think you can do this using SUBSTRING in Teradata SQL with something like this....

select substring ('S,M,L,XL' from 0 for position (',' in 'S,M,L,XL'))
,substring ('S,M,L,XL' from position(',' in 'S,M,L,XL') +1 for position (',' in 'S,M,L,XL')-1)

and so on...

Lynn

It is possible, although tedious, to parse out the individual elements as separate columns on the *same* row. When you have a variable number of items to be parsed it becomes even uglier.

Transforming one row into multiple rows gets a bit more involved and performance issues are likely to arrive on the scene.

I'm with bdbits on this one. Let the database do the heavy lifting.

MFGF

I agree. Having "multiple values" within a single data item on a row doesn't even conform to First Normal Form. Pretty much every mainstream BI reporting tool is geared up to generate SQL queries in a relational database, and expects data to conform to at least Second Normal Form. Trying to work around these kinds of data structure issues in each report (or in the metadata) is really ugly and hugely inefficient - SQL is not designed to do this easily. The absolute best way to resolve it is to normalize the data and (if you have chance) deliver it into star schemas.

Cheers!

MF.
Meep!