Home Dashboard Directory Help
Search

[SSIS] TOKEN(string, tokeniser_string, occurence) function by Jamie Thomson


Status: 

Closed
 as Fixed Help for as Fixed


23
0
Sign in
to vote
Type: Suggestion
ID: 543945
Opened: 3/23/2010 7:09:41 AM
Access Restriction: Public
1
Workaround(s)
view

Description

[Previously raised here:https://connect.microsoft.com/SQLServer/feedback/details/127230/token-string-tokeniser-string-occurence-function but closed without explanation so I'm re-raising]

Taking an incoming string value and tokenising it (i.e. treating as a delimited string of smaller strings) is something we do very often in the Derived Column component. Typical use case is when someone doesn't wish to use the Flat File Connection Manager to parse a text file because the parsed coluns are not needed until a later execution tree.

However its very very hard to do the parsing in a Derived COlumn component - it basically takes a combination of FINDSTRING & SUBSTRING function calls which ends up looking very messy.
Details
Sign in to post a comment.
Posted by Robert Heinig II on 3/11/2011 at 4:51 AM
Again - why not use the .net VB/C#/* parser/expression engine? IMHO the SSIS-internal formula language shouldn't have been invented in the first place, but I digress.

Yes, the SSIS expression language is inflexible and not too powerful. One of the reasons there's so many script transforms around, no? Cliff is right about that. And needing to know the number of tokens is not a limitation of the script transform as suggested in the workaround, but of the pipeline, right? And if you see a need to work around THAT - well, either an overflow column will have to do or SSIS isn't the right tool. Oh, and the script editor is now one click less far away - and it opens at a useful default size and remembers window placement to boot. (who likes typing complex formulae into those 6-character-wide columns of the derived column transform anyway?) And Intellisense! And colour coding! Oops, sorry.
Posted by Microsoft on 7/23/2010 at 5:06 PM
Thank you for your feedback. We believe that changes being made in an upcoming release will alleviate the concern you expressed; and therefore, we are closing this issue. As always, you may request that this issue be reopened or that new issues be filed at any time. Please continue to provide us with feedback. Your support is essential to our success.

-- The SSIS Team
Posted by Cliff Buckley on 3/31/2010 at 12:01 PM
An added use case for this, if you have a column of data that is overloaded.
For example, you get a file with 10 columns. The file is tab delimited. Column number 2 is a subset of data containing pipe delimited data. Having a tokenizer would be a nice way to handle the overloaded column.
Posted by Jamie Thomson on 3/23/2010 at 10:39 AM
I like Todd's suggestion of a function to know total number of tokens. That would be very valuable!
Posted by Todd McDermid on 3/23/2010 at 10:23 AM
Excellent suggestion, although I'd change "tokeniser_string" to "delimiter".

Use cases:
* Deconstructing date strings
* Validating/cleaning address information
* Extracting a "status" property from a string message

I would add that it would be nice to have a complimentary function that would return the total number of tokens that could be found in the string, for those cases where you want to extract the "last" token - or perhaps a TOKENREV analogue.
Posted by Jamie Thomson on 3/23/2010 at 7:40 AM
Please post comments and use cases if you vote this up. if you vote down please say why.
Sign in to post a workaround.
Posted by Cliff Buckley on 3/31/2010 at 12:02 PM
You could use a script transformation to workaround this issue if you happen to know how many parts the data breaks into.