Home Dashboard Directory Help
Search

BULK MERGE by Jamie Thomson


Status: 

Active


140
0
Sign in
to vote
Type: Suggestion
ID: 540038
Opened: 3/8/2010 4:11:57 AM
Access Restriction: Public
0
Workaround(s)
view

Description

We currently have a BULK INSERT operation which is very very useful when inserting data into SQL Server from the SSIS Pipeline.

A common requirement when processing data into a destination from the SSIS pipeline is to work out whether the row already exists in the destinaiton or not. If it does then UPDATE it, otherwise INSERT it. However, this is a very expensive series of operations and it would be a lot better if we could simply push data into a destination component and let the database engine take care of the comparison for us.

In other words, we need a BULK MERGE operation in the database. It would carry out the same basic operation as the existing MERGE statement but it would do it quickly for large datasets in exactly the same way as BULK INSERT does for insertions today.

N.B. I have been conversing with Sunil Agarwal about this so he is aware of it. Please involve him when triaging this.

Oh, and by way of extra explanation, this is intended to provide an alternative option to the traditional:

source-->Lookup-->(via NoMatch output)-->OLE DB Destination
                    |
                    ------->(via Matched output)-->OLE DB Command

pattern that we see so very very often in SSIS and which has a number of potential bottlenecks (not least high memory usage by the Lookup cache).

BULK MERGE would also be very useful for bcp I'm sure.


=========================================================
I have added a matching connect item (https://connect.microsoft.com/SQLServer/feedback/details/542924/ssis-bulk-merge-destination) requesting a destination component in SSIS that would leverage BULK MERGE feature in the DB engine.
Details
Sign in to post a comment.
Posted by beneuto on 5/20/2013 at 8:16 PM
Of all the shortcomings of SSIS, this is the one that baffles me the most. I've used many ETL tools and this is the only one I know of that has no set based update or merge functionality (not including coding, which seems to be the answer for everything in SSIS). I'd be happy with even a partial implementation of the Merge function e.g. simply inserts or updates the input record depending on existence of primary key, not necessarily having to deal with different inputs depending of whether its an insert or update.

A lot of people want this, but its still not there in SQL 2012.
Posted by TomHamilton on 2/28/2012 at 1:51 PM
Living on the edge of an SSAS environment with multiple large highly dynamic databases, Bulk Merge could well be a life saver - I have a global user population, and an very narrow data update window. I have having to close the access window for several hours each day to update merge activity.

Tom Hamilton
Intel, Folsom
Posted by W. Kevin Hazzard on 2/1/2011 at 12:12 PM
I built a T4 script that generates T-SQL using the MERGE statement for this. It traverses the foreign keys in the target database and builds a load order record set. Then it matches on the primary keys and generates the T-SQL for all of the tables in the target database to merge them from a source database. The script generates in about one minute and the actual merge process takes less than five minutes even with millions of differences between hundreds of tables. If the schema changes, I can simply re-gen the script and rerun it. Why can't SSIS do this in a simple way? I should be able to drop a source and destination in and wire up the metadata between two identical tables to do a one-way (or even a two-way) merge with a few clicks of the mouse. Better yet, if the metadata changes (on both sides obviously), the package should just detect that, validate that the schemas match and update itself to perform the merge operation. Sorry but I just can't stand merge replication. It's a nightmare to set up and manage. I just want a simple metadata-driven SSIS component that will do exactly what the MERGE statement in T-SQL does.
Posted by Microsoft on 6/3/2010 at 11:01 AM
I just created a blog for MERGE statement with TF-610. Here is the link. http://blogs.msdn.com/b/sqlserverstorageengine/archive/2010/06/03/minimal-logging-and-merge-statement.aspx

You may find it useful. I am not suggesting that it takes care of this issue but it is good to know how you can get minimal logging for MERGE statement.

Thanks
Sunil
Posted by BLPond on 5/6/2010 at 12:26 PM
Jamie,

Agreed, this would be a very good improvement for SSIS. We can "get by" with operations like Slowly Changing Dimensions, but that was never meant for bulk operations, and is expensive and slow. A BULK MERGE would be outstanding.

Thanks for the suggestion, and for all your SSIS articles and know-how.

Brian Pond
PondBL@hotmail.com
Posted by Jamie Thomson on 3/23/2010 at 7:42 AM
Please post comments and use cases if you vote this up. if you vote down please say why.
Posted by Microsoft on 3/11/2010 at 5:42 PM
Jamie,

Thanks for submitting this. As we had discussed, this is indeed a useful functionality. We will consider this suggestion in our triage of customer requests.

Thanks
Sunil
Sign in to post a workaround.