SSIS Merge join producing incorrect results when joining on varchar fields and sorting done in OLEDB source object - by DejanNK

Status : 

  By Design<br /><br />
		The product team believes this item works according to its intended design.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


8
0
Sign in
to vote
ID 315188 Comments
Status Closed Workarounds
Type Bug Repros 2
Opened 12/4/2007 5:10:22 PM
Access Restriction Public

Description

I ran into a very strange problem with the Merge Join. 

I used two OLEDB Source objects to retrieve data from two databases. I used OLEDB source advanced properties to sort the output and ignore case (varchar data type).

I then added a Merge join object and used sorted output from OLEDB objects to perform matching. On 2 out of three servers this worked fine, but on one server 2 entries failed to match althought they were identical. 

Replacing sorting in OLEDB source with Sort transformation fixes the problem. 
Sign in to post a comment.
Posted by milksama on 12/15/2016 at 9:53 AM
I misspoke -- it is when I use the "Full Outer Join" command that I encounter the scenario I described. Using an "Inner Join", there are no results.
Posted by milksama on 12/15/2016 at 9:46 AM
I would not consider this "by design." I encountered the same problem using two sources and a "Merge Join" component employing an "Inner Join" command. I piped ALL the output to an excel spreadsheet and nothing is joined. Columns A, B and C have data for the first N records while columns D and E are empty. For records N+1 to M, columns A, B and C are empty while columns D and E have data. When performing a copy and ctrl+F from a value in column A, a match is found in column D.

It's not the end of the world -- there are workarounds, but it is disappointing this feature is not working as advertised, nor has anyone considered just looking into it. I would be happy to help if more info is needed.

Thanks,

Mike
Posted by Todd C on 11/5/2015 at 12:28 PM
SSIS Design Team need to FIX this BUG (I don't CARE if it's "BY DESIGN", change the DESSIGN, then FIX the BUG). OR an option would be to put in some advanced features of the MERGE JOIN to specify the COLATION to be used.

BUT, I can almost guarantee that this A) won't get fixed, and that B) none of YOU will get notified of this post.
Posted by Cleyton Alencar on 5/15/2013 at 1:29 PM
SSIS, just like Excel uses windows collation, SQL Server can use SQL Server or Windows Collation. One simple way to fech the data from SQL Server with windows collation is to convert the collation being used to the matching Windows one. In my case, the SQL server is SQL_Latin1_General_CP1_CI_AS, so I needed to convert it to Latin1_General_CI_AS.

SQL Server Collation : SQL_Latin1_General_CP1_CI_AS
Windows Collation : Latin1_General_CI_AS

UPPER(RTRIM(LTRIM(<SourceColumn>))) collate Latin1_General_CI_AS AS <SourceColumnAlias>
Posted by jalh on 6/25/2012 at 7:38 AM
This "bug" hasn't been solved yet? We have experienced the same behaviour a few days ago and although the sort task worked out the problem, it is not the best solution because of the performance (when you have to sort millions of rows). Does somebody know if it has been solved in another way?
Posted by _msdn_ on 11/19/2010 at 4:56 AM
I have been experiencing an issue related to this where a merge join was missing equivalent values. One data source is an Access table through OLEDB of type double precision floating point which was cast into WSTR(10), sorted and joined onto a sql server table through ADO on a varchar/WSTR(10) column. After trying a few things (including the LTRIM() and RTRIM() workaround here) to no avail, I cast the string value into a float and the join worked. There must be an issue here with metadata or encoding that is not obviously discernible through the interface.
Posted by Stuart A Steedman on 1/19/2010 at 4:16 AM
RTrim may fix the problem, but the root cause of this may be inappropriate use of ANSI_PADDING ON/OFF. If you generate the CREATE script for each table, you may be able to see the difference - the presence/lack of the statement SET ANSI_PADDING ON/OFF in one of the scripts should indicate where the problem lies.

This should only be a problem with CHAR datatypes, not VARCHAR. At this point I am wondering why my VARCHAR field is padded with spaces even though the create script shows that ANSI_PADDING is OFF...

Stu

From:
SQL Server 2005 Books Online
SET ANSI_PADDING (Transact-SQL)

"We recommend that ANSI_PADDING always be set to ON."
Posted by gxa on 8/6/2009 at 6:47 AM
I wanted to add to the discussion.

I had a similar issue that not even sort would not correct... The data was from two different sources and no matter what kind of sources they were (oledb or ado) the merge would return no data.

I managed to fix this by inserting a data viewer for both of the sord inputs and copying the data out to a text editor like textpad that shows the hidden characters (spaces, tabs etc). I determined that there were trailing spaces that forced the merge to fail... RTrim in the sources of the merge inputs solved the problem.

So another thing to check is the data and the metadata of the inputs for the merge.. the error is boutnd to be there somewhere.

Regards
George
Posted by a2q2b on 1/21/2008 at 3:01 PM
I had a very similar issue. My output property on the data source was set to sort = true and I carefully set my sort positions on each of the key columns on both data sources correctly. The queries within the data sources were ordering properly. Even though the data was exactly the same, merge join did not make a match. Unfortunately, although adding the explicit Sort transformation before the merge join solved my problem, this option is much slower. I would like to know why the merge join did not make a correct join. I am working with a SQL Server 2000 data source and a SQL Server 2005 64bit data source. I am joining on three keys: 2 two-byte signed integers and 1 string, code 1252, length 15.
Posted by Microsoft on 12/20/2007 at 5:37 PM
dejann,
You mentioned you use OleDB source to sort the outputs and this sounds a little problematic to me. If you are referring to setting the field of "IsSorted" on the advanced property page to true, I am afraid this does not perform the sort action but instead only indicates the output is sorted. If you are sure the data oledb source fetches is sorted then you can set that field to true otherwise you will aditionally use a sort transform to actually sort the data before you do the merge join.
Let me know if this does not solve your problem.

Thanks,
Jia