Home Dashboard Directory Help

'ß' and 'ss' are NOT equal by BetterToday


Status: 

Active


16
0
Sign in
to vote
Type: Bug
ID: 341130
Opened: 4/30/2008 1:58:17 AM
Access Restriction: Public
1
Workaround(s)
view
7
User(s) can reproduce this bug

Description

Hi,

when setting SQL Server to standard latin collation (Latin1_General_CI_AS), a NVARCHAR(x) UNIQUE column raises a Unique Constraint error, when adding these two rows to the table:

Maße
Masse

This is not correct. Maße and Masse are two different words having different meanings. Especially after German spelling reform, 'ß' and 'ss' got even more separated. (Whereas 'SS' is currently still are replacement for 'ß' when capitalizing a word to upper-case, since 'ß' is only a lower-case letter.)

After discussing this online (http://groups.google.com/group/microsoft.public.de.sqlserver/browse_thread/thread/c2f2b1dd36ab439b)
I've been discussing this issue with DIN. They are currently discussing this issue internally to update the DIN/ISO standards.

Nonetheless, SQL2005/2008 should provide an option to *not* regards 'ß' and 'ss' as being equal while still adhering to all other dictionary collating rules.

Thanks,
Axel Dahmen
Details
Sign in to post a comment.
Posted by Mark Guinness on 5/20/2013 at 11:49 AM
This still appears to be an issue in SQL Server 2012. FWIW, the same problem occurs with æ and ae. Has anyone from the SQL team consulted with the Windows team to see if they are adopting the new DIN/ISO standards?
Posted by Microsoft on 6/21/2011 at 3:02 PM
Hello,
Thanks for your feedback. Our current behavior follows the SQL/ISO standard and unless those standards are updated with the latest changes we don't intend to change the behavior in SQL Server. Changing existing SQL Server behavior has lot of implications and today we rely on Windows for all of our windows collation sorting capabilities. If and when in the future Windows adopts these new rules / Unicode standard we will incorporate it in SQL Server. Till then, you have to use other workarounds suggested in the comments. Hope this helps.

--
Umachandar, SQL Programmability Team
Posted by YingXiao on 7/14/2010 at 6:58 PM
Hi there,

We have the same issue as Axel. We don't want to add an extra column to this table, or change collation of this column, as it will have negative effect to the existing system.

Do you already have a fix for this issue? If not, when are you going to fix this?

Thanks,
Ying
Posted by BetterToday on 8/4/2008 at 3:45 AM
Hi Jim,

sorry for taking me so long to reply..

The corresponding DIN group is discussing this issue now. I can provide you with the personal contact at DIN if you provide me with your e-mail address.

I've played my part in this game so far by getting DIN (ISO) aware of the problem and having a discussion started to update the standards. This is where I'm off the train now.

Do you want me to provide you with the contact at DIN?

Best regards,
www.axeldahmen.com
Axel Dahmen
Posted by Microsoft on 5/1/2008 at 8:13 AM
Hi Axel,

Thankyou for this bug report. I've passed it over to the Developer Team to investigate; in particular, how much work it would take us to fix and/or provide an option to differentiate.

Please let us know how/when DIN decides on this issue, so we can adhere with their standard.

However, it's unlikely this change will make it into the current release. The 'bar' for checking-in fixes is now very high, as we enter the 'end-game' for Katmai.

Thanks,

Jim
Sign in to post a workaround.
Posted by Steve Kass on 4/30/2008 at 9:57 AM
Below is a way you might work around this. A second column with BIN collation is added to the primary key that contains only the sequence of appearances of 'ss' and 'ß' in the first column.

Notes: The function below turns triple-s into 'ss', but the workaround should correctly distinguish ss and ß where it needs to. You would want to analyze and test this carefully if you decide to use it. Also, depending on how data enters your table, you would either change the input routines or create a view without the extra column and use INSTEAD OF INSERT/UPDATE triggers to maintain the view. I didn't try a persisted computed column, and I don't know if all the determinism rules are satisfied.

Another possibility is to use a collation (such as Albanian or Latin1_General_BIN) that distinguishes ess from eszet, but I assume that's not an option, because it changes other collation rules also (the legacy SQL collations might come close, though).

Unfortunately, the eszet is a murky corner of written language - Unicode has or will have an UPPER-case sharp-s also, just to create more confusion.

create function ss_eszet(
@s nvarchar(max)
) returns nvarchar(max) as begin
if @s is null return null;
declare @result nvarchar(max);
set @result = N'';
declare @i int;
set @i = 1;
declare @p int;
while @i < len(@s) begin
    set @p = charindex('ss' collate latin1_general_ci_as,@s collate latin1_general_ci_as,@i);
    if @p = 0 break;
    set @result = @result + substring(@s,@p,1);
    set @i = @p + 1;
end
return @result
end;
go

create table T(
s varchar(100) collate latin1_general_ci_as not null,
ss_eszet varchar(100) collate latin1_general_bin not null,
primary key(s,ss_eszet)
)
go

insert into T values ('Maße',dbo.ss_eszet(N'Maße'));
insert into T values ('Masse',dbo.ss_eszet(N'Masse'));
go
insert into T values ('Maße',dbo.ss_eszet(N'Maße'));
go

insert into T values ('Masse',dbo.ss_eszet(N'Masse'));
go

select s from T;
go

drop function ss_eszet;
drop table T;