Search

Replace of char(0) does not work in DB with Windows collation by Erland Sommarskog

Active

17
0
Sign in
to vote
Type: Bug
ID: 125502
Opened: 2/7/2006 2:58:18 PM
Access Restriction: Public
1
Workaround(s)
5
User(s) can reproduce this bug
When trying a replace a NUL character with replace(), this works is the value has an SQL collation, but not a Windows collation.
Details (expand)
Product Language
English
Version
SQL Server 2005 - Developer Edition (32)
Category
SQL Engine
Operating System
Windows XP SP2 Professional
Operating System Language
English
Steps to Reproduce
CREATE TABLE nulchar (a varchar(12) COLLATE Finnish_Swedish_CS_AS,
                     b varchar(12) COLLATE SQL_Latin1_General_CP1_CI_AS)
go
INSERT nulchar(a, b)
VALUES ('ABC' + char(0) + 'DEF', 'ABC' + char(0) + 'DEF')
go
UPDATE nulchar
SET    a = replace(a, char(0), '---'),
     b = replace(b, char(0), '---')
go
SELECT a FROM nulchar
SELECT b FROM nulchar
go
DROP TABLE nulchar
Actual Results
(1 row(s) affected)

(1 row(s) affected)
a
------------
ABC DEF

(1 row(s) affected)

b
------------
ABC---DEF

(1 row(s) affected)
Expected Results
(1 row(s) affected)

(1 row(s) affected)
a
------------
ABC---DEF

(1 row(s) affected)

b
------------
ABC---DEF

(1 row(s) affected)
Platform
 
File Attachments
0 attachments
Sign in to post a comment.
Posted by Microsoft on 11/28/2011 at 8:16 AM
Added the following.
0x0000 (char(0)) is an undefined character in Windows collations and cannot be included in <function_name>.
To topics:
REPLACE (Transact-SQL), CHARINDEX (Transact-SQL), PATINDEX (Transact-SQL), and LIKE (Transact-SQL).
Thank you for reporting this.
Posted by Tom Thomson on 8/1/2011 at 12:57 PM
The MS comment (number 3) is not very useful in face of the fact that replacing NULL with a zero length string works for a varchar(max) with a windows collation, but trying searches in the resulting string causes hangs.
Posted by SwePeso on 8/18/2009 at 12:24 AM
There are other ways to confirm this bug



declare @s varchar(100)

set @s = 'xxxx                                 yyyyy         zzzzzz'

-- Works
SELECT REPLACE(REPLACE(REPLACE(@s COLLATE SQL_Latin1_General_CP1_CS_AS , ' ', ' ' + CHAR(0)), CHAR(0) + ' ', ''),
CHAR(0), '')

-- Works
select replace(cast('x' COLLATE SQL_Latin1_General_CP1_CI_AS as varchar(max)), char(0), '');

-- Hangs SSMS (eternal loop)
SELECT REPLACE(REPLACE(REPLACE(@s COLLATE Latin1_General_CI_AS , ' ', ' ' + CHAR(0)), CHAR(0) + ' ', ''),
CHAR(0), '')

-- Hangs SSMS (eternal loop)
select replace(cast('x' COLLATE Latin1_General_CI_AS as varchar(max)), char(0), '');
Posted by Erland Sommarskog on 2/11/2006 at 3:46 PM
So where is this documented?

This bug reported was spawned by a question on a newsgroup where some one was trying to do this. That is, this is not only an academical question.

It may be difficult to document for all places where this applies, but it should be documented for the functions where the behaviour is entirely unintuive, that is replace, charindex, patindex and LIKE.
Posted by Microsoft on 2/8/2006 at 4:28 PM
This is due to the fact that 0x0000 is an undefined character in Windows collations. All undefined characters are ignored during comparison, sort, and pattern matching. So searing for 'a' + char(0) is really searching for ‘a’, and searching for char(0) is equivalent to empty string.

The way to handle undefined character is a bit confusing, but this is the way that Windows defined to sort them, and SQL Server conforms with the general Windows API.

In SQL collation, there is no notion of undefined character. Each code point is assigned a weight, that's why we don't see a problem there.

Regards,
Jun
Sign in to post a workaround.
Posted by AndyDoran on 3/12/2013 at 2:33 AM
FYI - I saw this in a UK Locale SQL Instance (Latin1_General_CI_AS) but not US Locale (SQL_Latin1_General_CI_AS) on the same server. I also found that it seems to have been fixed in SQL 2012 as I tried it there on a UK Locale instance, and everything was fine.

I also found that the easiest way around the problem was to use NVARCHAR(MAX) instead of VARCHAR(MAX). ie this fails:-

declare @test varchar(max)
set @test = 'this is a test'
set @test = replace(@test, char(0), char(32))

but this works:-

declare @test nvarchar(max)
set @test = 'this is a test'
set @test = replace(@test, char(0), char(32))

Setting the collation during the replace also works, but it was easier for me just to change varchar to nvarchar