Home Dashboard Directory Help

Arabic Unicode ASCII(201) & ASCII(202) by A.Ghazal


 as By Design Help for as By Design

Sign in
to vote
Type: Bug
ID: 542810
Opened: 3/17/2010 11:24:10 PM
Access Restriction: Public
User(s) can reproduce this bug


There's a problem in replacing characters that contains (ة) character and (ت) - The SQL Server doesn't differentiate between those two characters! is spite of each character has different ascii code 201, 202 respectively.

For example:
DECLARE @strWord AS nvarchar(50)
SET @strWord = 'تفعيلة'
SET @strWord = Replace(@strWord, CHAR(201), 'ه')
SELECT @strWord

The result:
Notice that the function 'Replace' did actually replaced both different characters whith different ascii codes!

Sign in to post a comment.
Posted by Microsoft on 3/24/2010 at 11:22 AM
Closing the request. Steve explained the issue clearly.

Umachandar, SQL Programmability Team
Posted by A.Ghazal on 3/19/2010 at 7:38 AM
Thanks for the reply.
I changed the database collation to Arabic_100_CI_AI, and the REPLACE function differentiated between the teh and teh marbutah, in which solved my problem. I don't know if that affects anything when manipulating English data, but so far it's working perfectly.
Posted by Steve Kass on 3/18/2010 at 1:32 PM
I think this is the correct behavior. When you collate ("alphabetize") words in Arabic, the characters teh and teh marbuta are considered the same. The REPLACE function will obey the collation, so it will replace anything the collation treats as the same.

Here's the identical situation in English:

DECLARE @strWord AS nvarchar(50)
SET @strWord = 'Arabia'
SET @strWord = Replace(@strWord, 'A', '*')
SELECT @strWord

If the collation in use is case-insensitive, it will consider 'a' and 'A' to be equal. Even though these characters have different ASCII codes, they will both be replaced. If the collation in use is case-sensitive, on the other hand (or binary), 'a' will not be considered equal to 'A', and only 'A' will be replaced.

If it's important for you to distinguish between teh and teh marbuta, you must use a collation that doesn't follow the linguistic rule that these letters are equal for sorting purposes.

Here's an explanation of the difference between "equal as strings" and "equal as data" regarding these exact two letters in .NET: http://blogs.msdn.com/michkap/archive/2008/05/21/8520660.aspx

Sign in to post a workaround.