Arabic Unicode ASCII(201) & ASCII(202) - by A.Ghazal

Status : 

  By Design<br /><br />
		The product team believes this item works according to its intended design.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

Sign in
to vote
ID 542810 Comments
Status Closed Workarounds
Type Bug Repros 0
Opened 3/17/2010 11:24:10 PM
Access Restriction Public


There's a problem in replacing characters that contains (ة) character and (ت) - The SQL Server doesn't differentiate between those two characters! is spite of each character has different ascii code 201, 202 respectively.

For example: 
DECLARE @strWord AS nvarchar(50)
SET @strWord = 'تفعيلة'
SET @strWord = Replace(@strWord, CHAR(201), 'ه')
SELECT @strWord

The result:
Notice that the function 'Replace' did actually replaced both different characters whith different ascii codes!

Sign in to post a comment.
Posted by Microsoft on 3/24/2010 at 11:22 AM
Closing the request. Steve explained the issue clearly.

Umachandar, SQL Programmability Team
Posted by A.Ghazal on 3/19/2010 at 7:38 AM
Thanks for the reply.
I changed the database collation to Arabic_100_CI_AI, and the REPLACE function differentiated between the teh and teh marbutah, in which solved my problem. I don't know if that affects anything when manipulating English data, but so far it's working perfectly.
Posted by Steve Kass on 3/18/2010 at 1:32 PM
I think this is the correct behavior. When you collate ("alphabetize") words in Arabic, the characters teh and teh marbuta are considered the same. The REPLACE function will obey the collation, so it will replace anything the collation treats as the same.

Here's the identical situation in English:

DECLARE @strWord AS nvarchar(50)
SET @strWord = 'Arabia'
SET @strWord = Replace(@strWord, 'A', '*')
SELECT @strWord

If the collation in use is case-insensitive, it will consider 'a' and 'A' to be equal. Even though these characters have different ASCII codes, they will both be replaced. If the collation in use is case-sensitive, on the other hand (or binary), 'a' will not be considered equal to 'A', and only 'A' will be replaced.

If it's important for you to distinguish between teh and teh marbuta, you must use a collation that doesn't follow the linguistic rule that these letters are equal for sorting purposes.

Here's an explanation of the difference between "equal as strings" and "equal as data" regarding these exact two letters in .NET: