Home Dashboard Directory Help
Search

Arabic Unicode ASCII(201) & ASCII(202) by A.Ghazal


Status: 

Closed
 as By Design Help for as By Design


1
0
Sign in
to vote
Type: Bug
ID: 542810
Opened: 3/17/2010 11:24:10 PM
Access Restriction: Public
0
Workaround(s)
view
0
User(s) can reproduce this bug

Description

There's a problem in replacing characters that contains (ة) character and (ت) - The SQL Server doesn't differentiate between those two characters! is spite of each character has different ascii code 201, 202 respectively.

For example:
DECLARE @strWord AS nvarchar(50)
SET @strWord = 'تفعيلة'
SET @strWord = Replace(@strWord, CHAR(201), 'ه')
SELECT @strWord

The result:
----------
هفعيله                                                                                                                                                
Notice that the function 'Replace' did actually replaced both different characters whith different ascii codes!

Thanks
Details
Sign in to post a comment.
Posted by Microsoft on 3/24/2010 at 11:22 AM
Closing the request. Steve explained the issue clearly.

--
Umachandar, SQL Programmability Team
Posted by A.Ghazal on 3/19/2010 at 7:38 AM
Thanks for the reply.
I changed the database collation to Arabic_100_CI_AI, and the REPLACE function differentiated between the teh and teh marbutah, in which solved my problem. I don't know if that affects anything when manipulating English data, but so far it's working perfectly.
Posted by Steve Kass on 3/18/2010 at 1:32 PM
I think this is the correct behavior. When you collate ("alphabetize") words in Arabic, the characters teh and teh marbuta are considered the same. The REPLACE function will obey the collation, so it will replace anything the collation treats as the same.

Here's the identical situation in English:

DECLARE @strWord AS nvarchar(50)
SET @strWord = 'Arabia'
SET @strWord = Replace(@strWord, 'A', '*')
SELECT @strWord

If the collation in use is case-insensitive, it will consider 'a' and 'A' to be equal. Even though these characters have different ASCII codes, they will both be replaced. If the collation in use is case-sensitive, on the other hand (or binary), 'a' will not be considered equal to 'A', and only 'A' will be replaced.

If it's important for you to distinguish between teh and teh marbuta, you must use a collation that doesn't follow the linguistic rule that these letters are equal for sorting purposes.

Here's an explanation of the difference between "equal as strings" and "equal as data" regarding these exact two letters in .NET: http://blogs.msdn.com/michkap/archive/2008/05/21/8520660.aspx


Sign in to post a workaround.