Do not provide a default server collation at setup - by Erland Sommarskog

Status : 

 


24
0
Sign in
to vote
ID 357973 Comments
Status Active Workarounds
Type Suggestion Repros 0
Opened 7/26/2008 2:42:24 PM
Access Restriction Public

Description

When installing SQL Server you need to select a server collation. However, Setup permits you to go with a default. This may seem convenient, but it often backlashes, because the default is unsuitable for a number of reasons.

A real-world case: I was contacted by a guy who was fearing for his job, if he could not get some speed out of his database. The query he show me was a simple lookup the PK. The problem was that the application (using Java in Websphere) worked with Unicode, but the table had varchar keys. And he is located in the US, he was using SQL_Latin1_General_CP1_CI_AS, which is the default for English (United States). Had he been located elsewhere, he would probably have been using a Windows collation, and he would never had any performance issue. (Because for Windows collation, SQL Server can still seek the index when varchar meets nvarchar. Not so for an SQL collation.)

I have heard from people in other English-speaking countries like Australia and the UK,
that they have a mess of Latin1_General_CI_AS and SQL_Latin1_General_CP1_CI_AS, probably because sometimes the Windows admin change the system locale, and sometimes not. Then the person who install SQL Server does not know, and since there is a default, 
he does not have to know other; he can go with the flow. The mess comes later.

And this is not going to be any better with the new _100_ collations which are the default
with some system locales.
Sign in to post a comment.
Posted by Microsoft on 3/17/2011 at 10:20 AM
Hi Erland,

I just tweaked the info on our end to reflect that this was "fixed" rather than "won't fix". Or at least we think we've addressed the issue with the contained database work. So while we will still have a default server collation at setup, anyone using a CDB in SQL-11 should be shielded from this collation.

Cheers,
-Isaac
Posted by TiborK on 8/9/2008 at 12:20 AM
I see this collation problem issue all the time, and here's (possibly) another angle at the problem. I've encountered this many time,s and since I've taught SQL Server classes since 1992, I tend to ask students and about 50% of my students have also encountered this problem:

They have an app, developed either inhouse or in many cases by 3:rd party. The install SQL Server and then install the app. The app seems to work, but somewhere int he app, the error message "collation conflict" is shown. The windows admin ("dba") who did the install doesn't understand what happened and in many cases those who developed the software doesn't understand either. So, what happened?

Some collation what picked during install, probably the default collation. The app then created a database with some other collation. The app creates a temptable and then joins this to a table inside the database. Bang - there's your error.

Ideally, the onw who install SQL Server need to stop and think about what collation to install. They should read the app's installation notes and if that doesn't say anything, they should stop the install and contact the app vendor. This is the only way to handle this situation. Sure, the app vendor *might* manage different collations across system and app db, but how would the install person know that? By consulting the app vendor. Hiding collation choice only makes the situation worse.
Posted by Microsoft on 8/8/2008 at 5:10 PM
Thank you for your explanation, it really helped me understand the issue better. As you pointed out the problem is only going to get worse with new collations bing introduced in 2008.

Thanks,
Krzysztof Kozielczyk
Posted by Erland Sommarskog on 8/6/2008 at 1:17 PM
No, the collation of the master database is not critical as far as I know. However, the collation of tempdb may be, at least as long as temp table gets their collation from the tempdb collation and not the current database.

I think there are two main issues here.
1) Users get a unicollation server with an SQL collation, use varchar in their tables, but have clients that works with Unicode only, and they get table scans because of implicit conversions. This mainly happens to US users.
2) Users have some servers with SQL_Latin1_General_CP1_CI_AS and some with Latin1_General_CI_AS. Then they move databases between these servers and get a mess. This mainly happens to users in countries like the UK or Australia.

I should add that these are not so much issues I ran into myself, as I see and hear from other people who do.
Posted by Microsoft on 8/6/2008 at 12:29 PM
Sorry for the confusion.

I wasn't trying to propose any particular solution, but understand the problem better. From your description I assume that the problem is that after the server is installed with current default server collation, users experience problems with databases that inherit the server collation. Your proposed solution is to have a better default collation (or have no default collation and help users pick the right one, based on user and system locale). I think we are on the same page here.

What I wanted to understand, is if the problem is only connected with the collation of new databases, or if it's wider. Does collation of master DB affect your scenario?

I agree that we can't collate new databases by default different than in the server collation. That was just an example to illustrate my question. One more time sorry for the confusion.

Thanks,
Krzysztof Kozielczyk
Posted by Erland Sommarskog on 8/6/2008 at 12:04 PM
I'm afraid that I don't really understand your comment. While it's great to have the ability to have multiple collations in the same server instance, for very many users this of little interest. They are best served by having the same collation in the entire server, or else they will have to fight battles with collation conflict.

If I understand you correctly, you suggest that rather than inhering the server collation as the default, the default for the database collation would be derived from the system locale. I am afraid that this would only aggrevate the problem. Say that someone installs a server in Warszawa and leaves the system locale as US English. Someone installs SQL Server and selects Polish_CI_AS as the server collation. But new databases would now get SQL_Latin1_General_CP1_CI_AS as their default collation and I think you know that that is not a very good collation for Polish.

No, the default database collation should also in the future be the server collation. (Actually, I considered to submit a suggestion that a server would be default be configured as uni-collation, to stop collation conflicts early, an idea I eventually
discarded.)

The problem is that it is very difficult for a setup program to suggest a default
collation, not the least when you have dragged yourself into a mess with having
the odd choice of an SQL collation as the default for US English. Furthermore,
many users do not have full understanding of collations, and may not realise
the problems they may face if they make an incorrect chocie at installation time.

Currently, the most victims for this are found in the English-speaking world, due
to the mixup of Latin1_General_CI_AS and SQL_Latin1_General_CP1_CI_AS.
But since you are now offering some system locales the old 80 collations, and
others the new 100 collations, this problem may spread.

I think the best way out is to not provide a default, but give uses a mini-wizard
that helps to make a good choice.

And if that is too much work, settle on consistently using Windows 80 collations for
all system locales. And that includes US English.
Posted by Microsoft on 8/6/2008 at 10:00 AM
Hello,

Thank you for sharing your idea with us. As I understand from your description the problem is database collation rather than SQL Server instance collation. E.g. would having new databases collate to system local by default solve your problem as well?

Thanks,
Krzysztof Kozielczyk