Search

Change UNIQUE constraint to allow multiple NULL values by Hugo Kornelis

Active

214
20
Sign in
to vote
Type: Suggestion
ID: 299229
Opened: 9/22/2007 1:31:50 PM
Access Restriction: Public
Duplicates: 498009
4
Workaround(s)
According to the ANSI standards SQL:92, SQL:1999, and SQL:2003, a UNIQUE constraint should disallow duplicate non-NULL values, but allow multiple NULL values. SQL Server has always implemented a crippled version of this, allowing a single NULL but disallowing multiple NULL values.

Please start moving from the current non-standard implementation toward ANSI-standard behaviour for UNIQUE constraints.
Details (expand)
Product Language
English

Category

SQL Engine

Proposed Solution

I am fully aware of the danger of breaking working code. This is a change that can't be made at once (unfortunately). A good upgrade path would be as follows:

1. (Preferably in SQL Server 2011): Introduce a new option for UNIQUE constraint definition: ALTER TABLE xxx ADD CONSTRAINT yyy UNIQUE (Col1, Col2) WITH ANSI_NULLS = [ON | OFF]. Make ANSI_NULLS = OFF (currect implementation of UNIQUE constraints) the default, so that existing code will continue to run unmodified.

2. One version later (SQL Server 2014?): Change the default for new UNIQUE constraint to ANSI_NULLS = ON (the ANSI specified behaviour where multiple NULLs are allowed).

3. Yet another version later (2017??): Mark the WITH ANSI_NULLS = [ON | OFF] option as deprecated.

4. (2020????): Remove the option.

*** Note that I edited the original suggested timeframe, since starting step 1 in SQL Server 2008 is no longer possible ***

Benefits

Faster Development
Improved Reliability
Other (please provides details below)

Other Benefits

Improved portability, adherence to industry standards
File Attachments
0 attachments
Sign in to post a comment.
Posted by colinbashbash on 6/18/2013 at 2:36 PM
According to the spec, it's not valid to have the "Many Nulls - some null" that I questioned below. Sorry for the confusion, there.
Posted by colinbashbash on 6/18/2013 at 2:28 PM
I've found times where both ways are useful. It may be useful to provide a One Null/Many Nulls sub-type to a unique constraint. I think it's important to look at 1) the effects on unique keys over single vs multiple columns, 2) how this would affect upgrading databases, and 3) how this would affect applications that need to be redistributed to different database vendors dependent on a client's preference.

1) Single & Unique keys
Unique key over a single column:
Many Nulls - useful: You may want to say that people have to have a unique email, but leave their email address null until their account is approved.
One Null - (maybe useful): You may want to specify that there's a distinct external reference identifier for a column, and that only one of the table rows can be linked to an external identifier. (I would think that as NULL means nothing, empty string would be a better value. I'm sure there are other similar uses)

Unique key over multiple columns:
==Assume you columns (a,b) that are in your unique key
--One Null - useful - If Column A is not nullable, then allowing column B to have one null value for each value of A can be very useful. (Table Access (PK_AccessId, ParentId, GroupId not null, SubGroupId null)). Another example would be putting a unique constraint over UserName & DeleteDate, so that you can only have 1 undeleted user and multiple deleted users with the same UserName,
--Many Nulls - all null - I can see where you may want to allow multiple rows where all A and B are null.
--Many Nulls - some null - It's harder to see it being useful to have multiple null values of B for a distinct A. If you were going to implement Many Nulls, I would hope that it would only allow multiple nulls if everything is null and expect users to use filtered indexes to accomplish something different. (What does the spec say on this?)

2) Compatibility
-- One option to preserve compatibility for existing Sql Server databases would be to have the conversion tool set each unique key to One Null by default. It could set it up so that it throws an error if you don't specify One or Many Nulls when creating a unique constraint on the newer database.

3) Redistributed databases
-- This is already complicated enough, since stored procs aren't part of the standards. It would be preferable if SQL Server would follow the Specs for Unique constraints, or throw an error to alert the developer if a small tweak is needed, rather than just apply a non-standard way of doing things. If needed, it could probably be updated eventually to default to the standard behavior, or a future standard could be updated to allow for the additional specification of one/many nulls.

(I'm not voting Yea or Nay on this, because I think something should be done, but the idea of breaking existing designs could increase costs for existing sql server customers)
Posted by The Jams on 12/27/2012 at 10:23 AM
Better compliance of all current standards would be to not allow the insert of any duplicate values to waste table space when a UNIQUE CONSTRAINT (not UNIQUE INDEX) is in place. NULL or NOT NULL duplicates are always wastefull and can create several bugs when not expected in any data structure.
If there is a UNIQUE constraint on a TABLE and 100 rows of NULL values can be inserted what standard is that compliant for?
Voting no to this backwards and contrary request. Can't understand why MS has not killed this connect after 6 years and updates to standards that make it obsolete.
Posted by Hugo Kornelis on 12/21/2012 at 1:49 AM
After reading the comment posted by Tom, I feel compelled to point out that my main reason for proposing this change is better ANSI-compliance. Some people (far less than the standards committee would like, but more than zero) have to build code that supports multiple platforms. More people change jobs, or stay at a job when their employer switches to another database platform.

They *will* get into trouble over non-standard extensions not working on the new platform, but that's okay-ish. I see that vendors need to distinguish themselves, and that they often can't wait for the standardisation committtee to catch up.
They *will* also get into trouble over standard features not implemented on the new platform. Less okay-ish, but still acceptable - I get that implementing more features is a trade-off.
I am okay-ish with all of those because, whenever a feature is not implemented on the new platform, they simply get an error message. A very usesful way to tell them to look for a different implementation.

What I have a huge problem with, is taking a feature from the standard, and then implementing it just a tiny bit different. People coming from Oracle (e.g.) and knowing that UNIQUE is an ANSI feature will just use it in SQL Server, and when they don't get a syntax error, they'll expect it to work correctly. But it doesn't - at least not according to their (and the ANSI) idea of "correct". That introduces bugs that hopefully are discovered during development and QA and cause "only" a few days of lost work - but in theory could make it all the way into production and cause huge errors later.

Suggestion to the Microsoft team - if lots of people think the non-standard implementation of UNIQUE is valuable and they want to keep it (even though there are workarounds for it), then just implement only the first two steps of my suggestion. I have no problem with having an optional keyword attached to UNIQUE constraints to keep the current behaviour - as long as it defaults to ANSI-standard behavious (eventually - again, I do not propose to change this at once, I propose to do it gradually and give people time to change their code).
Posted by Tom.Thomson on 12/20/2012 at 8:46 AM
Since it's trivial to achieve the type of constraint suggested by this entry without any changes at all to SQL server (a perfectly good method using filtered indices was pointed out 63 months ago, and another - perhaps less satisfactory- workaround was suggested a few monthe later), I can only see this as an attempt to throw away a useful feature in order to conform with a standard. Unfortunately, the standard is pretty poor - it has adopted an untenable approach to NULL in UNIQUE constraints - banning NULLs altogether for columns involved in UNIQUE constraints is another untenable approach (one that I've suggested pushing for in standards discussions so that maybe a compromise between than and the currently diametrically opposed untenable position in the standard will lead to something tenable).

Clearly a lot of people don't understand what is actually implemented in SQL Server for Unique Constraints (perhaps they've been reading and believein Books OnLine, which could certainly encourage such misunderstanding about this feature). What SQL Server provides is a UNIQUE constraint that guarantees that there is no possible way to assign values to the NULLs in a set of rows that satisfies the constraint that would deliver a set of rows that doesn't satisfy it, even if the constraint is completely disregarded for the purpose of assigning values to those NULLs. That is enormously valuable (as Erland Sommarskog pointed out 5 years ago). Let's not throw it away in return for no new capability at all.
Posted by beneuto on 9/20/2012 at 6:22 PM
NULL fields mean you dont always have a value so putting a unique index on a nullable field is saying, "We dont always have a value for that field but when we do it must be unique". This is much more likely than saying, "There will only ever be one record that we dont have a value for" which is what the current behaviour supports. Therefore I doubt that many applications would be impacted by changing to the ANSI standard in this case.
Posted by serbat on 7/26/2012 at 8:45 AM
Me too, that´s why a standard is defined.
Posted by Kanalpiroge on 9/27/2011 at 7:14 AM
I totally agree! Please implement the standard!!!
Posted by Shanus371 on 8/19/2011 at 12:14 PM
MS should implement the standard. Agreed 100%. For those saying it doesn't make sense, it only seems to show the lack of understanding of what NULL is. It's not string.Empty folks. NULL is nothing, not allocated, NULL does not equal NULL , NULL is not something. As being nothing means it cannot be compared to anything, even nothing, any field with a unique constraint and allow nulls should allow an unlimited number of fields to be NULL. That is proper behavior and it's not even really something up for debate. That is the way it should work.
Posted by Oliver Wilcock on 7/12/2011 at 5:23 PM
MS should implement the standard and quit wasting my time.
Posted by David Portas on 4/11/2011 at 2:21 AM
Added an alternative suggestion here: http://connect.microsoft.com/SQLServer/feedback/details/658638/dont-allow-nullable-columns-as-part-of-unique-constraints
Posted by Tom Thomson on 4/7/2011 at 12:08 PM
I think this is an appalling idea. We should instead be working to get the standards organisations to ghange the standard so that a UNIQUE column (or any column in a group of columns which together are unique) does not permit null, and planning to change SQL server to have the option to forbid NULL in any such column instead of allowing a single null.
Posted by Hugo Kornelis on 12/2/2010 at 2:44 AM
I have edited the proposed solution. Implementing the first step in SQL Server 2008, as originally suggested, is of course no longer possible. But I hope it's still possible to add this in SQL Server 2011.
Posted by Stan Segers on 7/21/2009 at 3:50 PM
Quoting SQL Server 2008 Books Online, SET ANSI_NULLS (Transact-SQL) :

[start quote]
Important:
In a future version of SQL Server, ANSI_NULLS will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
[end quote]

Does future mean next and does the NULL in a UNIQUE index mean NULL as in ANSI_NULLS?

Anyway, Hugo's proposed solution is one I can greatly accept. It doesn't break old code in this version and it hardens new code for the future... can we get it into the list of fixes for SQL Server 2005 SP4, SQL Server 2008 SP2 and the product release for Kilimanjaro/SQL Server 2008 R2?
Posted by DMulvena on 7/17/2008 at 2:12 AM
I agree that Microsoft SQL should follow the standard. NULL is a concept rather than a fixed value, and allowing only one row to have this, as if it were a fixed value, is illogical. (Equally illogical in my opinion is including all the records with null columns in an index - what is the point of that?) I can't agree with the suggestion though of overloading the existing ANSI_NULLS option.
Posted by Dragba on 6/12/2008 at 9:15 AM
I always had a problem with unique CONSTRAINT that allows multiple NULLs (or NULLs at all). However, I always missed unique INDEX that allows multiple NULLs. I'm pretty happy with MS solution (what took you so long?). Mind you that you can create a foreign key to the unique constraint. Doing that to the alternate key with multiple NULLs (or NULLs at all) makes no relational sense to me.
I understand the importance of standards, even when they are wrong, so MS could do something about it, keeping this behaviour default and recommended.
Posted by FluidTrade Support on 2/23/2008 at 10:00 AM
I need to add my voice to the list of people that think this is a big issue. The problem for me is less the compatibility with other databases than the compatibility with ADO.NET. The Unique constraints created for a DataSet allow multiple nulls but enforce uniqueness with the non-null values. I'm creating a middle tier where the in-memory functions have analogous functions on the persistent (read SQL) store. Currently, I can't create a constraint on SQL Server that works the same as the Unique constraint in my ADO.NET DataSet.
Posted by Hugo Kornelis on 1/29/2008 at 3:10 PM
You closed the suggestion as "fixed" after I posted my comment, but I felt compelled to re-open it. The suggestion was NOT to implement "some way" to constrain non-NULL values to be unique. The suggestion was to (gradually) change the UNIQUE keyword to behave as it is defined in the ANSI / ISO standards SQL-92, SQL:1999, SQL:2003, and probably some others as well.
Posted by Hugo Kornelis on 1/29/2008 at 10:49 AM
Erland: If you have indeed encountered situations where a UNIQUE constraint that permits a single NULL but not more is useful, then I'd prefer it if the non-standard constraint was given a proprietary name (eg UNIQUE_ONE_NULL_ONLY), and the name as defined in the standard (UNIQUE) behaves as described in the standard. That is, after all, what standardisation is all about!
Posted by Hugo Kornelis on 1/29/2008 at 10:47 AM
As glad as I am with the filtered index feature, I don't consider this a solution to my suggestion.

What I asked is to make UNIQUE behave as defined in the standard. The filtered index option is now presented as an argument for not having to come up to standards.

Every developer who comes from an ANSI-compliant platform gets bitten by the completely non-standard behaviour of UNIQUE with respect to NULL values. Standards are there for a reason. I have no objection what so ever against vendor-specific EXTENSIONS to the standard - but keywords that are defined in the standard, should behave as described in the standard. Otherwise, you might just as well throw all standardisation overboard and start advertising SQL Server as "the first major RDBMS that does not adhere to any standard".
Posted by Microsoft on 1/29/2008 at 10:05 AM
Great news! The new filtered indexes feature supports this, e.g.:

create unique nonclustered index idx on dbo.DimCustomer(emailAddress)
where EmailAddress is not null;
Posted by Erland Sommarskog on 12/10/2007 at 3:17 PM
Really bad idea. A new type of constraint UNIQUE_WHEN_NOT_NULL could be added,
but there are plenty of situations when you want UNIQUE to work like it does today.
Posted by Michael Lato on 9/24/2007 at 2:11 PM
The July CTP of SQL Server 2008 allows for filtered indexes with the UNIQUE keyword. You can now create a UNIQUE index "WHERE x is not NULL" to perform this operation. Of course, we still have to deal with it in other ways in 2000 and 2005.
Posted by Mark Yudkin on 9/24/2007 at 4:53 AM
Another possibility if backwards compatibility must be maintained is to add a "SET ANSI_UNIQUE=ON|OFF" option.
Posted by Steve Kass on 9/22/2007 at 7:51 PM
Another possibility is for the behavior of UNIQUE to depend on the compatibility level of the database. I think this would cause less to break than to tie the behavior to ANSI_NULLS. Only a couple of compatibility levels are typically supported, so the old nonstandard behavior would autodeprecate and disappear with the 90 compatibility level.
Sign in to post a workaround.
Posted by Fabrice MARGUERIE on 9/11/2011 at 9:22 AM
As posted by Microsoft in the comments:

The new filtered indexes feature supports this, e.g.:

create unique nonclustered index idx on dbo.DimCustomer(emailAddress)
where EmailAddress is not null;
Posted by Alan Singfield on 8/26/2008 at 3:55 AM
CREATE TABLE [dbo].[UNIQUENESS](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [testing] [nvarchar](50) NULL,
    [testing_unique] AS (case when [testing] IS NULL then [id] end
),
CONSTRAINT [PK_UNIQUENESS] PRIMARY KEY CLUSTERED
(
    [id] ASC
)

GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_UNIQUENESS] ON [dbo].[UNIQUENESS]
(
    [testing] ASC,
    [testing_unique] ASC
)


The computed column testing_uniqueness will copy the identity field if the value of testing is NULL. This means that the tuple (testing, testing_uniqueness) will be unique for all values where testing is null.

This will give the desired effect of as many nulls as you like, but no duplicate non-null values.




Posted by Dejan on 5/28/2008 at 8:46 AM
http://www.tsql.ca/Default.aspx?tabid=234&EntryID=55
Posted by davidofmorris on 10/2/2007 at 10:31 AM
SQL Server 2005: How to have a Unique Constraint with multiple NULLS on a column
http://weblogs.sqlteam.com/mladenp/archive/2007/05/17/60209.aspx

UNIQUE Constraint vs NULL's
http://dimantdatabasesolutions.blogspot.com/2007/03/unique-constraint-vs-nulls.html