Home Dashboard Directory Help
Search

Change UNIQUE constraint to allow multiple NULL values by Hugo Kornelis


Status: 

Active


240
22
Sign in
to vote
Type: Suggestion
ID: 299229
Opened: 9/22/2007 1:31:50 PM
Access Restriction: Public
Duplicates: 498009
4
Workaround(s)
view

Description

According to the ANSI standards SQL:92, SQL:1999, and SQL:2003, a UNIQUE constraint should disallow duplicate non-NULL values, but allow multiple NULL values. SQL Server has always implemented a crippled version of this, allowing a single NULL but disallowing multiple NULL values.

Please start moving from the current non-standard implementation toward ANSI-standard behaviour for UNIQUE constraints.
Details
Sign in to post a comment.
Posted by Hugo Kornelis on 11/21/2013 at 2:07 PM
@David: I will go as far as to say that a design that needs this merits a second look. But saying that it is "almost certainly" a normalization problem is an oversimplification.

Example 1: A middle school teacher has all his/her students in a database. The primary key is student number. But of course, the normal access path for the teacher is through the first name of the student (or maybe class + first name). This is not guaranteed unique - so this teacher has the habit of assigning "personal" nicknames to at least all students with duplicate names. Some other students may have nicknames too, but not all. However, those that do have a nickname all have a unique nickname - otherwise, what is the point?

Storing the nickname as a nullable column in the Students table (assuming ANSI-standard implementation of the UNIQUE constraint) satisfies the rules of all normal forms except 6th (which has no nullable columns at all in the table design - and when implemented in SQL Server, kills all hopes of good performance).

I hope you will not argue that there should be a separate table (and entity type in the design) for "students_with_nicknames".


Example 2: In bookkeeping systems, an erroneous booking should never be deleted (auditors and accountants don't like that). Instead, when a mistake is made, a correction booking is made, for the same amount but with the sign reversed, effectively undoing the original booking. To track things, correction bookings (which otherwise are normal bookings) will include a reference to the original booking the reverse - an optional attribute (column) "is_correction_of". This column is optional (it's only used for correction bookings, not for normal bookings). But when used, it must be unique - it is not possible to correct the same booking twice, as the first correction has the net effect of effectively removing the incorrect booking. So here, too, I need a UNIQUE constraint that applies to the actual values only but does allow me to have multiple rows with is_correction_of NULL.

Again, not violating any normal form (except 6th). And I definitely do not want a separate entity type for correction bookings - they are normal bookings; they should be in the normal bookings table. That's how bookkeeping works.


I hope these examples convince you that neither my request, nor the ANSI standard definition of a UNIQUE constraint is dumb. There are definitely cases where this feature is needed. Luckily, we can "mimick" it by using other features (filtered indexes, indexed views) - but that was not the main reason for this request; the main reason is that I firmly believe that any relational database vendor should offer a mix of standard and proprietary features - but the proprietary features should use syntax that is different from what the standard defines (either really different or an extension of the standard). All language that has been given a meaning by the standard should either work exactly as described in the standard, or simply raise an error if the vendor has not implemented that feature. Language defined in the standard should never run, but produce different results.

(Compare it to cars. Some cars choose to implement the "manual gear" feature, by adding a clutch pedal and a stick; others implement the "automatic gear" feature by adding some devices internally. But if a car has manual gears, the clutch pedal is always to the left of the brake (in right-side driving countries; I have no idea how this works in England). No car manufacturer will ever consider releasing a car with the clutch pedal between the gas and the brake - that would cause confusion, and hence accidents. If a car has three pedals, drivers can expect them to be arranged in the standard order.
Posted by David Turner on 11/21/2013 at 5:25 AM
This is a pretty dumb request. If you want this, you almost certainly have a normalization problem. What you are essentially saying is that "An X doesn't always have a Y, but when the Y exists it is unique and belongs to this X only". The uniqueness of elements in Y strongly implies that it is a set in itself with an injection onto X. IMHO the answer is to put Y in a separate table with a unique constraint, e.g.

create table Y (X_id references X(id) primary key, Y_value unique)

(or you could do the primary key the other way around).
Posted by colinbashbash on 6/18/2013 at 2:36 PM
According to the spec, it's not valid to have the "Many Nulls - some null" that I questioned below. Sorry for the confusion, there.
Posted by colinbashbash on 6/18/2013 at 2:28 PM
I've found times where both ways are useful. It may be useful to provide a One Null/Many Nulls sub-type to a unique constraint. I think it's important to look at 1) the effects on unique keys over single vs multiple columns, 2) how this would affect upgrading databases, and 3) how this would affect applications that need to be redistributed to different database vendors dependent on a client's preference.

1) Single & Unique keys
Unique key over a single column:
Many Nulls - useful: You may want to say that people have to have a unique email, but leave their email address null until their account is approved.
One Null - (maybe useful): You may want to specify that there's a distinct external reference identifier for a column, and that only one of the table rows can be linked to an external identifier. (I would think that as NULL means nothing, empty string would be a better value. I'm sure there are other similar uses)

Unique key over multiple columns:
==Assume you columns (a,b) that are in your unique key
--One Null - useful - If Column A is not nullable, then allowing column B to have one null value for each value of A can be very useful. (Table Access (PK_AccessId, ParentId, GroupId not null, SubGroupId null)). Another example would be putting a unique constraint over UserName & DeleteDate, so that you can only have 1 undeleted user and multiple deleted users with the same UserName,
--Many Nulls - all null - I can see where you may want to allow multiple rows where all A and B are null.
--Many Nulls - some null - It's harder to see it being useful to have multiple null values of B for a distinct A. If you were going to implement Many Nulls, I would hope that it would only allow multiple nulls if everything is null and expect users to use filtered indexes to accomplish something different. (What does the spec say on this?)

2) Compatibility
-- One option to preserve compatibility for existing Sql Server databases would be to have the conversion tool set each unique key to One Null by default. It could set it up so that it throws an error if you don't specify One or Many Nulls when creating a unique constraint on the newer database.

3) Redistributed databases
-- This is already complicated enough, since stored procs aren't part of the standards. It would be preferable if SQL Server would follow the Specs for Unique constraints, or throw an error to alert the developer if a small tweak is needed, rather than just apply a non-standard way of doing things. If needed, it could probably be updated eventually to default to the standard behavior, or a future standard could be updated to allow for the additional specification of one/many nulls.

(I'm not voting Yea or Nay on this, because I think something should be done, but the idea of breaking existing designs could increase costs for existing sql server customers)
Posted by The Jams on 12/27/2012 at 10:23 AM
Better compliance of all current standards would be to not allow the insert of any duplicate values to waste table space when a UNIQUE CONSTRAINT (not UNIQUE INDEX) is in place. NULL or NOT NULL duplicates are always wastefull and can create several bugs when not expected in any data structure.
If there is a UNIQUE constraint on a TABLE and 100 rows of NULL values can be inserted what standard is that compliant for?
Voting no to this backwards and contrary request. Can't understand why MS has not killed this connect after 6 years and updates to standards that make it obsolete.
Posted by Hugo Kornelis on 12/21/2012 at 1:49 AM
After reading the comment posted by Tom, I feel compelled to point out that my main reason for proposing this change is better ANSI-compliance. Some people (far less than the standards committee would like, but more than zero) have to build code that supports multiple platforms. More people change jobs, or stay at a job when their employer switches to another database platform.

They *will* get into trouble over non-standard extensions not working on the new platform, but that's okay-ish. I see that vendors need to distinguish themselves, and that they often can't wait for the standardisation committtee to catch up.
They *will* also get into trouble over standard features not implemented on the new platform. Less okay-ish, but still acceptable - I get that implementing more features is a trade-off.
I am okay-ish with all of those because, whenever a feature is not implemented on the new platform, they simply get an error message. A very usesful way to tell them to look for a different implementation.

What I have a huge problem with, is taking a feature from the standard, and then implementing it just a tiny bit different. People coming from Oracle (e.g.) and knowing that UNIQUE is an ANSI feature will just use it in SQL Server, and when they don't get a syntax error, they'll expect it to work correctly. But it doesn't - at least not according to their (and the ANSI) idea of "correct". That introduces bugs that hopefully are discovered during development and QA and cause "only" a few days of lost work - but in theory could make it all the way into production and cause huge errors later.

Suggestion to the Microsoft team - if lots of people think the non-standard implementation of UNIQUE is valuable and they want to keep it (even though there are workarounds for it), then just implement only the first two steps of my suggestion. I have no problem with having an optional keyword attached to UNIQUE constraints to keep the current behaviour - as long as it defaults to ANSI-standard behavious (eventually - again, I do not propose to change this at once, I propose to do it gradually and give people time to change their code).
Posted by Tom.Thomson on 12/20/2012 at 8:46 AM
Since it's trivial to achieve the type of constraint suggested by this entry without any changes at all to SQL server (a perfectly good method using filtered indices was pointed out 63 months ago, and another - perhaps less satisfactory- workaround was suggested a few monthe later), I can only see this as an attempt to throw away a useful feature in order to conform with a standard. Unfortunately, the standard is pretty poor - it has adopted an untenable approach to NULL in UNIQUE constraints - banning NULLs altogether for columns involved in UNIQUE constraints is another untenable approach (one that I've suggested pushing for in standards discussions so that maybe a compromise between than and the currently diametrically opposed untenable position in the standard will lead to something tenable).

Clearly a lot of people don't understand what is actually implemented in SQL Server for Unique Constraints (perhaps they've been reading and believein Books OnLine, which could certainly encourage such misunderstanding about this feature). What SQL Server provides is a UNIQUE constraint that guarantees that there is no possible way to assign values to the NULLs in a set of rows that satisfies the constraint that would deliver a set of rows that doesn't satisfy it, even if the constraint is completely disregarded for the purpose of assigning values to those NULLs. That is enormously valuable (as Erland Sommarskog pointed out 5 years ago). Let's not throw it away in return for no new capability at all.
Posted by beneuto on 9/20/2012 at 6:22 PM
NULL fields mean you dont always have a value so putting a unique index on a nullable field is saying, "We dont always have a value for that field but when we do it must be unique". This is much more likely than saying, "There will only ever be one record that we dont have a value for" which is what the current behaviour supports. Therefore I doubt that many applications would be impacted by changing to the ANSI standard in this case.
Posted by serbat on 7/26/2012 at 8:45 AM
Me too, that´s why a standard is defined.
Posted by Kanalpiroge on 9/27/2011 at 7:14 AM
I totally agree! Please implement the standard!!!
Posted by Shanus371 on 8/19/2011 at 12:14 PM
MS should implement the standard. Agreed 100%. For those saying it doesn't make sense, it only seems to show the lack of understanding of what NULL is. It's not string.Empty folks. NULL is nothing, not allocated, NULL does not equal NULL , NULL is not something. As being nothing means it cannot be compared to anything, even nothing, any field with a unique constraint and allow nulls should allow an unlimited number of fields to be NULL. That is proper behavior and it's not even really something up for debate. That is the way it should work.
Posted by Oliver Wilcock on 7/12/2011 at 5:23 PM
MS should implement the standard and quit wasting my time.
Posted by David Portas on 4/11/2011 at 2:21 AM
Added an alternative suggestion here: http://connect.microsoft.com/SQLServer/feedback/details/658638/dont-allow-nullable-columns-as-part-of-unique-constraints
Posted by Tom Thomson on 4/7/2011 at 12:08 PM
I think this is an appalling idea. We should instead be working to get the standards organisations to ghange the standard so that a UNIQUE column (or any column in a group of columns which together are unique) does not permit null, and planning to change SQL server to have the option to forbid NULL in any such column instead of allowing a single null.
Posted by Hugo Kornelis on 12/2/2010 at 2:44 AM
I have edited the proposed solution. Implementing the first step in SQL Server 2008, as originally suggested, is of course no longer possible. But I hope it's still possible to add this in SQL Server 2011.
Posted by Stan Segers on 7/21/2009 at 3:50 PM
Quoting SQL Server 2008 Books Online, SET ANSI_NULLS (Transact-SQL) :

[start quote]
Important:
In a future version of SQL Server, ANSI_NULLS will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
[end quote]

Does future mean next and does the NULL in a UNIQUE index mean NULL as in ANSI_NULLS?

Anyway, Hugo's proposed solution is one I can greatly accept. It doesn't break old code in this version and it hardens new code for the future... can we get it into the list of fixes for SQL Server 2005 SP4, SQL Server 2008 SP2 and the product release for Kilimanjaro/SQL Server 2008 R2?
Posted by DMulvena on 7/17/2008 at 2:12 AM
I agree that Microsoft SQL should follow the standard. NULL is a concept rather than a fixed value, and allowing only one row to have this, as if it were a fixed value, is illogical. (Equally illogical in my opinion is including all the records with null columns in an index - what is the point of that?) I can't agree with the suggestion though of overloading the existing ANSI_NULLS option.
Posted by Dragba on 6/12/2008 at 9:15 AM
I always had a problem with unique CONSTRAINT that allows multiple NULLs (or NULLs at all). However, I always missed unique INDEX that allows multiple NULLs. I'm pretty happy with MS solution (what took you so long?). Mind you that you can create a foreign key to the unique constraint. Doing that to the alternate key with multiple NULLs (or NULLs at all) makes no relational sense to me.
I understand the importance of standards, even when they are wrong, so MS could do something about it, keeping this behaviour default and recommended.
Posted by FluidTrade Support on 2/23/2008 at 10:00 AM
I need to add my voice to the list of people that think this is a big issue. The problem for me is less the compatibility with other databases than the compatibility with ADO.NET. The Unique constraints created for a DataSet allow multiple nulls but enforce uniqueness with the non-null values. I'm creating a middle tier where the in-memory functions have analogous functions on the persistent (read SQL) store. Currently, I can't create a constraint on SQL Server that works the same as the Unique constraint in my ADO.NET DataSet.
Posted by Hugo Kornelis on 1/29/2008 at 3:10 PM
You closed the suggestion as "fixed" after I posted my comment, but I felt compelled to re-open it. The suggestion was NOT to implement "some way" to constrain non-NULL values to be unique. The suggestion was to (gradually) change the UNIQUE keyword to behave as it is defined in the ANSI / ISO standards SQL-92, SQL:1999, SQL:2003, and probably some others as well.
Posted by Hugo Kornelis on 1/29/2008 at 10:49 AM
Erland: If you have indeed encountered situations where a UNIQUE constraint that permits a single NULL but not more is useful, then I'd prefer it if the non-standard constraint was given a proprietary name (eg UNIQUE_ONE_NULL_ONLY), and the name as defined in the standard (UNIQUE) behaves as described in the standard. That is, after all, what standardisation is all about!
Posted by Hugo Kornelis on 1/29/2008 at 10:47 AM
As glad as I am with the filtered index feature, I don't consider this a solution to my suggestion.

What I asked is to make UNIQUE behave as defined in the standard. The filtered index option is now presented as an argument for not having to come up to standards.

Every developer who comes from an ANSI-compliant platform gets bitten by the completely non-standard behaviour of UNIQUE with respect to NULL values. Standards are there for a reason. I have no objection what so ever against vendor-specific EXTENSIONS to the standard - but keywords that are defined in the standard, should behave as described in the standard. Otherwise, you might just as well throw all standardisation overboard and start advertising SQL Server as "the first major RDBMS that does not adhere to any standard".
Posted by Microsoft on 1/29/2008 at 10:05 AM
Great news! The new filtered indexes feature supports this, e.g.:

create unique nonclustered index idx on dbo.DimCustomer(emailAddress)
where EmailAddress is not null;
Posted by Erland Sommarskog on 12/10/2007 at 3:17 PM
Really bad idea. A new type of constraint UNIQUE_WHEN_NOT_NULL could be added,
but there are plenty of situations when you want UNIQUE to work like it does today.
Posted by Michael Lato on 9/24/2007 at 2:11 PM
The July CTP of SQL Server 2008 allows for filtered indexes with the UNIQUE keyword. You can now create a UNIQUE index "WHERE x is not NULL" to perform this operation. Of course, we still have to deal with it in other ways in 2000 and 2005.
Posted by Mark Yudkin on 9/24/2007 at 4:53 AM
Another possibility if backwards compatibility must be maintained is to add a "SET ANSI_UNIQUE=ON|OFF" option.
Posted by Steve Kass on 9/22/2007 at 7:51 PM
Another possibility is for the behavior of UNIQUE to depend on the compatibility level of the database. I think this would cause less to break than to tie the behavior to ANSI_NULLS. Only a couple of compatibility levels are typically supported, so the old nonstandard behavior would autodeprecate and disappear with the 90 compatibility level.
Sign in to post a workaround.
Posted by Fabrice MARGUERIE on 9/11/2011 at 9:22 AM
As posted by Microsoft in the comments:

The new filtered indexes feature supports this, e.g.:

create unique nonclustered index idx on dbo.DimCustomer(emailAddress)
where EmailAddress is not null;
Posted by Alan Singfield on 8/26/2008 at 3:55 AM
CREATE TABLE [dbo].[UNIQUENESS](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [testing] [nvarchar](50) NULL,
    [testing_unique] AS (case when [testing] IS NULL then [id] end
),
CONSTRAINT [PK_UNIQUENESS] PRIMARY KEY CLUSTERED
(
    [id] ASC
)

GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_UNIQUENESS] ON [dbo].[UNIQUENESS]
(
    [testing] ASC,
    [testing_unique] ASC
)


The computed column testing_uniqueness will copy the identity field if the value of testing is NULL. This means that the tuple (testing, testing_uniqueness) will be unique for all values where testing is null.

This will give the desired effect of as many nulls as you like, but no duplicate non-null values.




Posted by Dejan on 5/28/2008 at 8:46 AM
http://www.tsql.ca/Default.aspx?tabid=234&EntryID=55
Posted by davidofmorris on 10/2/2007 at 10:31 AM
SQL Server 2005: How to have a Unique Constraint with multiple NULLS on a column
http://weblogs.sqlteam.com/mladenp/archive/2007/05/17/60209.aspx

UNIQUE Constraint vs NULL's
http://dimantdatabasesolutions.blogspot.com/2007/03/unique-constraint-vs-nulls.html