
new virtual table: errors. It would be analogous to the deleted and inserted tables by danholmes


Status: Active
Votes: 558 up / 0 down
Type: Suggestion
ID: 774754
Opened: 12/19/2012 8:39:15 AM
Access Restriction: Public
Workarounds: 0

Description

If a constraint violation happens in a DML statement and the input was a set of rows, the offending data in the source is difficult to find. The statement fails, and the data source has to be searched (and possibly recreated) and checked for the violation.
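To make the pain concrete, a minimal repro sketch of the kind of failure described (table and column names are illustrative):

CREATE TABLE dbo.Target (Id int PRIMARY KEY, Name varchar(10) NOT NULL);

INSERT INTO dbo.Target (Id, Name)
SELECT Id, Name FROM dbo.Staging;   -- assume dbo.Staging holds a wider Name column
-- Msg 8152: String or binary data would be truncated.
-- The whole statement rolls back, and nothing identifies which staged row was too long.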

Thread from the forum.
http://social.technet.microsoft.com/Forums/en-US/transactsql/thread/3e17f8dc-9685-412b-8e76-94ad41536d5d
Details
Posted by pmpjr on 4/10/2014 at 8:35 AM
Please do this, it would make my life (and others') so much better!
Posted by Jeff Moden on 4/9/2014 at 7:09 PM
How could such a feature be used? Let me count the ways! This is simply a brilliant idea and it's long overdue. Great suggestion, Mr. Holmes.
Posted by jbsys1 on 4/9/2014 at 8:34 AM
Microsoft MUST consider this common enough to be worth doing - a very similar feature has been implemented in MS Access for years with the Import_Errors table. Maybe that is the workaround they expect us to use!!!
Posted by cehottle on 4/9/2014 at 8:09 AM
Do it.
Posted by yes please on 4/8/2014 at 4:21 PM
The hours this would save annually would be enormous.

Do it.
Posted by Robert Heinig II on 10/16/2013 at 6:22 AM
People, let's keep some perspective.
(1) ETL, minority rows violating the target schema, etc. are the classic domain of SSIS, using cascaded batch/single-destination transforms. SSIS will not perform much worse than hypothetical DML using this suggestion (if at all - buffer pool spamming may lead to SSIS performing better in some cases? Haven't thought that one through), and memory usage may be easier to handle. True, there may be large numbers of developers with good SQL skills who wouldn't touch SSIS, but IMHO that's not the fault of the product itself.
(2) The inadequacy of the error messages seems to me to be one of the major driving forces here. If that infamous 'string or binary' error didn't still, after decades (I seem to remember heated discussions back when 7.0 was on the box), fail to identify the triggering field or data, we would all on average feel far less strongly about this suggestion, right? The PK violation message got improved and is now perfectly useful, right?
(3) Methinks this is mostly about usability. Let me say this: SQL Server DB team, even if you manage to ignore and annoy everyone commenting here and do nothing - The SQL Server core DB engine is still an amazing product and offers by far the best usability of the modules in the whole SQL Server package. Good work, keep it up :)

Bottom line, I am still very much in favour, brilliant idea, I think the use cases are NOT limited to ETL and one-off development tasks, and... I would already be happy with just some error messaging improvements.
Posted by dnoeth on 9/15/2013 at 7:43 AM
Of course this is a common scenario in any DWH environment.

A similar feature exists in Oracle (as already mentioned multiple times) but also in Teradata: CREATE ERROR TABLE FOR tab creates a copy of the table plus some additional columns indicating which column caused which error. Any subsequent INSERT/SELECT or MERGE can add LOGGING ERRORS, and violating rows are inserted into this error table (together with a unique identifier for the DML statement). By default all valid rows are committed, but there's an optional LIMIT to stop after x (>= 0) errors and roll back the entire statement.
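A rough sketch of the Teradata syntax described above (from memory; table names illustrative):

CREATE ERROR TABLE FOR sales;        -- creates ET_sales: a copy of sales plus error-info columns

INSERT INTO sales
SELECT * FROM staging_sales
LOGGING ERRORS WITH LIMIT OF 100;    -- violating rows land in ET_sales; abort past 100 errors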
Posted by Dilip D on 9/2/2013 at 8:19 PM
Microsoft should add this feature, as it is very difficult to find the row that is causing the error. Currently I am facing this issue while importing millions of rows into SQL Server, and I have to check manually which row is causing the error. Any workaround?
Posted by scottmonte on 8/16/2013 at 8:05 AM
I frequently run into this scenario while importing text files (IIS logs) into a work table to analyze, and have also seen this issue in our other ETL processes. I often have to use workarounds like SET ROWCOUNT and SELECT TOP to find the offending data. This would be great, and the situation is not uncommon in our production system.
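For anyone stuck doing that today, a sketch of the SELECT TOP bisection workaround (illustrative names): keep halving the TOP value until the statement succeeds, then narrow in on the failing row.

INSERT INTO dbo.WorkTable (LogLine)
SELECT TOP (50000) LogLine
FROM dbo.StagingLogs
ORDER BY Id;     -- a deterministic ORDER BY is essential for the bisection to converge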
Posted by David C Walls on 7/4/2013 at 8:43 AM
I would consider the suggested feature to be a performance enhancement even more than a development aid. Dealing with constraint violations during ETL processes is, without a doubt, pervasive, if not ubiquitous. In my experience, if data from a particular source has to be moved more than once, it is standard procedure to create a process, e.g. using SSIS, in order to handle the exceptions. While many such processes would still be needed if this feature were available, in many other cases the process could be made much simpler - and much, much faster - by being able to note the exceptions directly from the DML statement. I can think of a number of examples in which I would have been able to use a generic SQL batch rather than an SSIS package, because the only reason for the package was to handle the exceptions (and often just record them so they could be handled by a human).

It is hard to imagine that extracting exceptions directly from DML wouldn't be significantly faster than locally-written code that often processes on a row-by-row basis.
Posted by jerrydba1 on 7/1/2013 at 2:28 PM
Dear Mr. Apostol,

When you said, "The scenario reported in the bug are not common enough," did you mean that ETL processes aren't common enough, or that constraint/data type violations aren't common enough?

Thanks!

-jerry
Posted by boumerlin on 6/13/2013 at 3:29 PM
I agree...this would be a nice feature to have with set logic! There is nothing more irritating than searching for one lone row with bad data!
Posted by Marian Chicu on 5/30/2013 at 4:56 AM
This will surely be useful for every data import project that we have. Nobody wants to guess what the problems are; we'd really be helped by knowing for sure which rows are problematic.
So, please, do reconsider this!
Posted by Mick Letofsky on 5/28/2013 at 10:32 AM
Also, Oracle has had this feature for almost a decade. The feature is called DML Error Logging and you use a statement called "LOG ERRORS" which allows the user to export any number of rows containing errors to another location (preferably another table) during DML statements.

Oracle obviously believes that this feature wasn't that big of a deal to add to their baseline.
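For reference, a minimal sketch of the Oracle feature being described (table names illustrative):

-- Create the companion error log table (named ERR$_ORDERS by default):
BEGIN
  DBMS_ERRLOG.CREATE_ERROR_LOG('ORDERS');
END;
/

INSERT INTO orders
SELECT * FROM staging_orders
LOG ERRORS INTO err$_orders ('nightly load') REJECT LIMIT UNLIMITED;
-- Valid rows are inserted; failing rows land in ERR$_ORDERS along with the error details.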
Posted by Mick Letofsky on 5/28/2013 at 10:13 AM
Dear Gus et al.,

The ability to selectively export failed rows into a secondary table or even a flat file would be a great benefit for the various software development companies that use SQL Server as a back-end database, because it would allow faster on-boarding of clients when converting data from non-SQL Server databases and/or flat files. Reducing the time to on-board new clients is crucial for the SMBs of the world to succeed, and features such as this cannot be dismissed with a simple, blanket statement such as "because the scenario reported in the bug are not common enough".

The SMB community fully understands the risks of implementing changes into a baseline function. However, has the SQL team considered implementing this functionality within the new MERGE statement? Perhaps that is a better place to extend this crucial and highly desired functionality.

Thanks,
Mick
Posted by Jason Weber - Magenic on 5/28/2013 at 7:40 AM
Although incredibly valuable (Oracle already does something like this, right?), this feature probably also needs an on/off switch so that people can opt into the new behavior as appropriate. I'm hoping the "not common enough" refers to doing this without such a switch, which was not explicitly mentioned in the original request.

Fantastic idea! This seems like a no-brainer and would help sell against Oracle.
Posted by davidc5 on 5/28/2013 at 3:12 AM
I run into this several times EACH MONTH. It needs to be fixed!

Closing this as not common enough is a really bad joke.
Posted by Glenn Mafodda on 5/26/2013 at 9:23 PM
This is an excellent suggestion. Perhaps it was initially rejected as the primary benefit was listed as "faster development". I would suggest that the primary benefit of this would be to enhance the robustness of the product by providing accurate error reporting rather than generic messages. The issue is extremely common.
Posted by MA AccessGroup.it on 5/26/2013 at 1:06 PM
We are all waiting for this feature.
Posted by MauriD on 5/26/2013 at 10:22 AM
That would be a great and useful feature. A must-have in an enterprise product like SQL Server.
Posted by Lazydba247 on 5/25/2013 at 11:08 PM
MS, please reconsider: this feature would enable SSIS-like error handling in our T-SQL ETL code. No need to search for the "needle in the haystack" :) Thanks.
Posted by Marco Russo - SQLBI on 5/24/2013 at 1:53 PM
I agree, this feature would be useful in many real-world scenarios. Please, reconsider it!
Posted by Maria Zakourdaev on 5/24/2013 at 1:31 PM
Absolutely agree, this would be an extremely useful feature. It would save us a lot of hours of work. Microsoft, please reconsider.
Posted by donnam9999 on 5/23/2013 at 8:03 AM
This happens very frequently and the man-hours spent to fix the problems are considerable!
Posted by K Sean on 5/20/2013 at 8:44 AM
Happens to me very frequently (or maybe I just work with bad data). Combing through the input data to find the offending record is always a pain, especially when you have millions of records incoming.
Posted by Derek Frye on 5/17/2013 at 10:29 AM
Implementing this would help save our company time when importing "less than clean" data. The time spent working around this is a factor in our choice of database systems...
Posted by Michael Decuir on 5/17/2013 at 10:01 AM
I routinely encounter errors where I have a constraint violation of some sort, but have no practical means of recreating the data set that triggered the constraint violation. As many people have noted previously, this kind of solution seems consistent with established patterns and would provide a huge amount of value when troubleshooting.
Posted by ConnieOI on 5/17/2013 at 4:47 AM
This is a VERY common problem -- I encounter it regularly. Please implement a proper solution rather than just letting the statement fail.
Posted by Mike Lewis (SQLTuna) on 5/17/2013 at 12:46 AM
I think the idea is excellent and warrants the time and effort to implement. It would be a huge step forward for DML in T-SQL: performance-wise it would still allow batch processing while isolating the error records.
Posted by Anders Borum on 5/16/2013 at 11:06 PM
I actually ran into this yesterday and it's a scenario that creeps up on me on a regular basis - as is the case with all authors posting comments here. The proposed solution is generic (common) and builds on already established conventions in the DML; it proposes a clean solution to a problem that occurs with a high frequency to many developers.

The fact that Microsoft has closed this as "not common enough" just shows how far the SQL Server (and SQL Azure) development teams have drifted from truly caring about developers. Although many of the development teams at Microsoft have become more transparent (and responsive to community requests) in recent years, it seems the SQL Server (and SQL Azure) development teams are still quite oblivious to constructive feedback.

I have to ask the provoking question whether SQL Server is actually a customer-driven product or more of an internal product they build to facilitate software suites sold (and developed) by other divisions. I know of numerous connect entries with bugs for e.g. SQL Azure that have been open for years (http://connect.microsoft.com/SQLServer/feedback/details/624093/sqlbulkcopy-writetoserver-fails-on-sql-azure-if-destination-table-is-temporary).

Don't ignore constructive feedback from your loyal developers; please consider more transparency and how your decisions affect the lives of thousands of developers on a daily basis.
Posted by Matt Velic on 5/16/2013 at 2:26 PM
Please do not close as "not common enough." This is an issue that I run into a couple times a month, and the proposed solution sounds fantastic and would fit into an established pattern within the product. This is a good idea.
Posted by ValeriyNY on 5/16/2013 at 12:09 PM
This would save my team hours, if not days, in troubleshooting and in developing workarounds.
Posted by Jace on 5/16/2013 at 9:44 AM
I've run into this multiple times. It needs to be fixed!
Posted by David M Maxwell on 5/16/2013 at 8:18 AM
This is absolutely a common issue. I run into it regularly when dealing with data from outside sources. (On average, 2 - 3 times per month.)
Posted by Steven Durette on 5/16/2013 at 7:50 AM
How can this not be "common" enough? Errors saying data was not inserted because of string length, an invalid date, or an out-of-bounds value (a number too large for the type, e.g. 1000 into a tinyint) happen all the time, to everyone. When you are dealing with 50+ columns and millions of rows, it is a major pain when the error doesn't even tell you which constraint or which field was at issue. An error that at least tells me which column is involved would drastically reduce my troubleshooting time! Including the offending data row in the error as well would let me quickly decide whether the data is bad or a schema change is needed.
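For example, the out-of-bounds case mentioned above:

DECLARE @t TABLE (v tinyint);
INSERT @t VALUES (1000);
-- Msg 220: Arithmetic overflow error for data type tinyint, value = 1000.
-- Tolerable for one column; with 50+ columns and millions of rows, the message
-- often says nothing about which column or row was at fault.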
Posted by LeoPasta on 5/16/2013 at 1:40 AM
Not common enough? Is there any database developer who has never encountered "String or binary data would be truncated" in his life?

What is the point of creating/maintaining a feedback website if the next thing you do is rush to close every suggestion as "won't fix" or "not common enough"?

If a suggestion is deemed reasonable (as this one surely should) but couldn't be included in this version then please, just leave it open to be considered in the next one! Or else it becomes a whack-a-mole game: You close the ticket and a variation of it will reappear within the week!
Posted by Carlos Sacristan on 5/16/2013 at 12:55 AM
Absolutely agree with my fellows. Saying that this scenario is "not common enough" does not make sense!!!
Posted by Pablo Álvarez Doval on 5/16/2013 at 12:24 AM
This would be an amazing feature, and extremely useful for many of my projects.

I'd like to believe that the 'not common enough scenario' response was just a polite generic answer; I simply refuse to believe the guys on the team really think this is an uncommon scenario, because that would mean they really know very little about how their customers are using their product!
Posted by IanYates [Med IT] on 5/15/2013 at 4:40 PM
The various ways this would affect the query pipeline and so on would make it difficult to implement and, I imagine, may touch upon a lot of areas within SQL Server. We may have the odd bug in the first version much like how the merge statement has had its fair share of issues, but it's a REALLY good idea and would be a really compelling addition to T-SQL.

The only problem I see with it (selfishly!) is that I couldn't use it in my own apps for a while since we have users who just don't upgrade their installed copy of SQL all that often.
Posted by Shawn Quillman on 5/15/2013 at 2:28 PM
This would save incredible amounts of time. I can't think of any scenario in which this is not a common situation. I encounter this problem regularly and have at times spent hours debugging. Knowing all of the data that would cause the operation to fail would be of immense benefit.
Posted by David M B on 5/15/2013 at 1:04 PM
This feature is called DML Error Logging in Oracle and has existed since 10gR2. http://www.oracle.com/technetwork/issue-archive/2006/06-mar/o26performance-096310.html
This is an amazing feature that really would be great to have in SQL Server.
Posted by Andomar_ on 5/15/2013 at 12:33 PM
Not common enough?

I hope that's an attempt at humor.
Posted by John Hennesey on 5/15/2013 at 12:18 PM
This feature would be extremely useful and welcomed! The cumulative time this would have saved everyone... Please, take this one seriously. This would be a boon to data development work.
Posted by TheBeardedLlama on 5/15/2013 at 11:11 AM
not common enough?

are you serious?

you should take a leaf out of the ASP.net team's book and the excellent work they're doing listening to the devs
Posted by jjdDb on 5/15/2013 at 11:10 AM
This would be a very handy thing to have. Thanks.
Posted by Eric Hutton on 5/15/2013 at 10:42 AM
This is absolutely needed. I have to resort to dumping files out to text editors and manually searching for possible culprits.
Posted by Peter A. Schott on 5/15/2013 at 10:21 AM
I have to agree with others on this one. This is quite common when trying to manually handle inserts and updates. All we know is that _something_ caused a violation, but no help on what that something was. We then have to spend much more time tracking down the problem data instead of getting some helpful way to get more information about the problem. Yes, the case described shouldn't be common when working with an OLTP system with a well-designed interface and data-scrubbing, but it's quite common for ETL and ad-hoc operations.
Posted by Eric Wisdahl on 5/15/2013 at 10:07 AM
This is a great suggestion (not a bug) to handle the extremely common occurrence of data exceptions during set-based manipulation, and I believe it should really be looked at for implementation. Perhaps have a database/server setting for exception handling (or something similar) that would either fail the batch as currently implemented or throw records into the virtual exception table as discussed below.

Posted by Meher on 5/15/2013 at 10:06 AM
This feature would definitely help. I request Microsoft to please consider adding this to the product.

Thanks
Meher
Posted by Argenis Fernandez on 5/15/2013 at 10:04 AM
This would be an INCREDIBLY useful feature. I see widespread applications for this. MSFT, please be a little more open-minded about this suggestion.
Posted by Jason Kyle on 5/15/2013 at 10:00 AM
This would be hugely helpful. I also work with large datasets and moving data between tables regularly, and especially with very wide tables tracking down the error can be very problematic.
Posted by Adam Machanic on 5/15/2013 at 9:22 AM
Gus: As someone who focuses on data warehousing, I see use cases for this feature on a daily basis. And I regularly deal with problems caused by bad data, which can't be caught and handled nicely in T-SQL.

If properly implemented I would use this in every single ETL process, as I'm sure would the vast majority of other data warehouse developers. The scenarios here are extremely common and very much in need of a fix.

Please reconsider.
Posted by Erland Sommarskog on 5/14/2013 at 3:13 PM
Not common enough? Come on! Sure, this is not a small order, and it will require some effort to get this right. But if you do, it will be revolutionary. Not common enough? Bah!
Posted by Microsoft on 4/29/2013 at 2:40 PM
Hello,

Thank you for submitting this feedback. After carefully evaluating all of the bugs in our pipeline, we are closing bugs that we will not fix in the current or future versions of SQL Server. The reasons for closing this bug is because the scenario reported in the bug are not common enough and due to the risk of implementing a fix it unfortunately does not meet the bar for the current version of the product.

If you have more supporting information and details or feel strongly that we should reconsider, please do resubmit this item for review.

Thanks again for reporting the product issue and continued support in improving our product.
Gus Apostol, SQL Server Program Manager
Posted by Microsoft on 1/16/2013 at 1:46 PM
Thanks for the great feedback; we will evaluate this request for future releases.
Posted by Simon Sabin on 12/31/2012 at 3:37 PM
I think this is a fantastic idea; it would have saved my team months of effort in building cross-system ETL for a migration.
I've put my thoughts in a blog post http://sqlblogcasts.com/blogs/simons/archive/2012/12/31/help-to-improve-sql-batch-operations-and-etl.aspx

I discuss what errors should be logged and at what level.
Posted by Erland Sommarskog on 12/20/2012 at 3:12 PM
Bob, aren't you thinking of a different thing? Something like an error_table() function to supplement the current error_message() etc., which are scalar, and which would only return data in a CATCH handler.

I think that for a DML statement with an OUTPUT of errors (or a special clause), errors would not be raised and would not trigger the CATCH handler, but would instead end up in the table with one row per failed row. Presumably error rows would need a standard shape, with one xml column for the data and one for the errors - there may be multiple errors for the same row.

Yes, this is an interesting feature, if it's done right, but it requires some thinking to get it right.
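A purely hypothetical sketch of the shape described above (none of this is real T-SQL syntax):

DECLARE @errs TABLE (RowData xml, Errors xml);   -- one row per failed source row

INSERT INTO dbo.Target (Id, Amount)
OUTPUT ERRORS INTO @errs       -- imagined clause: failures are diverted, not raised
SELECT Id, Amount FROM dbo.Staging;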
Posted by Rob Volk1 on 12/20/2012 at 1:55 PM
I agree wholeheartedly with Bob's suggestion.

Adam:

Perhaps they're not "errors" if IGNORE_DUP_KEY is set, but such gaps would occur for other violations. This new table should represent everything not inserted from a single statement regardless of why, for the sake of atomicity. It's easier to filter out a particular violation like IGNORE_DUP_KEY if one doesn't care, but to ignore them entirely would make a full ETL reconciliation more difficult.

Regarding the type coercion failure you mention, an $error directive that returns XML with additional source type would be trivial to implement (and TRY_CONVERT() is available). The affected column in the errors virtual table could be set to NULL with supplementary $error info, or worst case returned as sql_variant.
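As an aside, TRY_CONVERT (real T-SQL, available since SQL Server 2012) can already be used to locate type-coercion failures in source data, e.g.:

SELECT *
FROM dbo.Staging
WHERE RawValue IS NOT NULL
  AND TRY_CONVERT(int, RawValue) IS NULL;   -- rows that would fail an int conversion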
Posted by Bob Beauchemin on 12/20/2012 at 1:41 PM
I'd want this for all T-SQL statements, not just DML. All database APIs except T-SQL can harvest a collection of errors, rather than just a single error. This table should include all error properties (e.g. error level) so that informational errors (e.g. level 0 errors) could be filtered out - or included, if desired.
Posted by Adam Machanic on 12/20/2012 at 1:41 PM
"The statement would still fail as it currently does but the offending row(s) would be salvaged for postmortem analysis."

I disagree here. I actually think that would make the situation confusing and difficult. What I'd much rather have happen is that the rows that don't hit a violation will go into the table, and the rows that do hit a violation will end up in the errors virtual table. This puts things in the user's hands. If you want to roll back, roll back (of course you'll need an explicit transaction for that). If you want to commit, commit. Either way, it's one step rather than two, and turns this feature into a wonderful way of setting up a "data quarantine" in T-SQL -- something that we usually need SSIS or some other external ETL tool to achieve.
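A hypothetical sketch of that quarantine flow (the OUTPUT ERRORS clause is imagined; a table variable is used because it survives a rollback):

DECLARE @quarantine TABLE (RowData xml, Errors xml);

BEGIN TRANSACTION;

INSERT INTO dbo.Target (Id, Val)
OUTPUT ERRORS INTO @quarantine     -- imagined clause from this discussion
SELECT Id, Val FROM dbo.Staging;

IF EXISTS (SELECT 1 FROM @quarantine)
    ROLLBACK TRANSACTION;          -- @quarantine keeps its rows either way
ELSE
    COMMIT TRANSACTION;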
Posted by Rob Volk1 on 12/20/2012 at 1:34 PM
Crap, hit submit accidentally.

It would also be nice for the errors virtual table to include the identity and sequence values of failed rows, which it appears would be automatic. A repro of the gaps caused by IGNORE_DUP_KEY is below:

create table #z(i int not null identity(1,1),
    b int not null primary key with (ignore_dup_key=on));
insert #z(b) values(1),(1),(2),(2),(3);
select * from #z;   -- note the gaps in [i]: identity values were consumed by the rejected duplicates
drop table #z;
go

create sequence s0 as int start with 1 increment by 1 no cycle;
go
create table #z(i int not null default next value for s0,
    b int not null primary key with (ignore_dup_key=on));
insert #z(b) values(1),(1),(2),(2),(3);
select * from #z;   -- same gaps with sequence-generated defaults
drop table #z;
drop sequence s0;

Another suggestion: instead of adding a column to the virtual table listing the violated constraint, perhaps add an $error directive similar to the $action available with the OUTPUT clause of the MERGE statement. That would preserve the original table structure, just like inserted and deleted. And if possible, when multiple violations occur for the same row, $error would list them all, either as XML or CSV.
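For reference, a minimal example of the existing $action directive that the suggested $error would mirror (illustrative names):

MERGE dbo.Target AS t
USING dbo.Source AS s ON t.Id = s.Id
WHEN MATCHED THEN UPDATE SET t.Val = s.Val
WHEN NOT MATCHED THEN INSERT (Id, Val) VALUES (s.Id, s.Val)
OUTPUT $action, inserted.Id, inserted.Val;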
Posted by Adam Machanic on 12/20/2012 at 1:24 PM
Of course the last comment illustrates another point about the difficulty of implementing this. In the case I've pasted, what's the output of the virtual table? [i] can't be typed as INT if you want to see the row that you were trying to insert; doing that would create an unworkable situation.
Posted by Adam Machanic on 12/20/2012 at 1:22 PM
"It could also include a column for the constraint name that was violated."

An error may be caused by something other than a constraint violation, e.g. a data type violation:

DECLARE @t TABLE (i INT)
INSERT @t SELECT 'abc'
-- Msg 245: Conversion failed when converting the varchar value 'abc' to data type int.

... so I'd rather the virtual table have a column with the actual error message. (Which would include the constraint name anyway.) And maybe a column for the error number.
Posted by Adam Machanic on 12/20/2012 at 1:17 PM
@Rob: No, please *do not* include those rows! If I've added IGNORE_DUP_KEY it's because I want to ignore them. They're NOT errors. (Okay, now we can see how difficult this feature would be to implement...)
Posted by Rob Volk1 on 12/20/2012 at 1:14 PM
Oh god yes. If you do this please be sure to include rows not inserted due to IGNORE_DUP_KEY settings on indexes and constraints.