Search

new virtual table: errors. It would analogous to the deleted and inserted tables by danholmes

Active

287
0
Sign in
to vote
Type: Suggestion
ID: 774754
Opened: 12/19/2012 8:39:15 AM
Access Restriction: Public
0
Workaround(s)
If a constraint violation happens in a DML statement and the input was a dataset, the offending data in the source is difficult to find. The statement fails and the datasource has to be searched (and possibly recreated) and checked for the violation.

Thread from the forum.
http://social.technet.microsoft.com/Forums/en-US/transactsql/thread/3e17f8dc-9685-412b-8e76-94ad41536d5d
Details (expand)

Product Language

English

Category

SQL Engine

Proposed Solution

I am proposing a new virtual table that would be analogous to the deleted and inserted tables that are commonly used in the OUTPUT clause. This table would be structurally identical to the source data. It could also include a column for the constraint name that was violated.

now it it is a table it would then be saved as part of an OUTPUT clause. perhaps something like :
OUTPUT ERRORS errors.* INTO ...
or
INSERT ...
OUTPUT inserted.* INTO ...
OUTPUT errors.* INTO...
SELECT...

The statement would still fail as it currently does but the offending row(s) would be salvaged for postmortem analysis. There could even be options that would change the statement's ACID properties to perhaps output the failed row but not fail the statement. that would mimic the behavior of the programmer cursoring over the source and saving those that can be but logging those that don't. It should be in the programmer's control with the default behavior to be as it is.

As one of the TSQL examples above shows, this change would also require the ability ot have multiple OUTPUT clauses. I would like that too (in addition to having an OUTPUT clause with a predicate - that too would be great. Then i could only get the stuff in the OUTPUTed table that i am interested in).

Primary Benefit

Faster Development

Other Benefits

This would probably also benefit SSIS stuff too. Any import or transformation could probably benefit. The failing data could be outputed as a error task.

Virtualization

 
File Attachments
0 attachments
Sign in to post a comment.
Posted by donnam9999 on 5/23/2013 at 8:03 AM
This happens very frequently and the man-hours spent to fix the problems are considerable!
Posted by Ksean on 5/20/2013 at 8:44 AM
Happens with me very frequently (or may be I work with bad data). Combing through the input data to find the offending record, is always the pain especially when you have millions records incoming.
Posted by Derek Frye on 5/17/2013 at 10:29 AM
Implementing this would help save our company time when importing "less than clean" data. The time spent working around this is a factor in our choice of database systems...
Posted by Michael Decuir on 5/17/2013 at 10:01 AM
I routinely encounter errors where I have a constraint violation of some sort, but have no practical means of recreating the data set that triggered the constraint violation. As many people have noted previously, this kind of solution seems consistent with established patterns and would provide a huge amount of value when troubleshooting.
Posted by ConnieOI on 5/17/2013 at 4:47 AM
This is a VERY common problem -- I encounter it regularly. Please implement a proper solution rather than just letting the statement fail.
Posted by Mike Lewis (mzz3lh) on 5/17/2013 at 12:46 AM
I think the idea is excellent and warrants time and effort to implement. It would be a huge improvement and step forward in DML TSQL and performance-wise would allow batch processing but still isolate the error records.
Posted by Anders Borum on 5/16/2013 at 11:06 PM
I actually ran into this yesterday and it's a scenario that creeps up on me on a regular basis - as is the case with all authors posting comments here. The proposed solution is generic (common) and builds on already established conventions in the DML; it proposes a clean solution to a problem that occurs with a high frequency to many developers.

The fact that Microsoft has closed this as "not common enough" just shows how far the SQL Server (and SQL Azure) development teams have drifted from actually truly caring about developers. Although many of the development teams at Microsoft have become more transparent (and responsive to community requests) in recent years, it seems the SQL Server (and SQL Azure) development teams are still quite oblivious to constructive feedback.

I have to ask the provoking question whether SQL Server is actually a customer-driven product or more of an internal product they build to facilitate software suites sold (and developed) by other divisions. I know of numerous connect entries with bugs for e.g. SQL Azure that have been open for years (http://connect.microsoft.com/SQLServer/feedback/details/624093/sqlbulkcopy-writetoserver-fails-on-sql-azure-if-destination-table-is-temporary).

Don't ignore constructive feedback from your loyal developers; please consider more transparency and how your decisions affect the lives of thousands of developers on a daily basis.
Posted by Matt Velic on 5/16/2013 at 2:26 PM
Please do not close as "not common enough." This is an issue that I run into a couple times a month, and the proposed solution sounds fantastic and would fit into an established pattern within the product. This is a good idea.
Posted by ValeriyNY on 5/16/2013 at 12:09 PM
This would save my team hours if not days in troubleshooting and development of workarounds
Posted by Jace on 5/16/2013 at 9:44 AM
I've run into this multiple times. It needs to be fixed!
Posted by David M Maxwell on 5/16/2013 at 8:18 AM
This is absolutely a common issue. I run into it regularly when dealing with data from outside sources. (On average, 2 - 3 times per month.)
Posted by Steven Durette on 5/16/2013 at 7:50 AM
How can this not be "common" enough. Getting errors that the data was not inserted because of string length, invalid date, out of bounds (using a number to large for the type ex. 1000 into a tinyint) happens all the time to everyone. When you are dealing with 50+ columns and millions of rows it is a major pain when the error doesn't even tell you which constraint or even which field was at issue. Give me an error that at least tells me which column would drastically reduce my troubleshooting time! Give me the offending data row in the error as well would let me quickly decide if the data is bad or if a schema change is needed.
Posted by LeoPasta on 5/16/2013 at 1:40 AM
Not common enough? Is there any database developer who didn't encounter "String or binary data would be truncated" in his life?

What is the point of creating/maintaining a feedback website if the next thing you do is rushing away closing every suggestion as "won't fix" or "not common enough"?

If a suggestion is deemed reasonable (as this one surely should) but couldn't be included in this version then please, just leave it open to be considered in the next one! Or else it becomes a whack-a-mole game: You close the ticket and a variation of it will reappear within the week!
Posted by Carlos Sacristan on 5/16/2013 at 12:55 AM
Absolutely agree with my fellows. Say that this escenario is "not common enough" does not make sense!!!
Posted by Pablo Álvarez Doval on 5/16/2013 at 12:24 AM
This would be an amazing feature, and extremely useful for many of my projects.

I'd like to believe that the 'not common enough scenario' response was just a polite generic answer; I simply refuse to believe the guys at the team really think this is an uncommon scenario, becasue that would mean the really know very little about how their customers are using their product!
Posted by IanYates [Med IT] on 5/15/2013 at 4:40 PM
The various ways this would affect the query pipeline and so on would make it difficult to implement and, I imagine, may touch upon a lot of areas within SQL Server. We may have the odd bug in the first version much like how the merge statement has had its fair share of issues, but it's a REALLY good idea and would be a really compelling addition to T-SQL.

The only problem I see with it (selfishly!) is that I couldn't use it in my own apps for a while since we have users who just don't upgrade their installed copy of SQL all that often.
Posted by Shawn Quillman on 5/15/2013 at 2:28 PM
This would save incredible amounts of time. I can't in any scenario think of how this is not a common situation. I encounter this problem regularly and have at time spent hours debugging. Knowing all of the data that would cause the operation to fail would be of immense benefit.
Posted by David M B on 5/15/2013 at 1:04 PM
This feature is called DML Error Logging in Oracle and has existed since 10gR2. http://www.oracle.com/technetwork/issue-archive/2006/06-mar/o26performance-096310.html
This is an amazing feature that really would be great to have in SQL Server.
Posted by wt2 on 5/15/2013 at 12:33 PM
Not common enough?

I hope that's an attempt at humor.
Posted by John Hennesey on 5/15/2013 at 12:18 PM
This feature would be extremely useful and welcomed! The cumulative time this would have saved everyone... Please, take this one seriously. This would be a boon to data development work.
Posted by TheBeardedLlama on 5/15/2013 at 11:11 AM
not common enough?

are you serious?

you should take a leaf out of the ASP.net team and the excellent work they're doing listening to the devs
Posted by jjdDb on 5/15/2013 at 11:10 AM
This would be a very handy
thing to have. thanks.
Posted by Eric Hutton on 5/15/2013 at 10:42 AM
This is absolutely needed. I have to resort to dumping files out to text editors and manually searching for possible culprits.
Posted by Peter A. Schott on 5/15/2013 at 10:21 AM
I have to agree with others on this one. This is quite common when trying to manually handle inserts and updates. All we know is that _something_ caused a violation, but no help on what that something was. We then have to spend much more time tracking down the problem data instead of getting some helpful way to get more information about the problem. Yes, the case described shouldn't be common when working with an OLTP system with a well-designed interface and data-scrubbing, but it's quite common for ETL and ad-hoc operations.
Posted by Eric Wisdahl on 5/15/2013 at 10:07 AM
This is a great suggestion (not a bug) to handle the extremely common occurrence of data exceptions during set based manipulation and I believe should really be looked at for implementation. Perhaps, have a database / server setting for handle exceptions (or something similar) that would fail the batch as currently implemented or throw records into the virtual exception table as discussed below.

Posted by Meher on 5/15/2013 at 10:06 AM
This feature would definitely help. I request Microsoft to please consider this adding in the product.

Thanks
Meher
Posted by Argenis Fernandez on 5/15/2013 at 10:04 AM
This would be an INCREDIBLY useful feature. I see widespread applications for this. MSFT, please be a little more open-minded about this suggestion.
Posted by Jason Kyle on 5/15/2013 at 10:00 AM
This would be hugely helpful. I also work with large datasets and moving data between tables regularly, and especially with very wide tables tracking down the error can be very problematic.
Posted by Adam Machanic on 5/15/2013 at 9:22 AM
Gus: As someone who focuses on data warehousing, I see use cases for this feature on a daily basis. And I regularly deal with problems caused by bad data, which can't be caught and handled nicely in T-SQL.

If properly implemented I would use this in every single ETL process, as I'm sure would the vast majority of other data warehouse developers. The scenarios here are extremely common and very much in need of a fix.

Please reconsider.
Posted by Erland Sommarskog on 5/14/2013 at 3:13 PM
Not common enough? Come on! Sure, this is not a small order, and it will require some effort to get this right. But if you do, it will be revolutionary. Not common enough? Bah!
Posted by Microsoft on 4/29/2013 at 2:40 PM
Hello,

Thank you for submitting this feedback. After carefully evaluating all of the bugs in our pipeline, we are closing bugs that we will not fix in the current or future versions of SQL Server. The reasons for closing this bug is because the scenario reported in the bug are not common enough and due to the risk of implementing a fix it unfortunately does not meet the bar for the current version of the product.

If you have more supporting information and details or feel strongly that we should reconsider, please do resubmit this item for review.

Thanks again for reporting the product issue and continued support in improving our product.
Gus Apostol, SQL Server Program Manager
Posted by Microsoft on 1/16/2013 at 1:46 PM
Thanks for great feedback, we will evaluate this request for future releases.
Posted by Simon Sabin on 12/31/2012 at 3:37 PM
I think this is a fantastic idea and would have said my team months of effort in build cross system ETL for a migration.
I've put my thoughts in a blog post http://sqlblogcasts.com/blogs/simons/archive/2012/12/31/help-to-improve-sql-batch-operations-and-etl.aspx

I discuss what errors should be logged and at what level.
Posted by Erland Sommarskog on 12/20/2012 at 3:12 PM
Bob, aren't you thinking of a different thing? Like an error_table() function to supplement the current error_message() etc which are scalar and that would only return data in a CATCH handler.

I think for a DML statement with an OUTPUT of errors (or a special clause) errors would not be raised and trigger the CATCH handler, but instead end up in the table with one row per failed row. Presumably errors would need to have a standard appearance with one xml column for the data and one for the errors - there may be multiple errors with the same row.

Yes, this is a interesting feature, if it's done right, but it requires some thinking to get it right.
Posted by Rob Volk1 on 12/20/2012 at 1:55 PM
I agree wholeheartedly with Bob's suggestion.

Adam:

Perhaps they're not "errors" if IGNORE_DUP_KEY is set, but such gaps would occur for other violations. This new table should represent everything not inserted from a single statement regardless of why, for the sake of atomicity. It's easier to filter out a particular violation like IGNORE_DUP_KEY if one doesn't care, but to ignore them entirely would make a full ETL reconciliation more difficult.

Regarding the type coercion failure you mention, an $error directive that returns XML with additional source type would be trivial to implement (and TRY_CONVERT() is available). The affected column in the errors virtual table could be set to NULL with supplementary $error info, or worst case returned as sql_variant.
Posted by Bob Beauchemin on 12/20/2012 at 1:41 PM
I'd want this for all T-SQL statements, not just DML. All database APIs except for T-SQL can harvest a collection of errors, rather than just a single error. This table should include all properties (e.g. error level) so informational errors (e.g. level 0 errors) could be filtered out. If so, they could be included.
Posted by Adam Machanic on 12/20/2012 at 1:41 PM
"The statement would still fail as it currently does but the offending row(s) would be salvaged for postmortem analysis."

I disagree here. I actually think that would make the situation confusing and difficult. What I'd much rather have happen is that the rows that don't hit a violation will go into the table, and the rows that do hit a violation will end up in the errors virtual table. This puts things in the user's hands. If you want to roll back, roll back (of course you'll need an explicit transaction for that). If you want to commit, commit. Either way, it's one step rather than two, and turns this feature into a wonderful way of setting up a "data quarantine" in T-SQL -- something that we usually need SSIS or some other external ETL tool to achieve.
Posted by Rob Volk1 on 12/20/2012 at 1:34 PM
Crap, hit submit accidentally.

It would also be nice for the errors virtual table to include identity and sequence values of failed rows, which appears would be automatic. Gaps due to IGNORE_DUP_KEY repro below:

create table #z(i int not null identity(1,1),
    b int not null primary key with (ignore_dup_key=on));
insert #z(b) values(1),(1),(2),(2),(3);
select * from #z;
drop table #z;
go

create sequence s0 as int start with 1 increment by 1 no cycle;
go
create table #z(i int not null default next value for s0,
    b int not null primary key with (ignore_dup_key=on));
insert #z(b) values(1),(1),(2),(2),(3);
select * from #z;
drop table #z;drop sequence s0;

Another suggestion: instead of adding a column to the virtual table listing the violated constraint, perhaps add an $error directive similar to the $action available with the OUTPUT clause of the MERGE statement. That would preserve the original table structure, just like inserted and deleted. And if possible, when multiple violations occur for the same row, $error would list them all, either as XML or CSV.
Posted by Adam Machanic on 12/20/2012 at 1:24 PM
Of course the last comment illustrates another point about the difficulty of implementing this. In the case I've pasted, what's the output of the virtual table? [i] can't be typed as INT if you want to see the row that you were trying to insert; doing that would create an unworkable situation.
Posted by Adam Machanic on 12/20/2012 at 1:22 PM
"It could also include a column for the constraint name that was violated."

An error may be caused by something other than a constraint violation, e.g. a data type violation:

DECLARE @t TABLE (i INT)
INSERT @t SELECT 'abc'

... so I'd rather the virtual table have a column with the actual error message. (Which would include the constraint name anyway.) And maybe a column for the error number.
Posted by Adam Machanic on 12/20/2012 at 1:17 PM
@Rob: No, please *do not* include those rows! If I've added IGNORE_DUP_KEY it's because I want to ignore them. They're NOT errors. (Okay, now we can see how difficult this feature would be to implement...)
Posted by Rob Volk1 on 12/20/2012 at 1:14 PM
Oh god yes. If you do this please be sure to include rows not inserted due to IGNORE_DUP_KEY settings on indexes and constraints.
Sign in to post a workaround.