Provide a mechanism for minimally logged deletes that won't bloat the transaction log. - by John Paul Cook

Status: Active
ID: 509341
Type: Suggestion
Repros: 1
Opened: 11/9/2009 11:24:09 AM
Access Restriction: Public
Votes: 130 up, 14 down

Description

Full logging of deletes isn't always necessary and can even be highly undesirable. Although TRUNCATE has the desired performance and effect on the transaction log, it can't be used with a WHERE clause, nor can it be used when foreign keys exist.

People are using various techniques as workarounds:

1. Delete in batches, with transaction log backups between batches to limit transaction log bloat (sketched below).

2. Save the small amount of data to be kept into a temporary table, drop the foreign keys, truncate the table, reload the data from the temporary table, restore the foreign keys, and drop the temporary table (also sketched below).
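
As a rough sketch of workaround 1 (the database, table, and column names here are hypothetical), a batched delete with log backups between batches under the FULL or BULK_LOGGED recovery model might look like:

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) FROM dbo.BigTable WHERE Status = 3;
    SET @rows = @@ROWCOUNT;
    -- BACKUP LOG MyDatabase TO DISK = N'...';  -- or rely on SIMPLE recovery
END;

And workaround 2, again with hypothetical names (IDENTITY_INSERT handling omitted for brevity):

SELECT * INTO #keep FROM dbo.BigTable WHERE KeepFlag = 1;
ALTER TABLE dbo.ChildTable DROP CONSTRAINT FK_ChildTable_BigTable;
TRUNCATE TABLE dbo.BigTable;
INSERT INTO dbo.BigTable SELECT * FROM #keep;
ALTER TABLE dbo.ChildTable ADD CONSTRAINT FK_ChildTable_BigTable
    FOREIGN KEY (BigTableId) REFERENCES dbo.BigTable (Id);
DROP TABLE #keep;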

It's a lot of trouble. So what if the minimally logged delete fails and there's no way to roll it back? I don't care. I want the data gone forever. If the delete fails, I'll just reissue it. I don't need logging or recovery for that.

Testing ETL is one use case. Iterative testing of ETL sometimes requires bulk deletion. Try the ETL, oops it didn't work properly, delete the data, change the ETL and try again. But if a bulk delete takes hours to complete or stops the server because the transaction log runs out of space, it makes everything painful.
Posted by John Paul Cook on 6/24/2015 at 1:37 PM
In SQL Server 2014 Enterprise Edition you can create in-memory tables with a durability of SCHEMA_ONLY. This can solve some of the use cases that have been discussed.
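A minimal sketch of that approach, assuming SQL Server 2014 Enterprise Edition and hypothetical database, filegroup, and table names; data changes against a SCHEMA_ONLY table are not written to the transaction log at all:

ALTER DATABASE MyDatabase ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE MyDatabase ADD FILE (NAME = 'imoltp_dir', FILENAME = 'C:\Data\imoltp_dir')
    TO FILEGROUP imoltp_fg;

CREATE TABLE dbo.EtlStaging
(
    Id INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload VARCHAR(4000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);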
Posted by Wes at SRP on 6/24/2015 at 1:11 PM
I voted for this specifically because TRUNCATE requires DDL admin permission. I need a way to quickly delete data from a table without requiring this permission. It is frustrating that I have to grant DDLAdmin just so users can empty a table without blowing out the log or slowing down the server.

I'm in a data warehouse environment using the SIMPLE recovery model. I understand that TRUNCATE resets the identity seed, but we have plenty of values to spare and don't need it reset after every delete.

Please make a minimally logged DELETE available. Maybe this could be a new query hint... DELETE WITH (NOLOG) etc...
Posted by Sunil [MSFT] on 2/3/2015 at 9:34 AM
Revisiting this in the context of the clustered columnstore index (CCI). I consider large deletes to be more common in data warehouse scenarios than in OLTP, and the direction of the SQL team is to recommend CCI for DW scenarios. If we address large deletes in CCI with minimal logging, will that be acceptable?

thanks
Sunil
Posted by RobNicholson, MCSM on 5/10/2012 at 12:18 AM
This would be useful, but I can only envision it working under the SIMPLE or BULK_LOGGED recovery models.
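For reference, switching a database into a recovery model that permits minimal logging is the usual (hypothetical database name):

ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;
-- ... run the bulk operation ...
ALTER DATABASE MyDatabase SET RECOVERY FULL;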
Posted by Erland Sommarskog on 12/10/2011 at 11:52 AM
I have thought a little more about this, but I have not changed my vote - the proposal as written asks for something which is bad.

But a minimally logged - yet still recoverable and rollbackable - DELETE could work in this way: if more than X of the rows (or more than X bytes) on an extent are deleted, copy the remaining rows to a new extent and log only the extent deallocation. The copying to a new extent would be a minimally logged INSERT. (The extent deallocation is a fully logged operation.) This would require TABLOCK on the table and that the database is in simple or bulk_logged recovery. (In full recovery, DELETE could still log only the extent deallocation, if all rows on the extent are gone.)

I guess one problem here is that SQL Server does not first identify all the rows to delete and only then start deleting them, which seems to be more or less required to achieve this.

There is no argument that this is a big pain point for users.

Posted by phe on 3/28/2011 at 6:16 AM
What about supporting bulk logging for DELETE? I.e., if all rows on a page are deleted by the DELETE statement, log only the page deallocation; otherwise, log the individual records deleted.
Posted by Sunil [MSFT] on 12/15/2009 at 6:26 PM
Victor, thanks for your feedback.
Posted by SQL Ranger on 12/8/2009 at 8:43 PM
Erland, you would not necessarily have to do it at the extent level.

JPC has outlined one use case, but there are plenty of others. In my case I want to selectively delete a large volume of records from a production system that runs 24x7. The problem with this large table is that its rows are also very wide, due to a VARCHAR(MAX) field that typically holds 4000-6000 bytes of information.

Try deleting hundreds of millions of records on such a table. You'll encounter a number of problems. Firstly, the transaction log will blow out to hundreds of GB, because each record you delete is fully logged. Secondly, performance will suffer within that database, because you are using the transaction log so heavily. Thirdly, you obviously impact INSERT (and UPDATE) activity on that table, simply due to it being a large and wide table. You have to pay the piper! And the piper in this case is the sequential transaction log.

So I would very much like to see a minimally logged operation such as:

DELETE MyTable WHERE Status = 3 WITH (NO_LOG) -- MINIMALLY_LOGGED might be more accurate

In terms of how to achieve this "business requirement", it's Microsoft's job to work it out, Erland/Jeremiah/Adam. That's why we pay the $$$$$ for Enterprise Edition, and this is a must in an enterprise environment where you are by nature working with larger volumes of data.

So I think there's no point in saying "you can't do this, or I won't vote for this, BECAUSE you would corrupt your database if it crashed while extents were being deallocated, or because of problems with IAM and index/locking issues". An EXTREMELY valid business requirement is there!

Having said all of that, my issue is that we log the entire record that needs to be deleted.

Why not just record the FileID/PageID/Record Identifier for the rows that we need to delete, instead of the entire record?

Then there's the question of what to do in the case of a failure. Do we roll back? Or do we commit (roll forward) what has been done so far and make the user re-run the DELETE operation?

Cheers,

Victor
Posted by Sunil [MSFT] on 11/24/2009 at 10:31 AM
Thanks for the spirited discussion on this topic. As indicated by Andy Irving, using partitioning to get around this issue is just a workaround, and it would be great if it could be avoided. Erland proposed something in this discussion about minimal logging when all the rows on a page are deleted, and I believe this can be done. In fact, it is a similar kind of optimization to the one available using TF-610 for inserts into a table.

This has been a popular ask by customers and a high vote count on this connect item re-confirms that.

Thanks
Sunil
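
For context, the TF-610 optimization mentioned above is enabled roughly like this (hypothetical table names; minimal logging also requires the SIMPLE or BULK_LOGGED recovery model):

DBCC TRACEON (610, -1);  -- -1 enables the trace flag server-wide
INSERT INTO dbo.TargetTable
SELECT * FROM dbo.StagingTable;
DBCC TRACEOFF (610, -1);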

Posted by Sunil [MSFT] on 11/24/2009 at 10:22 AM
Thanks for the spirited discussion on this thread. I agree that providing a minimally logged DELETE operation would be very useful for cases where TRUNCATE TABLE can't be used. As Andy Irving mentioned, partitioning to get around a fully logged delete is just what it is, a workaround. The idea of a minimally logged delete along the lines suggested by Erland seems very workable. In fact, this is what we do for minimally logged inserts with TF-610, where only the rows inserted into newly allocated pages are minimally logged. Any comments on this proposal will be appreciated.

We have heard this feedback from many customers, and in fact a vote count of 46 on this connect item re-confirms that.

Thanks
Sunil

Posted by John Paul Cook on 11/12/2009 at 2:00 PM
I can add reasons to not provide this feature or similar functionality. But the point is that slow deletes are a major pain point for many people. Not everybody needs full logging of bulk deletes. Even if the product team hasn't solved this problem yet, I'd like to see them really try.
Posted by Adam Machanic on 11/12/2009 at 12:36 PM
There should not be features in the product that are not safe for use in production. That's insanity. Should each instance have a "this is production" flag that when set to false will enable you to play additional games?

As I mentioned on a blog post on this topic:

It seems to me that there are a number of reasons that you can't do this today, including:

1. RI needs to be maintained, which requires a check before deleting each row on a page.

2. Index maintenance may have to be done to keep the linked list up to date; you can't just deallocate the page and update the IAM.

3. There is the issue of having to find and lock the pages that need to be deallocated, which could lead to concurrency issues (though that could be avoided if a TABLOCK hint were required).

It seems like a great idea, but unless the SQL Server team has solved these problems it would be impossible to implement.
Posted by Bill_Braun on 11/12/2009 at 9:46 AM
If you want referential integrity, can we not just add a 'WITH NO_LOGGING' option so that we can run a normal DELETE on a table and let cascading deletes run with it? This gives us the WHERE clause capability and would make ETL cleanup even easier.
Posted by Andy Irving on 11/12/2009 at 2:08 AM
The way I see it, this could be a useful alternative to partitioning old data and then dropping the partition: if instead I can delete what would be the partition's contents with minimal logging, then I don't need to implement partitioning just for this (what with it being EE and above).
Posted by John Paul Cook on 11/11/2009 at 4:23 PM
Jeremiah, developers and testers are tired of scripting out workarounds. Fully logged deletes are a pain point for many. We're not talking about use in production. This isn't for everybody. Developers are underrepresented in SQL Connect. If more knew about this proposal, more would be in favor of it.
Posted by Jeremiah Peschka on 11/11/2009 at 3:41 PM
I have to agree with Erland on this one. What you are proposing sounds incredibly dangerous to me. Referential integrity exists for a reason. If you really want to clean out the database, it's fairly trivial to script all foreign keys with a drop and re-create.

This seems to me like a workaround to a problem that already has multiple safer solutions.
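A rough sketch of the FK scripting Jeremiah describes, using the catalog views; it assumes single-column foreign keys and a hypothetical target table for brevity:

SELECT
    'ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name)
        + ' DROP CONSTRAINT ' + QUOTENAME(fk.name) AS drop_stmt,
    'ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name)
        + ' ADD CONSTRAINT ' + QUOTENAME(fk.name)
        + ' FOREIGN KEY (' + QUOTENAME(pc.name) + ') REFERENCES '
        + QUOTENAME(SCHEMA_NAME(rt.schema_id)) + '.' + QUOTENAME(rt.name)
        + ' (' + QUOTENAME(rc.name) + ')' AS create_stmt
FROM sys.foreign_keys AS fk
JOIN sys.tables AS t ON fk.parent_object_id = t.object_id
JOIN sys.tables AS rt ON fk.referenced_object_id = rt.object_id
JOIN sys.foreign_key_columns AS fkc ON fk.object_id = fkc.constraint_object_id
JOIN sys.columns AS pc ON fkc.parent_object_id = pc.object_id AND fkc.parent_column_id = pc.column_id
JOIN sys.columns AS rc ON fkc.referenced_object_id = rc.object_id AND fkc.referenced_column_id = rc.column_id
WHERE fk.referenced_object_id = OBJECT_ID('dbo.MyTable');  -- the table to be emptied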
Posted by Erland Sommarskog on 11/11/2009 at 1:58 PM
I would suggest that the proposal that I have in https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=312074 is a better solution. That is, permit TRUNCATE TABLE for tables that are referenced by FKs, if the referencing tables are empty.

I doubt that Microsoft would ever implement a feature that could leave the database corrupt. What they could do is a version of DELETE that determines whether an entire extent can be deallocated, and in such a case only logs the extent deallocation. But rows on extents where at least one row is to remain would have to be logged one by one.
Posted by David F. Anthony on 11/10/2009 at 5:49 AM
Having a quick unlogged delete would help with unit testing as well.
Posted by Seth Lynch on 11/9/2009 at 12:33 PM
I have a few cases where all the data from a database is deleted and the tables are repopulated from an import.
For the stand-alone tables, TRUNCATE does a lovely job; for tables with foreign keys, DELETE is used. I would much rather have a quick non-logged delete.