Home Dashboard Directory Help
Search

SQL 2012 SSISDB Cleanup causes User Packages to fail by SAinCA


Status: 

Active


31
0
Sign in
to vote
Type: Bug
ID: 749391
Opened: 6/15/2012 2:30:17 PM
Access Restriction: Public
3
Workaround(s)
view
11
User(s) can reproduce this bug

Description

The SSISDB's Catalog Properties included the data retention set to the default 365 days. There is only ONE Folder in this Catalog. Changing the retention to 7 days and running the "SSISDB Server Maintenance Job" causes the only package scheduled to run every 15 minutes, that typically runs in 90 seconds, to run for 19 minutes and fail with no explanation at all.

I can see the query below executing for over 2 hours (17GB of SSISDB Data). UPDATE STATISTICS is also running on a plethora of threads under the Maintenance Plan for the same DB:
DELETE TOP (@delete_batch_size)
FROM [internal].[operations]
OUTPUT DELETED.operation_id, DELETED.operation_type INTO @deleted_ops
WHERE ( [end_time] <= @temp_date
        OR ([end_time] IS NULL AND [status] = 1 AND [created_time] <= @temp_date ))

I had hoped that the "Delete Batch Size" might be configurable, but no signs of it on the page dedicated to SSIDB at http://technet.microsoft.com/en-us/library/hh479588.aspx.

In running the User Package for over two months non-stop, there have been ZERO failures. The "SSISDB Server Maintenance Job" is clearly impeding normal operations. Given that this User process MUST run every 15 minutes, there is NO maintenance window in which the Maintenance Job could be scheduled so as to obviate this interfering behavior.

Please fix the delete process so it doesn't render the SSISDB inaccessible for NORMAL Production access.

(Stand-alone instance under VM-Ware)
Details
Sign in to post a comment.
Posted by Thomas W Marshall on 3/17/2014 at 6:07 AM
http://support.microsoft.com/kb/2829948

Fixes for this issues are presented in SQL 2012 SP1 CU4 and SQL 2012 CU7.
Posted by Thomas W Marshall on 3/12/2014 at 7:33 AM
My colleague David just pointed out to me that the @deleted_ops variable does not get truncated with each iteration of the loop. This in turn leads to it acculumulating all of the deleted ids and with each iteration forcing scans, seeks, and lookups in support of the implied join the cascade delete forces.

This process is consistently blocking all package operations from completing. Changing the retries on the agent job that calls it to zero and killing it when it is found at the head of a blocking chain helps to alleviate the subsequent issues, but that only excaberates the fault in terms of the history cleanup.
Posted by Mark Tassin on 1/15/2013 at 1:01 PM
There are several problems here... No indexes on most of the involved tables causes table scans for one. The internal.cleanup_server_retention_window stored proc doesn't specify a hint to control locking so it takes table locks when it runs delete.

There's no way to control the size of the batches except to actually modify the stored proc.
Posted by Microsoft on 7/3/2012 at 1:53 AM
Hi SAinCA

Thank you for your feedback. We will research the issue and respond as soon as possible.
Posted by Ron Van Zanten on 6/27/2012 at 1:49 PM
This is an issue for us also. Is there anybody that know a way around this?
Posted by groenhar on 6/27/2012 at 1:36 PM
We are having the same issue within our production environment. What is the work around to avoid the interference with logging of the packages running in production constantly? I agree the behavior is interfering with production processes. I am not sure how to workaround this issue. The FK's to the operation id is set to cascade delete all the records in the child tables. This appears be blocking the inserts of the production packages doing the normal logging which is causing time outs and the packages to fail during the maintenance job running. Anyone have a workaround?
Posted by SAinCA on 6/15/2012 at 2:43 PM
ADDITIONAL INFORMATION:
==================

Maintenance Plan Job cancelled to reduce IO contention.

The following command is waiting on an LCK_M_IX Wait Type on the SSISDB on three threads for Application "SSIS ISServerExec" - each has been waiting 20 minutes, then disappears coincident with the latest User Package failure:

SSISDB.internal.append_event_message;1

The DELETE task details are : EXEC [internal].[cleanup_server_retention_window]
Sign in to post a workaround.
Posted by mikegoodtampa on 4/16/2013 at 6:25 PM
Even with extra indexes to support the cascade deletes, this routine blocked execution of pkgs for me. Workaround was to create modified copy of original MS cleanup_server_retention_window proc with below mods. I think MS should consider implementing something like this:

add near top:
    create table #keys(operation_id bigint)

replace original DELETE from internal.operations with this:
    --get keys of next batch of internal.operations rows to delete
    delete #keys
    insert #keys(operation_id)
    select top (@delete_batch_size) operation_id
     from internal.operations
     where (end_time <= @temp_date)
        or ((end_time IS NULL) AND (status = 1) AND (created_time <= @temp_date))

    --delete that batch; this is same as original DELETE, just changed the WHERE
    DELETE
        FROM [internal].[operations]
        OUTPUT DELETED.operation_id, DELETED.operation_type INTO @deleted_ops
        WHERE operation_id in (select operation_id from #keys)
Posted by Brandon.Tucker on 2/19/2013 at 1:31 PM
I had to address this issue in one of our environments today. My solution was to use a stored procedure that allows you to start a SQL Agent job and doesn't return until the SQL Agent job is complete. We've had this code in our enterprise for a while, so I can't pass along a URL unfortunately. Essentially we're cleaning up the database, one day at a time, until we reach our target end value. After reach day, we wait 30 seconds to allow any blocks to clear and allow running packages to insert data. We saw roughly 5 minutes per day, with a total run time of nearly 24 hours to clean up.

---ANY AREA WITH '!!!' MUST BE ADDRESSED FOR THE SCRIPT TO RUN PROPERLY.
DECLARE @Days INT = 365
DECLARE @Target INT = 30
DECLARE @Msg varchar(1000) = 'SSIS cleanup done on ' + @@SERVERNAME

WHILE (@Days > @Target)
BEGIN
    PRINT 'Setting Retention to ' + CONVERT(VARCHAR, @Days) + ' days.'
    EXEC [SSISDB].[catalog].[configure_catalog] @property_name = N'RETENTION_WINDOW', @property_value = @Days
    
    PRINT 'Start: ' + CONVERT(VARCHAR, SYSDATETIME())
    --!!!Call proc to run SQL Agent job 'SSIS Server Maintenance Job'
    PRINT 'End: ' + CONVERT(VARCHAR, SYSDATETIME())

    PRINT ''
    SET @Days = @Days - 1
    WAITFOR DELAY '00:00:30'
END

--Comment the line below in to send an email when finished
EXEC msdb.dbo.sp_send_dbmail @profile_name='!!!INSERT YOUR PROFILE NAME HERE!!!', @Subject=@Msg, @Recipients='!!!INSERT YOUR EMAIL ADDRESS HERE!!!'
Posted by richardps on 12/5/2012 at 6:17 AM
The problem is exacerbated by the Foreign Key CASCADE deletes. The underlying issue is a lack of indexes. Try this:

USE [SSISDB]
GO
CREATE NONCLUSTERED INDEX [NCIX_operation_id]
ON [internal].[event_messages] ([operation_id]);
GO
CREATE NONCLUSTERED INDEX [NCIX_operation_id]
ON [internal].[operation_messages] ([operation_id]);

Let me know...
GO