In-Memory OLTP untrappable checkpoint file pair corruption - by NedOtter

Status: Not Reproducible

The product team could not reproduce this item with the description and steps provided.
A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


ID: 3079854
Status: Closed
Type: Bug
Opened: 8/23/2016 2:45:09 PM
Access Restriction: Public

Description

During a dialog I had with Microsoft about determining how checkpoint file pair corruption can be trapped for notification purposes, the following was revealed:

1. a value of 0x8800000e is written to the SQL errorlog
2. no severity is written to the SQL errorlog
3. no standardized error ID is written to the SQL errorlog
4. no text indicating corruption is written to the SQL errorlog
5. you can only find this error if you continually scan the SQL errorlog

This makes it impossible to trap checkpoint file pair corruption via an alert, and notify DBAs of this critical database issue.

It's bad enough that CHECKDB/CHECKTABLE ignores memory-optimized tables. The proposed solution to that is to back up the memory-optimized filegroup to disk = 'nul'. If there are no checksum errors, you know your database has no corruption.

But in order to determine that there are no checksum errors, you will have to scan the SQL errorlog for '0x8800000e' after every memory-optimized filegroup backup.
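The errorlog scan this workaround depends on can be sketched in Python. This is a minimal sketch, not a Microsoft-provided tool: the names `find_cfp_corruption`, `read_errorlog` and `since_line` are hypothetical, and it assumes the errorlog can be read as a UTF-16 text file. It only looks for the raw value `0x8800000e`, because, as described above, no error ID, severity, or corruption text accompanies it.

```python
import re
from typing import Iterable, List, Tuple

# Raw value written to the errorlog on checkpoint file pair corruption;
# no error number or severity accompanies it, so match on the value itself.
CFP_CORRUPTION_TOKEN = "0x8800000e"
_PATTERN = re.compile(re.escape(CFP_CORRUPTION_TOKEN), re.IGNORECASE)


def find_cfp_corruption(lines: Iterable[str], since_line: int = 0) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line after `since_line` containing the token.

    Line numbers are 1-based. Passing the line count from the previous scan as
    `since_line` reports only entries written since then (e.g. since the last
    memory-optimized filegroup backup). Matching is case-insensitive.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if number <= since_line:
            continue  # already examined in an earlier scan
        if _PATTERN.search(line):
            hits.append((number, line.rstrip("\r\n")))
    return hits


def read_errorlog(path: str, encoding: str = "utf-16") -> List[str]:
    """Read an errorlog file into a list of lines.

    Assumption: the file is UTF-16 encoded; pass another encoding if it is not.
    """
    with open(path, encoding=encoding, errors="replace") as handle:
        return handle.readlines()


if __name__ == "__main__":
    # Hypothetical sample lines, not real errorlog output.
    sample = [
        "hypothetical entry without the token\n",
        "hypothetical entry reporting 0x8800000E\n",
        "another hypothetical entry\n",
    ]
    print(find_cfp_corruption(sample))                # [(2, 'hypothetical entry reporting 0x8800000E')]
    print(find_cfp_corruption(sample, since_line=2))  # []
```

Tracking a line count between scans is the simplest way to avoid re-reporting old entries, but it assumes the same errorlog file is still current; once the log is cycled (for example by `sp_cycle_errorlog` or a service restart), the count has to be reset to 0.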

Detection of checkpoint file pair corruption can occur as a result of any operation that causes the files to be read. 

But corruption might have occurred hours or days earlier, during any of these other processes that also cause checksums to be recalculated:
1. checkpoint file pair merge
2. restore
3. offline/online of database
4. FCI failover

This would seem to be a somewhat radical departure from the standard ways of being informed of corruption (and of SQL Server errors in general).

This situation makes potential adopters of In-Memory OLTP think that it's not ready for prime time (and in this regard it's not). What could be more important than knowing your data is corruption-free, and being alerted immediately if corruption occurs?

The current status of corruption detection and notification will do little to change the minds of those hesitant to adopt In-Memory OLTP. 
Posted by Microsoft on 8/24/2016 at 4:01 PM
Thanks for submitting this feedback.

An update on this issue from our side:
•    If the checkpointing process detects a checksum failure during regular processing, for example during a merge, we do log a sev21 error today (error 41355)
•    If there is a checksum failure during backup or restore, we log a sev16 error, which is the same as SQL Server does for checksum failures in mdf/ndf or ldf files
•    The team is looking at the DB startup code path to raise a sev21 error.


--
Jos de Bruijn - SQL Server PM