Invalid filegroup not raising an error in dbcc checkdb - by Stefan Oude Vrielink

Status : 

  Not Reproducible<br /><br />
		The product team could not reproduce this item with the description and steps provided.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


3
0
Sign in
to vote
ID 778910 Comments
Status Closed Workarounds
Type Bug Repros 1
Opened 2/10/2013 12:11:42 PM
Access Restriction Public

Description

After merging a lot of partitions (from 1001 to 200) and removing all unused files and filegroups, I ended up with "incosistent metadata" on a partitioned table. Luckily this was a test database, to test the partition merge.

select * from dbo.Test where CustomerID < 199:
(0 row(s) affected)

select * from dbo.Test where CustomerID > 199:
Msg 606, Level 21, State 1, Line 1
Metadata inconsistency.  Filegroup id 15 specified for table 'dbo.Test' does not exist. Run DBCC CHECKDB or CHECKCATALOG.

Every CustomerID is on it's own filegroup: 0 - 199, filegroups fg_00 - fg_199

Apart from this database inconsistency checkdb is not raising an error!

dbcc checkdb() with all_errormsgs

DBCC results for 'MDB_NET_PartitionTestSOV'.
.....
There are 0 rows in 0 pages for object "sys.syssoftobjrefs".
Unable to process index pk_Test of table Test because filegroup (FileGroup ID 15) is invalid.
.....
CHECKDB found 0 allocation errors and 0 consistency errors in database 'MDB_NET_PartitionTestSOV'.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

To my opinion an invalid filegroup must be an allocation error, or a consistency error, but is not raised as one!
Sign in to post a comment.
Posted by Marco Carozzi on 2/5/2017 at 8:59 AM
Dear MS

this bug is still around, Sql Server 64 BIT ENT 2014 SP2; here the story, with a happy ending, but some sweating.

Switch out of the oldest, last but one partition (partition AS RANGE RIGHT, with an "offline" partition as number 1) on a 1TB database, all good till when I go to remove the FG, error because of the presence of a DB snapshot, I did not know it was left there... I remove the snapshot, I remove the FG, I remove 12 files to that FG, all good.

I then find this entry in the error log, with all the prod activities in error, just on 1 table, 250GB, the most critical:
Metadata inconsistency. Filegroup id 4 specified for table 'dbo.XXX' does not exist. Run DBCC CHECKDB or CHECKCATALOG.

I find this link, https://support.microsoft.com/en-us/help/3095958/fix-metadata-inconsistency-error-after-you-switch-table-partitions-and-drop-corresponding-files-and-filegroups,
the 2 queries return the problem as expected.

SELECT * FROM sys.allocation_units AS au WHERE au.data_space_id NOT IN (SELECT data_space_id FROM sys.filegroups)

SELECT au.container_id, au.data_space_id, p.partition_number FROM sys.partitions AS p JOIN sys.allocation_units AS au ON p.partition_id = au.container_id
LEFT JOIN sys.filegroups AS fgs ON fgs.data_space_id = au.data_space_id WHERE object_id = OBJECT_ID('tab_XXX') AND fgs.data_space_id IS NULL;

The db is huge and highly critical 24/7, and also in a complex trans replication towards a DWH, the only option I had was to save it.

The result of the queries above gives me 1 as the partition, which is also empty; by reading the instructions on the link above I figure that I might try with the switch out of the part one, and see what happens.
It goes well, my main table is now back to business.

Now the second query does not return any row, but the first still does; makes sense. I then go to eliminate the copy table where I did the switch out, but the issue is there again, and it is logical.

I knew by experience that by creating a new FG it would have most probably reused the previous id, 4, so I try to add a FG, empty, with that id and try to fool the system, and indeed it did it, id 4.
I now drop the table, and it goes well. I remove the "fake" FG, all good. The first query now does not return rows either.

Problem solved, but the bug is nasty; if the cause is the snapshot then this should prevent the switch.
On the bright side, the structures of Sql Server underneath are quite robust, with a little bit of wit and experience you can always manage, so thanks for the great work you guys do!

Best regards,

Marco







Posted by Microsoft on 7/17/2013 at 9:31 AM
Hello,
Thank you for submitting this feedback. After carefully evaluating all of the bugs in our pipeline, we are closing bugs that we will not fix in the current or future versions of SQL Server.
Thanks again for reporting the product issue and continued support in improving our product.

Sincerely,
Ajay.
Posted by Microsoft on 7/17/2013 at 9:30 AM
Hello,
Thank you for submitting this feedback. After carefully evaluating all of the bugs in our pipeline, we are closing bugs that we will not fix in the current or future versions of SQL Server.
Thanks again for reporting the product issue and continued support in improving our product.

Sincerely,
Ajay.
Posted by Stefan Oude Vrielink on 4/17/2013 at 12:08 PM
Hello,
Sorry for the delay. DBCC CHECKDB did not find any errors to fix, although it mentioned an invalid filegroup. That's what scares me.
I have a backup file of the corrupt database. Fortunately, it is a very small database because I used it only for testing.
Despite only being a test database, I do not want to upload it to this post. Can I PM you?
Hopefully this bug can be reopend.
Tanks, Stefan.
Posted by Bryan [MSFT] on 3/14/2013 at 5:50 PM
Dear Customer,

I am a dev in SQL Server. Thanks for reporting the bug.
To investigate this issue, I need more helps from you.
Can you give me a repro?
If not, can you give me the full steps with all TSQL statements that you use for:
1) creating the partitions
2) merging partitions
When you see the data corruption, I need the DB file before you call DBCC CHECKDB.
The corrected file wouldn't help too much.

Thanks very much,
Bryan