Friday, December 18, 2020

Fixing bad T10-PI checksums

Background (in case you care)

Let's say you have some kind of hypothetical data disaster where you lose all redundancy in a RAID-6 pool through no fault or design decision of your own, maybe because of a firmware bug. Then let's say a hard drive physically bites the dust at the worst possible time.

Then let's imagine that you send the drive off to a data recovery firm that is able to recover 100% of the data (awesome!!!), but they don't know about T10-PI checksums and thus don't copy them onto the clone. Let's assume that due to various circumstances this was your one shot at it and they can't just reclone with T10-PI for some reason.

So then pretend that you get your 100% successful clone back, only to find that it is unreadable in the array. The original disk was correctly formatted with T10-PI Type 2, but the clone does not contain the correct checksum for each sector, so every read of every sector on the disk fails.
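For the curious, here is roughly what the array expects to find next to each sector. This is only a sketch of the standard T10-PI layout: 8 extra bytes per sector, made up of a 2-byte guard tag (a CRC-16 of the sector data using polynomial 0x8BB7), a 2-byte application tag, and a 4-byte reference tag. The function names are made up for illustration. The reference tag being the low 32 bits of the LBA and the application tag being 0 are assumptions; a given array may well use different values for both.

```python
import struct

T10_DIF_POLY = 0x8BB7  # CRC-16/T10-DIF polynomial


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: initial value 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_pi_tuple(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte protection information field for one sector.

    Assumption: the reference tag is the low 32 bits of the LBA and the
    application tag is 0. Your array may expect something else.
    """
    guard = crc16_t10dif(sector)
    ref_tag = lba & 0xFFFFFFFF
    # All three fields are stored big-endian: guard, app tag, reference tag.
    return struct.pack(">HHI", guard, app_tag, ref_tag)


if __name__ == "__main__":
    sector = bytes(512)  # an all-zero 512-byte sector
    print(build_pi_tuple(sector, lba=5).hex())
```

If any of those three fields doesn't match what the drive or the array computes on its own, the read is rejected, which is why a clone with perfectly good data but no PI is useless as-is.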

So now your data is sitting there on the drive, just waiting for you to pull it off. Simple, right? Just find some other machine that can read the data with checksum verification disabled. Well, let's say that for some reason your "RAID-6" pool is actually the vendor's proprietary RAID-6, so you're stuck using their array to read the data and you can't disable T10-PI. Uh oh.