Help, my disk array has one dead member

Xen list at xenhideout.nl
Sun Mar 26 11:40:48 UTC 2017


Karl Auer wrote on 26-03-2017 13:16:

> Calculate hashes, store them in a database, compare on read. They won't
> help you fix the corruption but they can tell you it has occurred.

Well that's real cute, and I know that's the solution, but it's an 
advanced level of maintenance compared to an ordinary filesystem. It 
requires you to either do more work or have more solutions in place, and 
the do-more-work kind of 'solution' is the kind that always ends up 
failing at some point.

So you really need solutions in place, and I currently don't have them. 
Or rather, a normal filesystem doesn't have them.

Then you can write scripts but yeah. Endless tasks.

I could write a bunch of scripts to calculate md5sums for every 
directory that contains audio or video material, sure. Then I would have 
to keep running those scripts every time something changes, and so on. 
It'd be boring and annoying.

And the more you turn these scripts into something that actually works, 
the more it becomes a solution.

At which point you might just as well start out by imagining that you 
need a well designed automated system for that using some md5sum 
database that you can, in the case of Linux, also operate from the 
command line.

Then you can decide to either store these values in extended attributes 
(on ext4) or in some database (perhaps MySQL), but you also need 
something that updates these values, and a way to interface with them 
and check them, and so on.
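To make that concrete: a minimal sketch of the xattr variant, assuming a filesystem mounted with user extended attributes enabled. The 'user.md5' attribute name and the function names are my own invention, not an existing tool:

```python
import hashlib
import os

ATTR = "user.md5"  # hypothetical attribute name for the stored checksum

def file_md5(path):
    """Compute the md5 hex digest of a file, reading in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def store_md5(path):
    """Write the file's current md5 into an extended attribute."""
    os.setxattr(path, ATTR, file_md5(path).encode())

def check_md5(path):
    """Return True/False if a stored hash exists, None if it doesn't."""
    try:
        stored = os.getxattr(path, ATTR).decode()
    except OSError:
        return None  # no hash stored yet
    return stored == file_md5(path)
```

The updating problem the paragraph mentions is still unsolved here: something (cron, inotify, a wrapper around your copy commands) would have to call store_md5() again after every legitimate change.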

But when I search for "Linux md5sum database" the solution is, as usual, 
nonexistent.

There are some forensics programs that have features like these, but no 
ordinary consumer products, I believe.



But all the same, knowing the amount of corruption to expect can be an 
indication as to how urgent such a system would be for me (or anyone).

And don't forget that you also need to verify those md5sums each time 
you copy, or you won't know when the error occurred.
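Something like this rough sketch (my own naming, not an existing tool) would catch a bad copy at copy time rather than months later:

```python
import hashlib
import shutil

def md5_of(path):
    """md5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_verified(src, dst):
    """Copy src to dst, then re-read dst and compare checksums."""
    before = md5_of(src)
    shutil.copyfile(src, dst)
    # Re-reading the destination means a silent write (or read) error
    # surfaces now, while the original still exists.
    if md5_of(dst) != before:
        raise IOError("copy verification failed: %s -> %s" % (src, dst))
    return before
```

Note the re-read may come from the page cache rather than the disk, so this catches transfer errors but not every on-disk corruption.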



> Nor should you expect one. It'll happen at 12.5TB! (<--joke) And
> anyway, Murphy's first corollary says it will happen at the worst
> possible time.

Aye, but after I have read this 10GB 4000 times, at least one error 
should have shown up, right?

If not, these 40TB of reads should reasonably indicate that 12.5TB is a 
bad number.
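For what it's worth, a back-of-the-envelope check, taking the commonly quoted consumer-drive URE spec of one bad bit per 10^14 bits read at face value (that's where the 12.5TB figure comes from):

```python
import math

bits_per_error = 1e14                  # quoted URE spec: 1 error / 10^14 bits
tb_per_error = bits_per_error / 8 / 1e12   # = 12.5 TB per expected error

reads_tb = 10e9 * 4000 / 1e12          # 10 GB read 4000 times = 40 TB
expected_errors = reads_tb / tb_per_error  # about 3.2 expected errors

# Treating errors as Poisson, the chance of seeing none in 40 TB:
p_zero = math.exp(-expected_errors)    # roughly 4%
print(expected_errors, p_zero)
```

So zero errors over 40TB of reads would indeed be unlikely (though not impossible) if the spec were literally true, which is why people argue the spec sheet number is pessimistic.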

Others have probably already done this for me. But writing a script and 
firing it off is easier than doing research on the web :p.

But I don't know how much random data is in that space. It was used as 
snapshot space. There ought to be some data in it, but I'll just 
randomize it just in case.




More information about the ubuntu-users mailing list