Friday 2 October 2009

The looming spectre of data corruption?

A few years ago, I was watching the video of a presentation where one of the Subversion developers listed the reasons that drove them to create it. One that stood out as particularly interesting was file corruption in CVS. The gist was that CVS had no way of verifying that the data stored in it was correct, so checked-in files could silently become corrupt. I was already aware that storage media are to some degree unreliable, but this made me more conscious of one specific problem that can result.

Given that corruption of files on storage media is a possibility, even if a rare one, is it worth doing something about? Is there an effective solution that can detect that corruption has happened and correct it, and that is practical to adopt? The case where a storage media failure renders a file completely unreadable can be set aside here; if that is considered worth insuring against, other solutions better suited to it can be used.

PAR2 data verification and repair tools, like QuickPar, might be one possible solution. A script or application that monitors storage media, generates recovery data for files, and perhaps also verifies them periodically actually wouldn't be that difficult to write.
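To give a feel for how detection and repair fit together, here is a minimal Python sketch of the idea, not how QuickPar or PAR2 actually work. PAR2 uses Reed-Solomon coding, which can rebuild many damaged blocks; this sketch swaps in a much simpler technique: per-block SHA-256 checksums to detect which block changed, plus a single XOR parity block that can rebuild at most one damaged block. The function names and the tiny `BLOCK_SIZE` are hypothetical, chosen only for illustration, and it assumes the file's length has not changed (silent bit flips rather than truncation).

```python
import hashlib
from functools import reduce

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] = blocks[-1].ljust(size, b"\0")
    return blocks


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_recovery_data(data: bytes) -> dict:
    """Record the length, a checksum per block, and one XOR parity block."""
    blocks = split_blocks(data)
    return {
        "length": len(data),
        "checksums": [hashlib.sha256(b).hexdigest() for b in blocks],
        "parity": reduce(xor_blocks, blocks, bytes(BLOCK_SIZE)),
    }


def verify(data: bytes, recovery: dict) -> list:
    """Return the indices of blocks whose checksum no longer matches."""
    if len(data) != recovery["length"]:
        # Assumption of this sketch: only in-place corruption is handled.
        raise ValueError("file length changed; this sketch cannot handle that")
    blocks = split_blocks(data)
    return [i for i, (block, digest) in enumerate(zip(blocks, recovery["checksums"]))
            if hashlib.sha256(block).hexdigest() != digest]


def repair(data: bytes, recovery: dict) -> bytes:
    """Rebuild a single corrupt block by XOR-ing the parity with the good blocks."""
    bad = verify(data, recovery)
    if not bad:
        return data
    if len(bad) > 1:
        raise ValueError("a single XOR parity block can only repair one damaged block")
    blocks = split_blocks(data)
    good = [b for i, b in enumerate(blocks) if i != bad[0]]
    blocks[bad[0]] = reduce(xor_blocks, good, recovery["parity"])
    # Drop the zero padding added to the last block.
    return b"".join(blocks)[:recovery["length"]]


if __name__ == "__main__":
    original = b"important document contents"
    recovery = make_recovery_data(original)
    damaged = bytearray(original)
    damaged[5] ^= 0x01  # simulate a silent single-bit flip
    print(verify(bytes(damaged), recovery))                # [1]
    print(repair(bytes(damaged), recovery) == original)    # True
```

In a real monitoring script the recovery data would be written alongside each file and `verify` run on a schedule; a proper erasure code like PAR2's is what lets the amount of recoverable damage scale with the amount of recovery data generated.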

QuickPar screenshot

Okay, so this isn't a problem that people generally worry about. After all, it is hard enough as it is to find the time and energy to make sure all of your important data is backed up. But I can see it being worth doing for peace of mind.