EXT4 Data Corruption Bug Hits Stable Linux Kernels

Posted: Mon Oct 29, 2012 11:23 pm
by melbo
http://www.phoronix.com/scan.php?page=n ... px=MTIxNDQ

snip
I think I've found the problem. I believe the commit at fault is commit 14b4ed22a6 (upstream commit eeecef0af5e):

jbd2: don't write superblock when if its empty

which first appeared in v3.6.2.

The reason why the problem happens rarely is that the effect of the buggy commit is that if the journal's starting block is zero, we fail to truncate the journal when we unmount the file system. This can happen if we mount and then unmount the file system fairly quickly, before the log has a chance to wrap. After the first time this has happened, it's not a disaster, since when we replay the journal, we'll just replay some extra transactions. But if this happens twice, the oldest valid transaction will still not have gotten updated, but some of the newer transactions from the last mount session will have gotten written by the very latest transactions, and when we then try to do the extra transaction replays, the metadata blocks can end up getting very scrambled indeed.

*Sigh*. My apologies for not catching this when I reviewed this patch. I believe the following patch should fix the bug; once it's reviewed by other ext4 developers, I'll push this to Linus ASAP.

- Ted