bazaar corruption causing crash

Vincent Ladeuil v.ladeuil+lp at free.fr
Fri Jan 7 08:10:01 UTC 2011


>>>>> Henry Gomersall <heng at cantab.net> writes:

    > On Thu, 2011-01-06 at 17:35 +0100, Vincent Ladeuil wrote:
    >> >>>>> Henry Gomersall <heng at cantab.net> writes:
    >> 
    >> 
    >> > I submitted the problem as a bug report (697815), but it's seriously
    >> > stalling me so I am keen to get feedback on how I can get my repository
    >> > up and running again.
    >> 
    >> You made the bug private and I can't access it.

    > It's public now.

Thanks, see there for a script and some explanations about how to repair
your repo and branch.

<snip/>

    > I have had a fairly flaky system lately. I assumed it was the kernel but
    > it's possible that my (less than a year old) hard drive is dying. Running
    > a full SMART self-test now (the short one did not fail).

See below, your HD may not be dying after all (which is good news).

    >> > To fail like this is pretty serious.
    >> 
    >> Indeed. What OS/file system are you using, did you experience crashes
    >> lately ?
    >> 

    > It's Ubuntu 10.10 on ext4. As I said, yes, some crashes recently. No idea
    > what is causing them though. Could it have been a crash of any kind that
    > buggers up a commit?

Hmm, ext4....

So the short story here is that ext4 lies to us. It pretends that a set
of modifications has been committed to disk when this is only partially
true, so the data can still be corrupted by a crash.

bzr does:
1 - create the pack file (content and name),
2 - record the pack file name in 'pack-names',
3 - write the 'pack-names' file to disk (with a rename dance).

After the crash the file system says: "Oh, I've indeed done 2 and 3,
forgive me for forgetting about 1 even if you asked for that first".

Why, thanks all the same ext4 !
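
To make the ordering concrete, here is a minimal Python sketch of that
sequence with explicit fsync calls added before each rename. It is not
bzr's actual code: write_durably and add_pack are hypothetical helpers,
and 'pack-names' is treated as a plain text list here, whereas bzr
stores it as a btree index. It also assumes a POSIX system where a
directory can be opened and fsync'ed.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path via a temp file, fsync, then atomic rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the file content to disk first
        os.replace(tmp, path)     # the "rename dance": atomic on POSIX
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # fsync the directory so the rename itself is recorded on disk
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)


def add_pack(repo_dir, pack_name, pack_data):
    """Hypothetical illustration of steps 1 to 3 (not bzr's real format)."""
    packs_dir = os.path.join(repo_dir, "packs")
    os.makedirs(packs_dir, exist_ok=True)
    # 1 - create the pack file, and make sure its content is on disk
    write_durably(os.path.join(packs_dir, pack_name + ".pack"), pack_data)
    # 2 - record the pack file name in the index
    index = os.path.join(repo_dir, "pack-names")
    names = []
    if os.path.exists(index):
        with open(index) as f:
            names = f.read().split()
    names.append(pack_name)
    # 3 - write the index to disk, only after the pack is durable
    write_durably(index, ("\n".join(names) + "\n").encode("ascii"))
```

The point of the sketch is only the ordering: the index is not updated
until the pack content has been fsync'ed, so a crash should leave either
the old index or a new index pointing at a complete pack.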

    >> > How much of the version data is lost?
    >> 
    >> The content of the pack file itself, so depending on its size one or
    >> several revisions.
    >> 
    >> > Is there any way to revert it?

See the bug report. In short, you've lost only the last committed
revision, so hopefully you'll be able to recover.

    >> 
    >> If the revisions have been pushed elsewhere (or pulled from elsewhere),
    >> yes.
    >> 
    >> First you need to get rid of the broken pack file and repair the
    >> repository. There are various ways to do it but we'll need more details
    >> first like:
    >> - ls -lR .bzr
    >> - bzr dump-btree .bzr/repository/pack-names

    > Nope, that was the only place. The version information is actually *not
    > that* important (it's at a pretty early stage).

Right, I wasn't sure from your first description that it was an empty
pack file. We had a few reports about that and I encountered the problem
myself with virtual machines experiencing very rude shutdowns. So I've
used the proposed scripts 10 or 15 times already. In my case the
branches involved were only mirrors so a simple 'bzr pull --overwrite'
was enough to restore them (the repo having been fixed by the script).

<snip/>
    > .bzr/repository/packs:
    > total 124
    > -rw-r--r-- 1 whg21 whg21  4723 2010-10-20 16:55 42400c555867e468de8490632553f9cb.pack
    > -rw-r--r-- 1 whg21 whg21     0 2010-12-04 19:35 4d46aaa30ce4a05d9a71342fb858bfe7.pack

Indeed, an empty pack file.... <shudder/> ext4... 
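
For anyone wanting to check their own repository for the same symptom,
here is a small Python sketch that only lists zero-length pack files.
find_empty_packs is a hypothetical helper, not part of bzr, and it is
not the repair script from the bug report: it reports the empty files
and changes nothing. It assumes the packs live under
.bzr/repository/packs as in the listing above.

```python
import os


def find_empty_packs(repo_root):
    """Return the names of zero-length *.pack files, sorted by name."""
    packs_dir = os.path.join(repo_root, ".bzr", "repository", "packs")
    empty = []
    for name in sorted(os.listdir(packs_dir)):
        path = os.path.join(packs_dir, name)
        # Only regular files ending in .pack with a size of 0 bytes
        if (name.endswith(".pack") and os.path.isfile(path)
                and os.path.getsize(path) == 0):
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Run from the top of the tree that contains the .bzr directory
    for pack in find_empty_packs("."):
        print("empty pack file:", pack)
```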

It would be interesting, if you keep encountering crashes (which I don't
wish on you :-}), to record as precisely as you can the exact times at
which the crashes occur and compare them with the commit times. I've
heard numbers as high as 30 minutes for ext4 *really* committing to disk
(which sounds absurdly high and really scary for a file system with a
journal feature...).
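
If you do collect those times, the comparison could be sketched in
Python as below. commits_at_risk is a hypothetical helper and the
30 minute default is just the number I heard, not a measured value. It
pairs each commit with the first crash that happened within the window
after it.

```python
from datetime import datetime, timedelta


def commits_at_risk(commit_times, crash_times,
                    window=timedelta(minutes=30)):
    """Return (commit, crash) pairs where a crash followed a commit
    by at most `window` (inclusive)."""
    at_risk = []
    for commit in commit_times:
        for crash in sorted(crash_times):
            # Only crashes at or after the commit, within the window
            if timedelta(0) <= crash - commit <= window:
                at_risk.append((commit, crash))
                break
    return at_risk
```

Commits that show up in the result are the ones worth checking for
empty or truncated pack files.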

       Vincent


