[merge] knit index robustness/corruption?

John Arbash Meinel john at arbash-meinel.com
Mon Jun 25 17:25:25 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> Martin Pool wrote:
>> I have, for some reason, a knit file with an incorrect index record.
>> It does not look like the kind of problem that originated in Bazaar,
>> so I suspect perhaps a network or even hardware error.  I have had a
>> couple of hard crashes on this machine in recent weeks.
> 
>> The line in question, 4751, is in the middle of the file, pretty old,
>> and in between other correct records.
> 
> It took me a minute to realize that record 4751 has "471v" where an int
> is expected.
> 
> Since this is an unusual condition, I wonder whether this should be done
> by a "repair" mode.  This fix does make the knit readable, but it
> doesn't provide a way to fix the index permanently, e.g. by regenerating
> the index.

Except we cannot regenerate a .kndx file from the .knit: the ancestry stored
in the index is not stored in the .knit file. (An oversight in the knit
design, IMO, but still a problem.)

I think it is reasonable to include Martin's fix with a nice big warning about
problems on that line. I'm not 100% sure about it, though, because we've read a
complete line, which means that the line is genuinely corrupted, not just
incomplete.


....

> I prefer to wrap exception handling as tightly as possible.  So
> try:
>     parent_id = history[int(value)]
> 
> or even
> 
> try:
>     parent_index = int(value)
> except ValueError:
>     # warning, etc.
> else:
>     parent_id = history[parent_index]
> 
> would suit me better.
> 
> Aaron

I agree with Aaron that tighter bounds for exception catching are nicer. We need
to be cautious, though, as this is one of the primary inner loops that we spend
time in, so adding too much extra work here could measurably impact performance.
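
A quick micro-benchmark is one way to check that concern before committing to
anything. The sketch below is only an illustration: parse_plain and
parse_guarded are made-up stand-ins for the index-parsing loop, not bzrlib
functions, and the timings will vary by machine and Python version.

```python
import timeit

# Hypothetical input: textual parent references as they might appear on
# index lines. Not real .kndx data.
VALUES = [str(i) for i in range(1000)]


def parse_plain(values):
    # No exception handling: a bad value surfaces as a bare ValueError.
    result = []
    for v in values:
        result.append(int(v))
    return result


def parse_guarded(values):
    # Tight try/except around only the int() conversion.
    result = []
    for v in values:
        try:
            index = int(v)
        except ValueError:
            raise ValueError("corrupt parent reference: %r" % (v,))
        result.append(index)
    return result


def benchmark(number=200):
    # Time both variants on the same clean input (no exception raised).
    plain = timeit.timeit(lambda: parse_plain(VALUES), number=number)
    guarded = timeit.timeit(lambda: parse_guarded(VALUES), number=number)
    return plain, guarded


if __name__ == "__main__":
    plain, guarded = benchmark()
    print("plain:   %.4fs" % plain)
    print("guarded: %.4fs" % guarded)
```

Both loops do the same work on clean input, so any difference between the two
timings is the cost of entering the try block when nothing is raised.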

So overall, I think:

1) Raising a nicely formatted error rather than ValueError is better than just
issuing a warning and skipping the line, because we know that the line is
genuinely corrupted, not just an "I started writing but my network timed out, so
I pushed again and it wrote a new line".

2) Tighter exception handling as long as it doesn't dramatically impact
performance. I *think* try/except should be cheap when there isn't an exception
being raised.
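
Taken together, the two points could look roughly like the sketch below. It is
a minimal illustration under stated assumptions, not the actual knit code:
CorruptIndexLine and resolve_parents are hypothetical names, and the sketch
assumes every parent reference is an integer index into the history list,
which ignores any other cases the real index format may have.

```python
class CorruptIndexLine(Exception):
    """Hypothetical error type for this sketch; not a bzrlib exception."""

    def __init__(self, filename, line_number, value):
        message = "%s: line %d: expected an integer parent reference, got %r" % (
            filename, line_number, value)
        Exception.__init__(self, message)
        self.filename = filename
        self.line_number = line_number
        self.value = value


def resolve_parents(values, history, filename, line_number):
    """Map textual parent references from one index line to revision ids.

    values is the list of reference fields from the line; history is the
    list of revision ids read so far (assumed layout, for illustration).
    """
    parents = []
    for value in values:
        # Only the int() conversion is guarded, so errors from anything
        # else (such as the history lookup) are not masked.
        try:
            parent_index = int(value)
        except ValueError:
            raise CorruptIndexLine(filename, line_number, value)
        parents.append(history[parent_index])
    return parents
```

With a line like the one in this thread, resolve_parents(["471v"], history,
"some.kndx", 4751) raises CorruptIndexLine with the file name, line number and
offending value in the message, instead of a bare ValueError.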

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGf+x1JdeBCYSNAAMRAhOdAKCHn6x6AYqvaei5MCt3h862SRkyOACg13L6
Avz2JpQPzfrod4cHanMWSuI=
=7wy+
-----END PGP SIGNATURE-----


