Solving the commit-editor-locks-stuff-up problem.

Vincent Ladeuil v.ladeuil+lp at free.fr
Sat Mar 21 13:27:28 GMT 2009


>>>>> "robert" == Robert Collins <robert.collins at canonical.com> writes:

    robert> On Sat, 2009-03-21 at 11:41 +0100, Vincent Ladeuil wrote:
    >> >>>>> "robert" == Robert Collins <robert.collins at canonical.com> writes:
    >> 
    >> 
    robert> Certainly write operations currently have to write
    robert> the entire thing always because its checksum needs
    robert> updating.
    >> 
    >> Two cents here:
    >> 
    >> 1) There is a potential bug with the checksum if the working tree
    >> is shared (a mounted file system, for example) between a 32-bit
    >> host and a 64-bit host. The 32-bit host will write a signed 32-bit
    >> checksum while the 64-bit host will write an unsigned one.

    robert> I think you're wrong; we had that bug during development; the fixed
    robert> checksum validation tests ensure we get the same checksum on all
    robert> platforms.

32-bit host, somewhere with lots of branches:
find . -type f -name dirstate -print | xargs head -n2 | grep crc32 | wc -l
211
find . -type f -name dirstate -print | xargs head -n2 | grep 'crc32.*-' | wc -l
87

Not 50% but close enough.

64-bit host, somewhere with lots of branches, even if a few less:
find . -type f -name dirstate -print | xargs head -n2 | grep crc32 | wc -l
51
find . -type f -name dirstate -print | xargs head -n2 | grep 'crc32.*-' | wc -l
0

Very far from 50%.
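
For anyone who wants to repeat the count without the shell pipeline,
here is a rough Python sketch of the same scan. It assumes only what
the head/grep commands above imply: that the checksum sits on a line
containing "crc32" within the first two lines of each file named
dirstate. The function name is made up for illustration.

```python
import os


def count_dirstate_checksums(root):
    """Return (total, negative) counts of dirstate crc32 header lines.

    Assumption: the checksum is on a line containing 'crc32' within the
    first two lines of every file named 'dirstate' below root.
    """
    total = negative = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        if "dirstate" not in filenames:
            continue
        with open(os.path.join(dirpath, "dirstate"), "rb") as f:
            header = [f.readline() for _ in range(2)]
        for line in header:
            if b"crc32" in line:
                total += 1
                # Same test as grep 'crc32.*-': a minus sign after "crc32".
                if b"-" in line.split(b"crc32", 1)[1]:
                    negative += 1
    return total, negative
```

Calling count_dirstate_checksums(".") should give the two numbers
that the two find commands print.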

Not a proof, but I'd be very surprised if statistics were just
playing tricks on me here.
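
Why 50%? If the value is written as a signed 32-bit integer, it is
negative exactly when the top bit of the CRC is set, which should
happen for about half of all inputs. A minimal sketch to check that
expectation on random data, assuming Python 3 (where zlib.crc32 is
always unsigned); the helper names are made up for illustration:

```python
import os
import zlib


def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def negative_fraction(samples=10000):
    """Fraction of CRC32s of random inputs that are negative when signed."""
    negative = 0
    for _ in range(samples):
        if to_signed32(zlib.crc32(os.urandom(64))) < 0:
            negative += 1
    return negative / samples
```

negative_fraction() should come out close to 0.5, so 87 out of 211
(about 41%) is plausible for signed values and 0 out of 51 is not.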

Regression? I didn't investigate very deeply, but it seems to me
the checksum stopped being used at some point...

    >> 2) We don't really care so far because nobody uses that checksum.

    robert> Really?

I couldn't find any users; pointers welcome.

This was a concern in bbc, since we use crc32 in the search key
functions. It was addressed with:

bzrlib/_chk_map_py.py:

import zlib

def _crc32(bit):
    # Depending on python version and platform, zlib.crc32 will return either a
    # signed (<= 2.5 >= 3.0) or an unsigned (2.5, 2.6).
    # http://docs.python.org/library/zlib.html recommends using a mask to force
    # an unsigned value to ensure the same numeric value (unsigned) is obtained
    # across all python versions and platforms.
    # Note: However, on 32-bit platforms this causes an upcast to PyLong, which
    #       are generally slower than PyInts. However, if performance becomes
    #       critical, we should probably write the whole thing as an extension
    #       anyway.
    #       Though we really don't need that 32nd bit of accuracy. (even 2**24
    #       is probably enough node fan out for realistic trees.)
    return zlib.crc32(bit)&0xFFFFFFFF

So it may be that you tested it with a combination where the
returned value was unsigned.
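
To make the difference concrete, here is a small sketch showing that
the same 32 bits print as two different decimal numbers depending on
signedness, and that the mask brings both back to one value. The
as_signed helper is hypothetical; it only simulates what a build
returning signed results would report.

```python
import struct
import zlib


def crc32_unsigned(data):
    """CRC32 forced into the range 0..2**32-1 with the mask used in _crc32."""
    return zlib.crc32(data) & 0xFFFFFFFF


def as_signed(value):
    """Simulate a signed 32-bit result for the same bit pattern."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


data = b"123456789"
unsigned = crc32_unsigned(data)
signed = as_signed(unsigned)

# The decimal text differs between the two representations...
assert signed < 0 < unsigned
# ...but masking maps the signed form back to the unsigned value.
assert (signed & 0xFFFFFFFF) == unsigned
```

Anything that compares the checksum as decimal text, rather than
masking first, would see these two as different values.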

Since I couldn't find any user of the dirstate checksum, I
thought it wasn't worth fixing in bzr. If I'm wrong, though, this
could be pretty serious.

  Vincent


