Knit inventory corrupt / wrong sha1

John Arbash Meinel john at arbash-meinel.com
Thu Mar 22 13:47:12 GMT 2007


Aaron Bentley wrote:
> John Arbash Meinel wrote:
>> I've been helping Mariano debug this problem, and it boils down to
>> hardware problems.
> 
> Hmm.
> 
> I think it's worth sharing a possible cause I came up with, just so we
> can be aware of it: recursive search and replace.
> 
> If a search & replace entered .bzr and actually found something to
> replace, the results could be quite bad.  I guess this isn't a new
> problem, and many of our files are gzipped, so their contents are
> unlikely to match anyway.
> 

Well, in this particular case I did:

# Hash the same file repeatedly; every result should match the first one.
expected=$(sha1sum inventory.knit)
for i in $(seq 100); do
  val=$(sha1sum inventory.knit)
  if [ "$val" != "$expected" ]; then
    echo "mismatch $i $val"
  fi
done

And the value would *change*. In general there were around 5 mismatches
per run of 100, but the iteration ($i) at which they occurred was
different each time. It was pretty random.


Recursive search and replace could be problematic, but the chance of it
happening is pretty small. .knit files are gzipped, so they rarely contain
text that would be replaced. .kndx files have ASCII revision ids, so they
*might* match if someone was doing a recursive search and replace on email
addresses.

We would catch that when we try to go from the inventory to a file text
and can't find the revision that it should have. That is not a great time
to find out, and it is certainly hard to debug.

The only thing I could really think of to catch that would be to
include a crc32 (or the like) for each line. However, as loading .kndx
files is currently one of the slow parts for us, we'd want the check to
be reasonably cheap.
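
As a rough sketch of the per-line checksum idea: the code below is not
bzr's actual .kndx format or code. The line layout, the sample revision
id, and the names add_crc and check_crc are all made up for illustration.
It appends a crc32 to each index line when writing and verifies it when
reading, so an edited revision id is noticed at load time.

```python
import zlib


def add_crc(line: str) -> str:
    """Return the line with a crc32 of its bytes appended as 8 hex digits."""
    crc = zlib.crc32(line.encode("utf-8")) & 0xFFFFFFFF
    return f"{line} {crc:08x}"


def check_crc(stamped: str) -> str:
    """Return the original line, or raise ValueError if the crc32 is wrong."""
    # The checksum is the last space-separated field (hypothetical layout).
    line, _, crc_hex = stamped.rpartition(" ")
    actual = zlib.crc32(line.encode("utf-8")) & 0xFFFFFFFF
    if f"{actual:08x}" != crc_hex:
        raise ValueError(f"corrupt index line: {stamped!r}")
    return line


if __name__ == "__main__":
    stamped = add_crc("rev-john@example.com-1 0 120")
    print(check_crc(stamped))  # round-trips to the original line
    # Simulate a search and replace on an email address inside the index.
    tampered = stamped.replace("john@example.com", "jane@example.com")
    try:
        check_crc(tampered)
    except ValueError as err:
        print("detected:", err)
```

zlib.crc32 runs in C, so the per-line cost should be small, but whether
it is small enough for .kndx loading would have to be measured.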

John
=:->


