[merge] Weave checks sha1 sum while extracting

Tue Dec 20 02:29:57 GMT 2005

On Mon, 2005-12-19 at 11:46 -0600, John A Meinel wrote:
> Right now, every text inside of a weave has a sha1 sum, but we never
> check it. Attached is a patch which checks the sha1 sum every time we
> extract a text.
> The interesting thing about our current weave code, is that doing the
> sha1 only adds a negligible amount of time. Sha1 is pretty fast, much
> faster than iterating over our weave format.
> 
> I ran "bzr selftest" twice, with and without the change, and it runs in
> the same amount of time (114s,111s without vs 112s,113s with).
> I also ran bzr check twice
> 
> I really think this should be included. The overhead is low, and it
> means we actually do something about the sha1 sums, rather than only at
> "bzr check" time. (which until now they were actually ignored as well,
> only the inventory => text sha1 was ever checked).
> 
> I also ran "bzr check" and the difference was 27min without, and 28min
> with. So again, the overhead is minimal.
> 
> I've added the change to my jam-integration branch. Because the
> performance hit is negligible, and it does improve the integrity checking.

Meta-question - what does jam-integration represent? Is it 'all past the
review process' changes, or is it 'stuff you are happy with' ? (this
just lets me know whether to cherry pick as things get a second +1 from
me (yours is implied ;), or whether its always safe to yank the entire
branch.) I've started only putting stuff that is passed the +1 from two
people requirement into my integration branch -> which means that
usually my changes do not land there. Ironically enough.

> John
> =:->
> plain text document attachment (weave-checks-sha1sum.patch)

> === modified file 'bzrlib/tests/test_weave.py'
> --- bzrlib/tests/test_weave.py	
> +++ bzrlib/tests/test_weave.py	
> @@ -881,3 +898,103 @@
>          eq = self.assertEquals
>          eq(sorted(wa.iter_names()), ['v1', 'v2', 'v3', 'x1',])
>          eq(wa.get_text('x1'), 'line from x1\n')
> +
> +
> +
> +class Corruption(TestCase):

^^
two lines between top level code blocks please :).

> +
> +    def test_detection(self):
> +        """Test weaves detect corruption.

I think that docstrings on tests are actually questionable: they make it
harder to find the test with 'selftest -v' - a python path is much nicer
for finding the test. What do you think? (i.e.
    def test_corruption_detection(self):
        # check that a weave with content that results in bad
        # checksums raises WeaveInvalidChecksum on text 
        # extraction.

> @@ -548,8 +549,24 @@
>      def get_iter(self, name_or_index):
>          """Yield lines for the specified version."""
>          incls = [self.maybe_lookup(name_or_index)]
> +        if len(incls) == 1:
> +            index= incls[0]
> +            s = sha.new()

whats s? Is it the current sha ? (single letter variable names hurt my
brain).

-0 with the 's' variable.

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20051220/0d562b23/attachment.pgp