[MERGE] tuned_gzip.bytes_to_gzip

Ian Clatworthy ian.clatworthy at internode.on.net
Mon Sep 17 08:34:28 BST 2007


Robert Collins wrote:
> This is another tweak-level optimisation. This time no tests break; I've
> not written specific tests for the helper, but I am willing to be told
> to go back and do that (they would consist of 'make a gzip by hand, use
> the helper, compare'). I didn't write specific tests because we don't
> have tests for tuned_gzip - we exercise the gzip layer heavily in our
> knit tests, and we do test creating gzips via the regular api and
> reading them via the normal knit code - if the new helper is broken,
> test_knit breaks.

Fair enough. As you would undoubtedly remind me though, it is good
policy to have at least a sanity test for public APIs. If client code
starts using that API - it's public after all, then that sanity test is
where the deprecation check can be grafted on.

A simple "compress using the new method, decompress and compare" would
be nice in this regard. I'm not going to insist on it but I'm guessing
from the long explanation above that you're feeling (a little) guilty
for not adding something like that. :-)

bb: approve

There are a few spots other than the one you changed that use GzipFile
in wb mode: one in multiparent.py and one in store/text.py. If you
haven't already, please have a think about whether these should be
changed as well. It doesn't need to be done in this patch though a TODO
comment added above the two lines in question wouldn't hurt (so it
doesn't fall through the cracks completely). Up to you.

Some other comments below.

> +        bytes = (''.join(chain(
>              ["version %s %d %s\n" % (version_id,
>                                       len(lines),
>                                       digest)],
>              lines,
> -            ["end %s\n" % version_id]))
> -        data_file.close()
> -        length= sio.tell()
> -
> -        sio.seek(0)
> -        return length, sio
> +            ["end %s\n" % version_id])))

Is the enclosing () needed around bytes?

> +def bytes_to_gzip(bytes, factory=zlib.compressobj,
> +    level=zlib.Z_DEFAULT_COMPRESSION, method=zlib.DEFLATED,
> +    width=-zlib.MAX_WBITS, mem=zlib.DEF_MEM_LEVEL,
> +    crc32=zlib.crc32):

A comment or two separate to your patch ...

1. John might have some figures on various compression levels, their
performance and space tradeoffs. IIUIC, Z_DEFAULT_COMPRESSION is -1
which maps to 6 on a scale of 0-9, 9 being the slowest and most
compression. Less compression is necessarily faster (overall) of course.

2. crc=zlib.alder32 might be slightly quicker as well. Changing this
looks more complex that just changing the compression level though
because I'm guessing that's a different file format (unless a flag is
used to indicate the type of crc?). Our decompress code certainly looks
hard-coded to crc32 (and fair enough given it's goal of being optimised).

Ian C.



More information about the bazaar mailing list