brisbane:CHKMap.iteritems() tweaks

Robert Collins robert.collins at canonical.com
Tue Mar 24 22:58:09 GMT 2009


On Tue, 2009-03-24 at 17:37 -0500, John Arbash Meinel wrote:
> git doesn't have to zlib.decompress() all of the texts that aren't
> referenced in its delta chain. Having a text in the middle of a group
> causes us to have to at least decompress all the previous bytes,
> whether they are specifically referenced or not.

That's equivalent to having the prior texts in the delta chain; we
decompress the chain, but we don't process each text.
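
A minimal sketch of that point, assuming a group is simply several texts
concatenated and compressed as one zlib stream (the real group format
also carries deltas and headers, which are left out here). The helper
name extract_from_group is hypothetical. It shows that reaching a text
in the middle means inflating every byte before it, but nothing after
it, and that the earlier texts are only skipped over, never parsed.

```python
import zlib


def extract_from_group(compressed, start, length):
    """Return the `length` bytes at offset `start` of a zlib-compressed group.

    Hypothetical helper: a "group" here is just concatenated texts in a
    single zlib stream, not the actual on-disk group layout.
    """
    decompressor = zlib.decompressobj()
    # max_length caps the output, so bytes after the wanted text are
    # never inflated; everything before it has to be.
    prefix = decompressor.decompress(compressed, start + length)
    # The earlier texts are sliced away without being interpreted.
    return prefix[start:start + length]


texts = [b"first text\n" * 50, b"second text\n" * 50, b"third text\n" * 50]
group = zlib.compress(b"".join(texts))

offset = len(texts[0])
middle = extract_from_group(group, offset, len(texts[1]))
```

The cost grows with the text's offset inside the group, which is why
the position of a text in its group matters for read speed.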

> When I was exploring breaking at file_id boundaries (always, or at
> least more often) it caused the total compressed size to go up by a
> sizeable amount. (I assume from losing cross-file compression.) Though
> perhaps it was something like group overhead from all of the empty
> directory texts not being shared in a group, or something weird like
> that.

We may need to investigate further :).

> So for "decompressing the content" speed, we are talking 2.5s => 1.1s.
> This is compared with the 5s we spend in "get_build_details" pulling
> information out of the .tix.
> 
> We could shrink the standard group size, we could try shrinking the
> multi-file-id group size a bit more (at 1MiB, the total compressed
> size was the same, at <512KiB the total size started increasing). I
> would assume that tuning these is mostly data dependent.

So would I.
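
One rough way to probe how data dependent this is, sketched with plain
zlib standing in for the real group compressor (an assumption, as are
the function name total_compressed_size and the max_group_bytes
parameter): pack the same texts into groups of different maximum sizes
and compare the summed compressed size.

```python
import zlib


def total_compressed_size(texts, max_group_bytes):
    """Sum the zlib-compressed size of `texts` packed into groups.

    Hypothetical measurement helper: a new group is started whenever
    adding the next text would push the current group past
    max_group_bytes. A single oversized text still gets its own group.
    """
    total = 0
    group = []
    group_bytes = 0
    for text in texts:
        if group and group_bytes + len(text) > max_group_bytes:
            total += len(zlib.compress(b"".join(group)))
            group, group_bytes = [], 0
        group.append(text)
        group_bytes += len(text)
    if group:
        # Flush the final, possibly partial, group.
        total += len(zlib.compress(b"".join(group)))
    return total


# Synthetic, highly similar texts; real repository data will behave differently.
sample = [(b"line %d of a shared file body\n" % (i % 7)) * 20 for i in range(200)]

one_text_per_group = total_compressed_size(sample, 1)
one_big_group = total_compressed_size(sample, 1 << 20)
```

Smaller groups give up redundancy shared across texts, so the total
grows as the limit shrinks; where that starts to hurt depends on how
similar neighbouring texts are.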

> It might be an answer for getting the size back, without paying the
> lzma overhead for all access.

True, OTOH 'log' and 'annotate' access all the old history, so we will
still commonly pay for it.

-Rob