split-inventory auto-pack doesn't get rid of '.cix'

John Arbash Meinel john at arbash-meinel.com
Wed Dec 3 16:14:27 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> John Arbash Meinel wrote:
>> I just looked at my conversion's .bzr/repository/indices directory, and
>> I found 231 .cix files (after issuing 'bzr pack', so there is only 1
>> actual pack file.)
> 
>> I assume this isn't very hard to fix, but I did find it a bit surprising.
> 
>> John
>> =:->
> 
> I should also mention that doing "bzr pack" clears out approx 25MB of
> redundant data. These, presumably, are chk nodes that 'collide' and give
> exactly the same sha1 sum as a node derived from some other method.
> (Perhaps this is because of merge nodes?)
> 
> But it also means that a conversion "wastes" about 1/4th of all the
> inventory space. (There is 89MB of referenced inventory, and another
> 25MB just lying around, and that is presumably *after* auto-packs have
> been clearing out stuff.)
> 
> Something weird, though, I have 2 60-70MB pack files, and then a bunch
> of small ones, but those only add up to around 20MB.
> 
> Still, it would seem that the conversion process bloats the in-progress
> pack files far more than I think we expected.
> 
> John
> =:->

To follow up on this a bit, I started a MySQL conversion and let it run
overnight. When I checked on it this morning it had converted 39k
revisions (out of something like 65k) and was consuming 900MB of RAM.

More critically, I'm seeing a *lot* of waste in the pack files. I went
ahead and hacked in some code to give the size of the packs being packed
versus the size of the final pack. And I'm attaching a trimmed log file.

The key parts are lines like this:
35166.635  Auto-packing ... which has 20 pack files, containing 38000
35184.372  Auto-packing ... completed 101.269MB => 14.456MB
36581.414  Auto-packing ... which has 21 pack files, containing 39000
36601.117  Auto-packing ... completed 117.795MB => 17.208MB

That means that in those 10 pack files that we decided we needed to
recompact, we had 86% waste.
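
As a rough check of that figure, here is a small sketch that recomputes
the discarded fraction from the two "completed" lines quoted above. The
regular expression and the function name are my own assumptions based on
the line format shown; this is not code from bzrlib.

```python
import re

# The two "completed" lines quoted above, as printed by the hacked-in logging.
LOG = """\
35184.372  Auto-packing ... completed 101.269MB => 14.456MB
36601.117  Auto-packing ... completed 117.795MB => 17.208MB
"""

# Matches "<before>MB => <after>MB"; the format is assumed from the lines above.
SIZE_RE = re.compile(r"completed\s+([\d.]+)MB\s*=>\s*([\d.]+)MB")


def waste_fractions(log_text):
    """Return (before_mb, after_mb, waste_fraction) for each auto-pack line."""
    results = []
    for match in SIZE_RE.finditer(log_text):
        before = float(match.group(1))
        after = float(match.group(2))
        # Fraction of the input bytes that did not survive the repack.
        results.append((before, after, 1.0 - after / before))
    return results


for before, after, waste in waste_fractions(LOG):
    print("%.3fMB => %.3fMB: %.1f%% discarded" % (before, after, waste * 100))
```

Both auto-packs come out between 85% and 86% discarded, in line with the
number above.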

I don't know exactly why yet. It could be that we weren't applying
deltas properly, causing us to rebuild the entire inventory each time
(which obviously overlaps heavily with the previous inventories), or
something else weird could be happening.
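
To illustrate the first hypothesis only, here is a toy model of why
rewriting whole inventories into separate packs could produce this kind
of waste when nodes are keyed by their sha1. Everything in it
(node_key, write_pack, repack, the node contents and sizes) is
hypothetical and made up for the illustration; it is not how bzrlib
stores packs, and it does not show that this is the actual cause.

```python
import hashlib


def node_key(content):
    """Content-addressed key: identical node bytes always get the same key."""
    return hashlib.sha1(content).hexdigest()


def write_pack(nodes):
    """Hypothetical pack: a dict of key -> bytes for one batch of nodes."""
    return {node_key(n): n for n in nodes}


def repack(packs):
    """Combine packs, keeping a single copy of each key."""
    combined = {}
    for pack in packs:
        combined.update(pack)
    return combined


def total_bytes(pack):
    return sum(len(content) for content in pack.values())


# Ten made-up inventories that share 9 of their 10 nodes with each other.
shared = [("shared node %d" % i).encode("ascii") * 50 for i in range(9)]
packs = []
for rev in range(10):
    unique = ("node changed in revision %d" % rev).encode("ascii") * 50
    # Writing the whole inventory every time stores the shared nodes again.
    packs.append(write_pack(shared + [unique]))

before = sum(total_bytes(p) for p in packs)
after = total_bytes(repack(packs))
print("before: %d bytes, after: %d bytes, %.0f%% discarded"
      % (before, after, 100.0 * (1 - float(after) / before)))
```

In this model the repack throws away every repeated copy of a shared
node, so the more the inventories overlap, the larger the discarded
fraction.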

Here are the stats so far:
Commits: 39800
                      Raw    %    Compressed    %  Objects
Revisions:      78886 KiB   0%     31413 KiB   5%    39800
Inventories:   582649 KiB   4%    330680 KiB  60%   751678
Texts:       12179723 KiB  94%    185072 KiB  33%   153830
Signatures:         0 KiB   0%         0 KiB   0%        0
Total:       12841259 KiB 100%    547166 KiB 100%   945308

Extra Info:           count    total  avg stddev  min  max
internal node refs   515117  4841005    9    8.5    2   29
internal p_id refs    27849   194136    6    8.1    2   27
inv depth            161855  1042795    6    2.6    1   17
leaf node items      161855   972861    6    3.9    1   18
leaf p_id items        7057    68920    9    9.5    1   38
p_id depth             7057    70045    9    5.1    1   19

These trees are significantly deeper, with an average of 6 levels deep
to get to a leaf node, and 9 levels for the parent_id,basename =>
file_id map. That works out to an average of about 3.9 text changes per
commit, and about 18.9 inventory changes per commit.
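
The per-commit averages follow from the Objects column of the stats
table. A minimal sketch of the arithmetic, with the figures copied from
the table and variable names of my own choosing:

```python
# Figures copied from the stats table above.
commits = 39800
text_objects = 153830
inventory_objects = 751678

# float() keeps the division exact on both Python 2 and Python 3.
texts_per_commit = text_objects / float(commits)
inventory_per_commit = inventory_objects / float(commits)

print("text changes per commit:      %.2f" % texts_per_commit)
print("inventory changes per commit: %.2f" % inventory_per_commit)
```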

Surprisingly, the p_id map only has 7k leaf nodes and 28k internal nodes.
It seems fairly compact given that it is on average 9 levels deep and at
its deepest goes all the way to 19 levels.

I do think that hash keys will be interesting here, though I think we
need a better solution for the 86% waste before we do more. It just is a
lot of work that we are throwing away when we are done.

The memory consumption is also a bit surprising. I'm guessing we have a
leaky cache, but I'm not positive. I know that when I hit ^C it was inside
the LRUCache code, but I didn't do any sort of rigorous exploration.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkk2sGIACgkQJdeBCYSNAAN1QgCaA9uRIXbRbh6eGqybvfVdvvLw
/04AnA7Drsy1Tq6r0FPWFSMVncf03eFI
=f69Y
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bzr-mysql.log
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20081203/57aae5a9/attachment.diff 