split-inventory auto-pack doesn't get rid of '.cix'
John Arbash Meinel
john at arbash-meinel.com
Wed Dec 3 16:14:27 GMT 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> John Arbash Meinel wrote:
>> I just looked at my conversion's .bzr/repository/indices directory, and
>> I found 231 .cix files (after issuing 'bzr pack', so there is only 1
>> actual pack file.)
>
>> I assume this isn't very hard to fix, but I did find it a bit surprising.
>
>> John
>> =:->
>
> I should also mention that doing "bzr pack" clears out approx 25MB of
> redundant data. These, presumably, are chk nodes that 'collide' and give
> exactly the same sha1 sum as a node derived from some other method.
> (Perhaps this is because of merge nodes?)
>
> But it also means that a conversion "wastes" about 1/4th of all the
> inventory space. (There is 89MB of referenced inventory, and another
> 25MB just lying around, and that is presumably *after* auto-packs have
> been clearing out stuff.)
>
> Something weird, though: I have two 60-70MB pack files, and then a bunch
> of small ones, but those only add up to around 20MB.
>
> Still, it would seem that the conversion process bloats the in-progress
> pack files far more than I think we expected.
>
> John
> =:->
To follow up on this a bit, I started a MySQL conversion and let it run
overnight. When I checked on it this morning it had converted 39k
revisions (out of something like 65k) and was consuming 900MB of RAM.
More critically, I'm seeing a *lot* of waste in the pack files. I went
ahead and hacked in some code to give the size of the packs being packed
versus the size of the final pack. And I'm attaching a trimmed log file.
The key parts are lines like this:
35166.635 Auto-packing ... which has 20 pack files, containing 38000
35184.372 Auto-packing ... completed 101.269MB => 14.456MB
36581.414 Auto-packing ... which has 21 pack files, containing 39000
36601.117 Auto-packing ... completed 117.795MB => 17.208MB
That means that in those 10 pack files that we decided we needed to
recompact, we had 86% waste.
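The waste figure can be reproduced from the "completed" log lines with a
short script. This is only a sketch: the regex and the pack_waste name are
made up for illustration and are not part of bzrlib, and it assumes the
"completed <before>MB => <after>MB" format shown in the lines above.

```python
import re

# Matches the "completed <before>MB => <after>MB" part of a log line.
# The pattern is an assumption based on the excerpt quoted in this mail.
COMPLETED = re.compile(r"completed\s+([\d.]+)MB\s+=>\s+([\d.]+)MB")


def pack_waste(log_lines):
    """Return (before_mb, after_mb, wasted_fraction) per completed auto-pack."""
    results = []
    for line in log_lines:
        match = COMPLETED.search(line)
        if match is None:
            continue  # not a "completed" line
        before = float(match.group(1))
        after = float(match.group(2))
        results.append((before, after, 1.0 - after / before))
    return results


log = [
    "35166.635 Auto-packing ... which has 20 pack files, containing 38000",
    "35184.372 Auto-packing ... completed 101.269MB => 14.456MB",
    "36581.414 Auto-packing ... which has 21 pack files, containing 39000",
    "36601.117 Auto-packing ... completed 117.795MB => 17.208MB",
]

for before, after, waste in pack_waste(log):
    print(f"{before:.3f}MB => {after:.3f}MB: {waste:.1%} wasted")
```

Both auto-packs come out at roughly 85-86% waste, which matches the figure
quoted above.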
I don't know exactly why yet. It may be that we weren't applying deltas
properly, which would cause us to rebuild the entire inventory each time
(and each one obviously overlaps heavily with the previous inventories),
or something else weird may be happening.
Here are the stats so far:
Commits: 39800
                       Raw     %   Compressed     %  Objects
Revisions:       78886 KiB    0%    31413 KiB    5%    39800
Inventories:    582649 KiB    4%   330680 KiB   60%   751678
Texts:        12179723 KiB   94%   185072 KiB   33%   153830
Signatures:          0 KiB    0%        0 KiB    0%        0
Total:        12841259 KiB  100%   547166 KiB  100%   945308
Extra Info:            count    total  avg  stddev  min  max
internal node refs    515117  4841005    9     8.5    2   29
internal p_id refs     27849   194136    6     8.1    2   27
inv depth             161855  1042795    6     2.6    1   17
leaf node items       161855   972861    6     3.9    1   18
leaf p_id items         7057    68920    9     9.5    1   38
p_id depth              7057    70045    9     5.1    1   19
These trees are significantly deeper, with an average of 6 levels deep
to get to a leaf node, and 9 levels for the parent_id,basename =>
file_id map. An average of 3.8 text changes per commit, and 18.8
inventory changes per commit.
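The per-commit averages follow directly from the object counts in the stats
table. A minimal sketch of the arithmetic, with variable names chosen here
for illustration only:

```python
# Counts copied from the stats table above.
commits = 39800
objects = {"inventories": 751678, "texts": 153830}

# Average number of new objects of each kind per commit.
per_commit = {name: count / commits for name, count in objects.items()}

# Average depths are the "total" column divided by the "count" column.
inv_depth_avg = 1042795 / 161855
p_id_depth_avg = 70045 / 7057

for name, value in per_commit.items():
    print(f"{name}: {value:.2f} per commit")
print(f"inv depth: {inv_depth_avg:.2f}, p_id depth: {p_id_depth_avg:.2f}")
```

The exact ratios are about 3.87 texts and 18.89 inventory objects per
commit, so the 3.8 and 18.8 quoted above are those values truncated rather
than rounded.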
Surprisingly, the p_id map only has 7k leaf nodes and 28k internal nodes.
It seems fairly compact given that it is on average 9 levels deep and at
its deepest goes all the way to 19 levels.
I do think that hash keys will be interesting here, though I think we
need a better solution for the 86% waste before we do more. That is a
lot of work that we are throwing away when we are done.
The memory consumption is also a bit surprising. I'm guessing we have a
leaky cache, but I'm not positive. I know that when I hit ^C it was inside
the LRUCache code, but I didn't do any sort of rigorous exploration.
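One way to make that exploration more rigorous is to compare allocation
snapshots before and after a chunk of work. The sketch below uses Python's
standard tracemalloc module, which assumes a modern Python 3 interpreter
(it did not exist when this conversion was run). The top_growth helper is a
hypothetical name, and the work callable stands in for whatever slice of
the conversion is being measured.

```python
import tracemalloc


def top_growth(work, limit=5):
    """Run work() and return the source lines whose allocations changed most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    work()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() returns StatisticDiff entries sorted by the largest
    # absolute change in allocated size first.
    return after.compare_to(before, "lineno")[:limit]


# Stand-in for a cache that never evicts anything (illustration only).
leaky_cache = []


def fake_conversion_step():
    for _ in range(2000):
        leaky_cache.append(bytes(1000))


for stat in top_growth(fake_conversion_step):
    print(stat)
```

If a cache really is leaking, the line that inserts into it should show up
near the top with a large positive size difference.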
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkk2sGIACgkQJdeBCYSNAAN1QgCaA9uRIXbRbh6eGqybvfVdvvLw
/04AnA7Drsy1Tq6r0FPWFSMVncf03eFI
=f69Y
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bzr-mysql.log
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20081203/57aae5a9/attachment.diff