[MERGE] cache inventory properly
John Arbash Meinel
john at arbash-meinel.com
Tue Jul 25 01:26:09 BST 2006
Well, after doing some more profiling, and a lot of discussion, it seems
Michael is getting his wish.
_KnitData.read_records_iter is now defined to return records in
undefined order, since all of its callers either build up a map, or do
not depend on the exact order returned.
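As a rough illustration of why an undefined return order is safe for such callers, here is a minimal sketch. The function and variable names below are hypothetical stand-ins, not the real bzrlib signatures; the shuffle only simulates an arbitrary order.

```python
import random


def read_records_iter(records):
    """Hypothetical stand-in: yield (version_id, data) in arbitrary order."""
    items = list(records.items())
    random.shuffle(items)  # simulate an undefined return order
    for version_id, data in items:
        yield version_id, data


def build_record_map(records):
    """A caller that builds a map, so the iteration order does not matter."""
    return dict(read_records_iter(records))


stored = {"rev-1": b"a", "rev-2": b"b", "rev-3": b"c"}
record_map = build_record_map(stored)
```

Whatever order the iterator picks, the resulting map is the same, which is what lets the reader choose the order that is cheapest for it.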
In addition, the 'inventory.knit' file is now explicitly cached, since
we don't want to download it twice.
It turns out that caching it has less memory impact than the order in
which the records are read, probably because the cache holds the gzipped
contents rather than the extracted contents + digest.
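A minimal sketch of that trade-off, assuming a cache that keeps only the compressed bytes and expands them on demand. The class name and sample data are hypothetical and this is not the code in the attached patch.

```python
import gzip
import hashlib


class CompressedRecordCache:
    """Hypothetical cache that stores gzipped bytes and expands on demand."""

    def __init__(self):
        self._raw = {}

    def add(self, key, gz_bytes):
        # Keep only the compressed form in memory.
        self._raw[key] = gz_bytes

    def get(self, key):
        # Decompress and compute the digest only when the record is needed.
        text = gzip.decompress(self._raw[key])
        return text, hashlib.sha1(text).hexdigest()

    def cached_bytes(self):
        return sum(len(v) for v in self._raw.values())


# Made-up, highly repetitive sample text, just to show the size difference.
text = b"file-id revision-id parent-id\n" * 1000
cache = CompressedRecordCache()
cache.add("rev-1", gzip.compress(text))
content, digest = cache.get("rev-1")
```

The cost is a decompression on every read, which is the price paid for holding less in memory.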
The attached graph shows the difference in memory consumption between
bzr.dev and cache-inventory. This is for branching jelmer's samba 4.0
branch. (Which has 9134 ancestors, and 150MB in .bzr).
bzr-0.8 uses more memory than all the rest, with a plateau of 430MB and a
peak of 462MB.
bzr.dev stabilizes at 355MB for most of the time, and then has a peak of
387MB near the end. It also has a total memory consumption of 132
gibibyte-seconds (the integral under the memory curve).
cache-inventory grows continually, but overall is much lower with a
plateau of around 150MB, and a final peak of 188MB.
The total consumption is 40 gibibyte-seconds.
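For reference, a gibibyte-seconds figure like the ones above can be computed from sampled memory readings with the trapezoidal rule. This is only a sketch: the function name is hypothetical and the sample points are made up, not the measurements from the graph.

```python
GIB = 1024 ** 3


def gibibyte_seconds(samples):
    """Integrate memory over time.

    samples: list of (time_in_seconds, memory_in_bytes), sorted by time.
    Returns the area under the curve in GiB-seconds (trapezoidal rule).
    """
    total = 0.0
    for (t0, m0), (t1, m1) in zip(samples, samples[1:]):
        total += (m0 + m1) / 2.0 * (t1 - t0)
    return total / GIB


# Made-up samples: ramp to 1 GiB over 10s, then hold for 10s.
samples = [(0, 0), (10, 1 * GIB), (20, 1 * GIB)]
area = gibibyte_seconds(samples)
```

A lower plateau that lasts the whole run reduces this number far more than shaving a short peak does.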
It is slightly faster, most likely because it is caching the
inventory.knit file, so it doesn't have to download it a second time.
Not caching inventory.knit does save memory. It plateaus at 110MB, but
peaks at 186MB, so it doesn't save the peak memory cost. (This might be
a hint as to what the final peak is).
The attached graph has 0.8 in red, bzr.dev in pink, cache-inventory in
blue, and no-cache in blue.
As far as the impact of caching or not caching the inventory.knit file,
I simulated delay using 'tc' with a 50ms delay, and then did a branch
over the local network. All versions have the new http updates. I also
did a branch from bazaar-vcs.org.
command                           local, 50ms delay (s)   bazaar-vcs.org (s)
time bzr branch bzr.dev           238                     454
time bzr-cache branch bzr.dev     225                     380
time bzr-nocache branch bzr.dev   232                     446
The latency test isn't perfect (my ping to bazaar-vcs is 100ms, not 50),
and for stuff like inventory.knit, the download is actually bandwidth
limited.
And testing against bazaar-vcs.org, I had a swing from 11min down to 6
minutes in my testing. (Probably depends on my network congestion.) I
ran them all a few times, and just took the fastest.
But from the above, you can see that I shaved off 1.2min from the branch
bzr.dev time.
Either way, the attached patch saves us both memory consumption and
total number of bytes downloaded, so I'd like to get it into 0.9.
John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mem-consumption.png
Type: image/png
Size: 8344 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060724/80928138/attachment.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cache-inventory.diff
Type: text/x-patch
Size: 14101 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060724/80928138/attachment.bin