[RFC] Faster load of full inventory for development6-rich-root?
John Arbash Meinel
john at arbash-meinel.com
Thu Jun 4 03:02:48 BST 2009
Ian Clatworthy wrote:
> John Arbash Meinel wrote:
>> Ian Clatworthy wrote:
>
>>> * 97% of bzr send is taken up in walking 2 inventories: one to find
>>> what to include in the bundle and one to calculate the testament_sha1
>>> value of the revision to merge.
>> "one to find what to include in the bundle": is this a complete walk? Or
>> just an iter_changes walk? Certainly it doesn't seem like we should have
>> to walk the whole thing.
>
> From memory, yes.
>
>> "one to calculate the testament_sha1" this is a bigger issue, as we
>> probably *have* to walk everything.
>
> Aaron,
>
> Do we have to create the testament_sha by using the iter-changes-by-dir
> order, including both the paths and inventory entries? Or is it
> sufficient for the order to be deterministic and only include the
> inventory entries? If the latter is sufficient, I can skip the path
> calculations, hence the children lookups, by using iter_just_entries()
> and sorting by file-id?
>
> A different method for calculating the testament-sha would imply a
> merge-directive format bump. Are there other things you wanted to tweak
> at the same time if we did this?
>
>> I'm curious how my "move recent to front" patch would affect these sorts
>> of times, especially with a prototyped "batch requests".
>
> John,
>
> Which branch is that exactly?
>
lp:~jameinel/bzr/1.15-split-pack
I'm not sure that my "batch requests" ever made it out of my 'hacks'
branches.
>> With something like OOo, I'm curious whether the BTreeIndex overhead is
>> killing us. This could be ameliorated somewhat if we knew we were
>> walking everything (as in iter_entries_by_dir()) by telling the subsystem
>> to read everything. Then it can read the root page, the 255 child
>> pages, the 65k pages below those, etc. I would at least guess that OOo
>> has enough chk pages that getting cache coherency is really difficult.
>>
>> I'm also curious if OOo is big enough to cause us to do a 3-level deep
>> inventory (1 root level, 1 internal, 1 leaf). Care to run 'bzr
>> repository-details' on your OOo conversion and paste the output?
>>
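A rough sketch of the "read everything" idea: when the caller is going to visit every node anyway, fetch the tree one level at a time with a single batched request per level instead of one request per page. `read_pages` is a hypothetical batch-read callback, not a BTreeGraphIndex method, and pages are plain dicts for illustration only.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical page layout: internal pages hold {"children": [page ids]},
# leaf pages hold {"items": [...]}.
Page = dict


def walk_all_by_level(
    root_id: str, read_pages: Callable[[List[str]], Dict[str, Page]]
) -> Iterator[object]:
    """Yield every leaf item, issuing one batched read per tree level."""
    level = [root_id]
    while level:
        pages = read_pages(level)  # a single request for the whole level
        next_level: List[str] = []
        for page_id in level:
            page = pages[page_id]
            if "children" in page:
                next_level.extend(page["children"])
            else:
                yield from page["items"]
        level = next_level


def pages_per_level(fanout: int, levels: int) -> List[int]:
    """Page count at each level of a completely full tree with this fanout."""
    return [fanout ** i for i in range(levels)]
```

With the fanout of 255 mentioned above, `pages_per_level(255, 3)` gives `[1, 255, 65025]`, so a full three-level walk costs three batched reads instead of one request per page.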
>
> ian at wallaby:~/Projects/scm-play/OOo-dev6$ bzr repository-details
> Commits: 262881
>
>                        Raw    %  Compressed    %  Objects
> Revisions:      141399 KiB   0%   22666 KiB   2%   262881
> Inventories:  15274481 KiB  42%   81460 KiB   9%  1132028
^- 1.1M inventory objects. Less one inventory record per revision
(262,881), that is 869,147 chk pages.
> Texts:        20235816 KiB  56%  760684 KiB  87%   424618
> Signatures:          0 KiB   0%       0 KiB   0%        0
> Total:        35651697 KiB 100%  864810 KiB 100%  1819527
>
> Extra Info:            count     total    avg stddev  min  max
> internal node refs    374075  85165916  227.7   42.4  135  255
> internal p_id refs     26762   6187129  231.2   69.9    9  255
> inv depth             433585   1021991    2.4    0.5    1    3
^- Avg depth 2.4, max 3. So it is 'just starting' to split another level
deeper.
v- With 433,585 leaf nodes, that is an average of 1.6 changes per commit
(very small). Of the 374k internal nodes, 262k are root nodes (one per
commit) and 111k are intermediate nodes.
> leaf node items       433585  36251210   83.6   77.9    1  228
> leaf p_id items        34725   5525584  159.1  127.5    1  825
> p_id depth             34725     80450    2.3    0.7    1    4
>
> Ian C.
>
And in 263k revisions, there have only been 35k changes to the tree
shape (~1 in 8 commits).
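The derived figures can be rechecked from the `bzr repository-details` output. A small sketch; the two subtractions encode assumptions taken from the reading in this message (one inventory record per revision, one root node per commit), not from bzr's documentation.

```python
# Figures from the `bzr repository-details` output for the OOo conversion.
commits = 262881
inventory_objects = 1132028
internal_nodes = 374075       # "internal node refs" count
internal_pid_nodes = 26762    # "internal p_id refs" count
leaf_nodes = 433585           # "leaf node items" count
leaf_pid_nodes = 34725        # "leaf p_id items" count
total_leaf_depth = 1021991    # "inv depth" total

# Assumption: one inventory record per revision; the rest are chk pages.
chk_pages = inventory_objects - commits
# Assumption: each commit writes one (internal) root node.
intermediate_nodes = internal_nodes - commits
leaves_per_commit = leaf_nodes / commits
avg_leaf_depth = total_leaf_depth / leaf_nodes
commits_per_shape_change = commits / leaf_pid_nodes

# Cross-check: the four node counts add up exactly to the chk page count.
assert internal_nodes + internal_pid_nodes + leaf_nodes + leaf_pid_nodes == chk_pages

print(chk_pages, intermediate_nodes)                           # 869147 111194
print(round(leaves_per_commit, 2), round(avg_leaf_depth, 2))   # 1.65 2.36
print(round(commits_per_shape_change, 1))                      # 7.6
```

The internal, intermediate, and leaf counts summing to exactly 869,147 supports reading the remaining inventory objects as chk pages.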
BTW, you had this repo publicly available; what was the URL?
John
=:->
More information about the bazaar mailing list