[RFC] Faster load of full inventory for development6-rich-root?

Thu Jun 4 03:02:48 BST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> John Arbash Meinel wrote:
>> Ian Clatworthy wrote:
> 
>>> * 97% of bzr send is taken up in walking 2 inventories: one to find
>>>   what to include in the bundle and one to calculate the testament_sha1
>>>   value of the revision to merge.
>> "1 to find what to include in the bundle" is this a complete walk? Or
>> just an iter_changes walk? Certainly it doesn't seem like we should have
>> to walk the whole thing.
> 
> From memory, yes.
> 
>> "one to calculate the testament_sha1" this is a bigger issue, as we
>> probably *have* to walk everything.
> 
> Aaron,
> 
> Do we have to create the testament_sha by using the iter-changes-by-dir
> order, including both the paths and inventory entries? Or is it
> sufficient for the order to be deterministic and only include the
> inventory entries? If the latter is sufficient, I can skip the path
> calculations, hence the children lookups, by using iter_just_entries()
> and sorting by file-id?
> 
> A different method for calculating the testament-sha would imply a
> merge-directive format bump. Are there other things you wanted to tweak
> at the same time if we did this?
> 
>> I'm curious how my "move recent to front" patch would affect these sorts
>> of times, especially with a prototyped "batch requests".
> 
> John,
> 
> Which branch is that exactly?
> 

lp:~jameinel/bzr/1.15-split-pack

I'm not sure that my "batch requests" ever made it out of my 'hacks'
branches.

>> I'm curious with something like OOo, if the BTreeIndex overhead isn't
>> killing us. This could be ameliorated somewhat if we knew we were
>> walking everything (like iter_entries_by_dir()) by telling the subsystem
>> to read everything. Because then it can read the root page, 255 children
>> pages, 65k next pages, etc. I would at least guess that OOo has
>> enough chk pages that getting cache coherency is really difficult.
>>
>> I'm also curious if OOo is big enough to cause us to do a 3-level deep
>> inventory (1 root level, 1 internal, 1 leaf). Care to run 'bzr
>> repository-details' on your OOo conversion and paste the output?
>>
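
For reference, the "root page, 255 children pages, 65k next pages" figures
above are just successive powers of the fan-out. A minimal sketch, assuming a
uniform fan-out of 255 at every level (the helper name is hypothetical, not
part of bzrlib):

```python
def pages_per_level(fanout, levels):
    """Return the maximum page count at each level of a full tree.

    Level 0 is the root; each further level multiplies by the fan-out.
    Assumes every page is completely full, which real indexes are not.
    """
    return [fanout ** depth for depth in range(levels)]


# Fan-out of 255 is taken from the figures quoted in this thread.
counts = pages_per_level(255, 3)
print(counts)
```

The third level works out to 65,025 pages, which is the "65k" quoted above.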
> 
> ian at wallaby:~/Projects/scm-play/OOo-dev6$ bzr repository-details
> Commits: 262881
> 
>                       Raw    %    Compressed    %  Objects
> Revisions:     141399 KiB   0%     22666 KiB   2%   262881
> Inventories: 15274481 KiB  42%     81460 KiB   9%  1132028

^- 1.1M inventory objects; subtracting the 262,881 per-revision inventories
leaves 869,147 chk pages.

> Texts:       20235816 KiB  56%    760684 KiB  87%   424618
> Signatures:         0 KiB   0%         0 KiB   0%        0
> Total:       35651697 KiB 100%    864810 KiB 100%  1819527
> 
> Extra Info:           count    total    avg stddev  min  max
> internal node refs   374075 85165916  227.7   42.4  135  255
> internal p_id refs    26762  6187129  231.2   69.9    9  255
> inv depth            433585  1021991    2.4    0.5    1    3

^- Avg depth 2.4, max 3. So it is 'just starting' to split another level
deeper.

v- With 433,585 leaf nodes, that is an average of 1.6 changes per commit
(very small). With 374k internal nodes, that is 262k root nodes, and
111k intermediate nodes.

> leaf node items      433585 36251210   83.6   77.9    1  228
> leaf p_id items       34725  5525584  159.1  127.5    1  825
> p_id depth            34725    80450    2.3    0.7    1    4
> 
> Ian C.
> 

And in 263k revisions, there have only been 35k changes to the tree
shape (roughly 1 revision in 8).
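
The derived numbers in my comments can be re-checked from the
repository-details output. A rough sketch; the variable names are mine, and
"one root node per commit" is the same assumption I made above, not something
the tool reports:

```python
# Figures copied from the 'bzr repository-details' output in this thread.
commits = 262881
inventory_objects = 1132028
internal_nodes = 374075      # "internal node refs" count
internal_p_id_nodes = 26762  # "internal p_id refs" count
leaf_nodes = 433585          # "leaf node items" count
leaf_p_id_nodes = 34725      # "leaf p_id items" count

# chk pages: inventory objects minus one inventory per commit.
chk_pages = inventory_objects - commits

# Cross-check: the four node counts should add up to the same total.
node_total = (internal_nodes + internal_p_id_nodes
              + leaf_nodes + leaf_p_id_nodes)

# Average number of new leaf nodes written per commit.
leaf_per_commit = leaf_nodes / commits

# Assumption: every commit writes exactly one root internal node, so the
# remainder are intermediate nodes.
intermediate_nodes = internal_nodes - commits

# How many commits pass, on average, per tree-shape (p_id) change.
commits_per_shape_change = commits / leaf_p_id_nodes

print(chk_pages, node_total, intermediate_nodes)
print(round(leaf_per_commit, 1), round(commits_per_shape_change, 1))
```

The two totals agree at 869,147, which is a reasonable sanity check on how I
am reading the table.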

BTW, you had this repo publicly available, what was the URL?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkonK0cACgkQJdeBCYSNAAP0PQCfRJYtyJHXUCDHe8l/bDSmKozt
wKsAoKsszFRwdOY3uadpZC/ao7c9RtCV
=iFzL
-----END PGP SIGNATURE-----