[MERGE/CHK] Use the page cache during 'check_remap'
John Arbash Meinel
john at arbash-meinel.com
Wed Dec 24 18:00:29 GMT 2008
John Arbash Meinel wrote:
> This update changes the _check_remap code a little bit, so that it can
> avoid unnecessary calls to get_record_stream() during _check_remap.
>
> It also adds some trivial local functions so that lsprof can show
> which remap outcomes are common (whether it actually does need to
> remap, or whether it determines it doesn't need to because of
> existing children, etc.).
>
> This patch has a pretty big benefit for conversion speed, because we
> have to call check_remap whenever we unmap a key or an entry shrinks.
> And because of the structure, lots of pages are in the page cache
> that aren't unpacked in the local chk_map. (create_by_apply_delta
> starts from an empty chk_map each time.)
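To illustrate that lsprof trick, here's a minimal standalone sketch
(hypothetical names, not the actual bzrlib code): each outcome returns
through its own trivial local function, so the profiler output shows a
per-outcome call count instead of one opaque check_remap entry.

def check_remap(children, maximum_size):
    """Decide whether an internal node can collapse back into a leaf."""
    # Trivial local functions: lsprof attributes a call count to each,
    # so the profile shows how often each outcome occurs.
    def _remap_yes():
        return True    # small enough to collapse into a single leaf
    def _remap_no_too_big():
        return False   # existing children already hold too many items

    total = 0
    for child in children:
        total += len(child)
        if total > maximum_size:
            return _remap_no_too_big()
    return _remap_yes()

# e.g. check_remap([range(10), range(20)], maximum_size=25) => False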
To give a specific number here: with 255-way fan out, the time to
convert 1k revs of mysql drops from 3m32s to 2m11s. It seems likely
that with 255-way fan out, the check_remap code had a lot more children
it could potentially evaluate. And without the page_cache being used,
it would actually page in all 255 children using get_record_stream(),
which was already a little bit slower because of the delta chains.
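As a rough sketch of the pattern (assuming a module-level _page_cache
dict and a _deserialise helper; the real bzrlib names and cache type
differ): consult the cache first, and only pay for get_record_stream()
on the misses.

_page_cache = {}  # assumed: maps a chk page key -> serialized bytes

def _deserialise(data, key):
    # Stand-in for the real page deserializer.
    return (key, data)

def _read_nodes(store, keys):
    """Fetch chk pages, preferring the page cache over store I/O."""
    nodes = {}
    missing = []
    for key in keys:
        data = _page_cache.get(key)
        if data is not None:
            nodes[key] = _deserialise(data, key)  # hit: no I/O at all
        else:
            missing.append(key)
    if missing:
        # Only the cache misses pay for get_record_stream(), which
        # also has to reconstruct full texts from their delta chains.
        stream = store.get_record_stream(missing, 'unordered', True)
        for record in stream:
            data = record.get_bytes_as('fulltext')
            _page_cache[record.key] = data
            nodes[record.key] = _deserialise(data, record.key)
    return nodes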
Unfortunately, 255-way fan out with simple line-delta compression
doesn't seem to be getting me to the ideal "25% overhead" that I was
hoping for. It looks good for the first 1k revisions (24%), but by 21k
revisions it has gotten a bit bloated (41%). The other information
seems reasonable, though:
$ wbzr repository-details d4-256-delta-mysql/
Commits: 21200
                    Raw      %   Compressed      %   Objects
Revisions:       42737 KiB   0%    16740 KiB    8%     21200
Inventories:    457272 KiB   5%    77110 KiB   41%    256731
Texts:         7431463 KiB  93%    93227 KiB   49%     83243
Signatures:          0 KiB   0%        0 KiB    0%         0
Total:         7931473 KiB 100%   187077 KiB  100%    361174
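To double-check the overhead figure, here's a quick recomputation of
the "Compressed %" column from the table above (sizes in KiB):

compressed = {
    'Revisions':   16740,
    'Inventories': 77110,
    'Texts':       93227,
    'Signatures':      0,
}
total = sum(compressed.values())  # 187077 KiB, matching the Total row
for name, size in sorted(compressed.items()):
    print('%-12s %6.1f%%' % (name, 100.0 * size / total))
# Inventories land at ~41.2% of compressed bytes, versus the ~25%
# overhead target (and versus 24% over the first 1k revisions).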
Extra Info:            count     total   avg  stddev  min  max
  internal node refs  116941   7568327    64    90.2   11  255
  internal p_id refs    4734    526578   111    88.2    2  235
  inv depth           106794    309482     2     2.0    1    3
  leaf node items     106794    225401     2     3.2    1   14
  leaf p_id items       7062     43409     6    10.2    1   58
  p_id depth            7062     22736     3     1.4    1    4
The depth is still 2-3, which is expected based on the size of the
tree. The number of internal pages is comparable to the number of leaf
nodes, which means that on average each leaf change comes with one
internal node change. I'm guessing that is because:
1) for every commit the root node changes, but when you have multiple
leaves changing, you still only have 1 root node change.
2) You would still expect one more internal node change per leaf
change, because of the intermediate layer. However, since we average
~5 leaf changes per commit, we may get more overlap (see the toy
model below).
Further, we probably still have some leaf nodes hanging directly off of
the root node.
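Here's a toy model of point 2 (purely illustrative assumptions: a
single intermediate layer of 255 nodes, with each changed leaf hanging
under a uniformly random parent):

import random

def internal_pages_per_commit(n_intermediate, leaf_changes):
    # One new root page per commit, plus one new intermediate page per
    # *distinct* parent among the changed leaves; overlapping changes
    # under the same parent share a single rewritten page.
    parents = set(random.randrange(n_intermediate)
                  for _ in range(leaf_changes))
    return 1 + len(parents)

random.seed(1)
samples = [internal_pages_per_commit(255, 5) for _ in range(10000)]
print(sum(samples) / float(len(samples)))
# ~5.95 internal pages per 5 leaf changes (ratio ~1.19) under uniform
# placement; the measured ratio 116941/106794 ~= 1.09 is lower, which
# fits changes clustering under shared parents (e.g. big merges).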
I also wonder whether the average number of leaf changes is inflated
by large merges. Most commits would only have 1-2 leaf changes, but a
few change 100s, which brings the average up. And 100s of changes also
get more overlap in the internal nodes that need updating.
I'm going to play with it and see whether a "more optimal" delta
ordering makes a bigger impact.
John
=:->