brisbane-core changes

Ian Clatworthy ian.clatworthy at internode.on.net
Sun Mar 8 02:43:42 GMT 2009


John Arbash Meinel wrote:

> I'm happy to see work being done in the brisbane core branch, though I'm
> pretty concerned about these changes:
> 
> 3858 Ian Clatworthy    2009-03-07
>      don't check_remap on every unmap call in CHKMap.apply_delta()
> 
> 3859 Ian Clatworthy    2009-03-07
>      only check for remap if changes are interesting in size
> 
> 
> I think we want/need to make check_remap() cheaper, but it is the only
> thing that ensures the tree stays in 'canonical' form. And I don't think
> we can count that 50 bytes is the minimum size, or that 20 byte
> reduction will/won't trigger a remap.

There are two separate changes here. The first one is quite
straightforward IMO. There's limited value in calling check_remap for
every delete when applying a delta - just once at the end ought to suffice.
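
A rough sketch of that first change, for illustration only. ToyMap and
its methods are hypothetical stand-ins, not the actual bzrlib CHKMap
code; the delta is assumed to be a list of (old_key, new_key, value)
tuples, and the expensive check is replaced by a counter.

```python
class ToyMap:
    """Hypothetical stand-in for a CHKMap-like structure (not bzrlib code)."""

    def __init__(self, items):
        self._items = dict(items)
        self.remap_checks = 0  # how many times the expensive check ran

    def _check_remap(self):
        # Placeholder for the expensive canonical-form check.
        self.remap_checks += 1

    def map(self, key, value):
        self._items[key] = value

    def unmap(self, key, check_remap=True):
        del self._items[key]
        if check_remap:
            self._check_remap()

    def apply_delta(self, delta):
        """Apply (old_key, new_key, value) tuples, checking remap once."""
        deleted = False
        for old, new, value in delta:
            if old is not None and old != new:
                # Defer the check rather than paying for it on every delete.
                self.unmap(old, check_remap=False)
                deleted = True
        for old, new, value in delta:
            if new is not None:
                self.map(new, value)
        if deleted:
            # One check at the end covers all the deletes in this delta.
            self._check_remap()
```

With this shape, a delta containing many deletes costs one check
instead of one per delete, and a delta with no deletes costs none.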

The second is more complex and certainly needs tuning of the limits.
In my sample import stream (wordpress with 6136 revisions), check_remap
was called frequently and, 90% of the time, the reduction in size was
a mere 3-4 bytes. The size afterwards almost never got below 90 bytes.
I don't believe a size reduction *ever* caused a remap, yet the checks
were taking 10-15% of the overall time.

We can certainly be more conservative in the limits, e.g. over 10 bytes
shrinkage is interesting and any node now below 200 bytes in size is
worth checking. Note that the shrinkage number is only applied for
nodes over the interesting minimum limit so small nodes get checked
provided any shrinkage at all happened.
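
The more conservative rule described above could be sketched like this.
The function and parameter names are made up for illustration and are
not taken from the brisbane-core branch; the 200 and 10 byte defaults
are just the example limits mentioned above and would need tuning.

```python
def worth_checking_remap(size_before, size_after,
                         min_size=200, min_shrinkage=10):
    """Decide whether a node's size change justifies a remap check.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    shrinkage = size_before - size_after
    if shrinkage <= 0:
        # Nothing shrank, so a remap cannot have become possible.
        return False
    if size_after < min_size:
        # Small nodes are checked on any shrinkage at all.
        return True
    # Larger nodes are only checked when the shrinkage is interesting.
    return shrinkage > min_shrinkage
```

For example, a node going from 300 to 296 bytes would be skipped, while
one going from 150 to 147 bytes would still be checked because it is
under the minimum size.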

As a quick test, set the interesting minimum size to 5k and the
interesting number of deletes to 0 before doing, say, an upgrade. Then
check .bzr.log for how often check_remap is called and how often it
succeeded. In my wordpress import, I get 1 successful remap out of
4138 calls - and that was after deleting 127 items at once! The
interesting lines from .bzr.log are attached. FWIW, I did validate
this against another input stream (dbus) and got similar results.
I'd be curious what the numbers look like for, say, a mysql upgrade.

> If this is just to reduce "deserialise" overhead, adding a cache of
> deserialised pages or just making deserialise faster is probably a
> better fix.

deserialise is called often, so we need to make it fast or cache its
results. check_remap is *really* expensive though, so I don't think
its cost is just down to deserialise.

Ian C.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: remap.log
Type: text/x-log
Size: 223597 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090308/d2a267e4/attachment-0001.bin 

