Merging a bundle w/ a pack repository is slow

John Arbash Meinel john at arbash-meinel.com
Fri Nov 30 02:43:24 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Thu, 2007-11-29 at 17:38 -0600, John Arbash Meinel wrote:
>>
>> Basically, we know that "get_ancestry()" is going to need most of the
>> graph, as
>> it is a whole-history operation. So this tells the index that it needs
>> to
>> buffer everything. (Sort of like when a DB realizes it should do a
>> sequential
>> scan versus an index scan.)
> 
> I have a patch for bzrlib.index that does that in fact, it looks for 10%
> of keys returned, and then triggers a buffer_all. I haven't proposed it
> because I haven't analysed if 10% is too low a threshold. 

Given my current numbers, it is probably too high of a threshold. That is an
interesting way of doing it, since it means that you don't have to load all of
the indexes across all packs.

I don't know where the "break-even" would be, but you have to realize that
you've paid the cost of 10% round trips, than then you pay an additional
buffer_all cost. Whereas if the algorithm *knows* it is O(History) it can tell
you in advance and at least save the 10% rt cost.

I do like the auto-load for cases like "bzr log plugin" when a plugin with 10
revisions is sharing the repository of bzr.dev with 15,000 revisions.


...

>> I think we could do something like this for "get_revision_graph()" as
>> well, and
>> all the other functions that return all of history. Those functions
>> *should* be
>> deprecated. And we should write all of our functions to not use them.
>> But until
>> we get there, if we are going to make a Bazaar release with packs as
>> the
>> default format then I think we should put in some quick regression
>> fixes like this.
> 
> The less obvious regression are, the harder it is to find them. Our
> users right now are telling use where they are. This is a good thing.

We could do the same thing with:

def get_ancestry(self):
  trace.warning('get_ancestry() called, this function is known to'
                ' perform poorly, please report this to bazaar at ...')

For each place that I would add a buffer_all() call, we *know* that they are
- -Devil. I don't need a user to report a performance regression to tell me that.
We have '-Devil' precisely so that developers can figure out what commands are
using poorly scaling functions.

I *don't* think we need to abuse our users who are willing to experiment with a
new format. It is fine for functions we don't know yet. But once we know that
function foo() is being naughty, if we can make a quick fix, and mark it as
something to work on, then we have a nice worklist of things to do, and we have
users which can remain blissfully unaware until "wow this is even faster, great!".


...

> I think this is a reason to finish the work on packs.
> 
> I'm bb:reject on this for bzr.dev. I think it's really really harmful.
> I'm glad we have a wide selection of folk using packs now and giving us
> this feedback; reducing the quality of that feedback... well, foot, gun,
> 
> BANG.
> 
> -Rob
> 

I don't think we need users to bang there heads against the wall of something
we have already identified needs fixing. Sure, we need to spend some time
actually getting rid of and deprecating all of the -Devil functions. Let's do that.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHT3jMJdeBCYSNAAMRAq33AJ9K51Gl77tQleqz6H4XnTD/9+g2PACeP0lf
biIsZor6q3NAxQSbH9uc8VA=
=wB9I
-----END PGP SIGNATURE-----



More information about the bazaar mailing list