Merging a bundle w/ a pack repository is slow

Robert Collins robertc at robertcollins.net
Fri Nov 30 02:12:03 GMT 2007


On Thu, 2007-11-29 at 17:38 -0600, John Arbash Meinel wrote:
> 
> 
> Basically, we know that "get_ancestry()" is going to need most of the
> graph, as it is a whole-history operation. So this tells the index that
> it needs to buffer everything. (Sort of like when a DB realizes it
> should do a sequential scan versus an index scan.)

I do in fact have a patch for bzrlib.index that does that: it watches
for 10% of the keys having been returned, and then triggers a
buffer_all. I haven't proposed it because I haven't analysed whether
10% is too low a threshold.
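
For illustration only (this is not the actual patch, and none of the
names below are real bzrlib APIs), the heuristic might be sketched as
follows. ThresholdIndex, threshold_percent and single_reads are
hypothetical, and a plain dict stands in for the on-disk index:

```python
class ThresholdIndex(object):
    """Toy index: per-key reads until enough distinct keys are requested.

    Hypothetical sketch, not bzrlib.index. Once the number of distinct
    keys requested reaches threshold_percent of all keys, the whole
    index is buffered in one go.
    """

    def __init__(self, backing, threshold_percent=10):
        self._backing = dict(backing)  # stands in for the on-disk index
        self._threshold_percent = threshold_percent
        self._cache = {}
        self._requested = set()
        self.buffered_all = False
        self.single_reads = 0  # counts simulated per-key reads

    def _buffer_all(self):
        # Simulates one sequential read of the entire index.
        self._cache = dict(self._backing)
        self.buffered_all = True

    def get(self, key):
        if not self.buffered_all:
            self._requested.add(key)
            # Integer arithmetic avoids floating-point edge cases.
            if (len(self._requested) * 100
                    >= self._threshold_percent * len(self._backing)):
                self._buffer_all()
        if self.buffered_all:
            return self._cache[key]
        if key not in self._cache:
            self.single_reads += 1  # simulated partial read
            self._cache[key] = self._backing[key]
        return self._cache[key]
```

With 100 keys and the default threshold, the first nine distinct
lookups are served by individual reads and the tenth triggers the full
buffer. Whether 10% is the right cut-off is exactly the open question.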

> In talking with Robert, he has a valid point that we are trying to make
> the Bazaar codebase *never* require a full-table scan. So having this in
> bzr.dev won't prod all of us to go fix the bug. I would still like to
> propose that we merge it into a final release (1.0) even if we don't put
> it into the release candidates.
>
> I would probably go one step further, and say we should merge it into
> bzr.dev now, and then remove it after we have our next release. I know
> we want to be exposed to the locations where the current codebase is
> performing poorly. I don't know that I want to be bothered by it every
> time I'm trying to get work done. (I feel like I haven't been very
> productive this week, because around every corner I run into another
> regression that I spend far too long tracking down. Though I suppose I
> could just note it and submit a bug, without doing any investigation,
> but that isn't usually how it works for me.)

I'd really really like to have a no-compromise approach to packs. We
know that the old approach of O(history) operations does not work, and
we have to fix it. Every cheap-hack we put in is something that has to
come *right back out again* for us to get correct performance out of
hpss, out of local operations, and out of the library.

We're not punishing code that uses bad APIs; we're *removing* the
constraint that made bad APIs no worse than good APIs, right at the
bottom level. Putting in workarounds for things that are accurately
representing the problem - well, that's no workaround at all.

Additionally, the partial index code is improvable; as is the index
layer. These are both things that are worth doing (index layer first -
bug 165309), then code performance around it.

> I think we could do something like this for "get_revision_graph()" as
> well, and all the other functions that return all of history. Those
> functions *should* be deprecated. And we should write all of our
> functions to not use them. But until we get there, if we are going to
> make a Bazaar release with packs as the default format then I think we
> should put in some quick regression fixes like this.

The less obvious regressions are, the harder it is to find them. Our
users right now are telling us where they are. This is a good thing.

> (I know Robert feels that just introducing packs as the default won't
> cause people to automatically upgrade their big old repositories. *I*
> think saying "we have a new repository format, and it is the default"
> will cause people to go: "Oh, I should use that"... "Oh, why are they
> switching to this, these commands have suddenly gotten 2x slower, they
> must not know what they are doing.")

I think this is a reason to finish the work on packs.

I'm bb:reject on this for bzr.dev. I think it's really really harmful.
I'm glad we have a wide selection of folk using packs now and giving us
this feedback; reducing the quality of that feedback... well, foot, gun,

BANG.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.