Merging a bundle w/ a pack repository is slow

John Arbash Meinel john at arbash-meinel.com
Thu Nov 29 23:38:30 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I think this is another performance regression, stemming from "get_ancestry()"
being slow with packs.

I'm merging a patch into a branch, and I've merged the patch before, so I
already have the revisions in my repository.

so doing:

bzr merge ../patch
bzr revert
bzr merge ../patch # <== What I'm talking about

Spends a lot of time just thinking about what it needs to do.

I found it spends a lot of time in 'get_ancestry()' because of:
  /Users/jameinel/dev/bzr/bzr.dev/bzrlib/merge.py(149)from_revision_ids()
- -> merger.set_base_revision(base, base_branch)
> /Users/jameinel/dev/bzr/bzr.dev/bzrlib/merge.py(304)set_base_revision()
- -> self.this_branch)
  /Users/jameinel/dev/bzr/bzr.dev/bzrlib/revision.py(140)is_ancestor()
- -> topo_sorted=False))

^- That is_ancestor call is written as:
    if is_null(candidate_id):
        return True
    return (candidate_id in branch.repository.get_ancestry(revision_id,
            topo_sorted=False))

While I know that "get_ancestry()" is a -Devil function because it is
O(history), it does seem like packs is punishing a lot of code for using it.
(It is slow because it has a round trip into get_parents() for every node, and
that is triggering _lookup_parent_by_location).

If you call buffer_all (in the attached patch) it changes the time rather
dramatically.

time bzr merge ../patch
17.80s user 1.60s system 75% cpu 25.712 total

time bzr.patched merge ../patch
bzr merge   12.48s user 1.54s system 80% cpu 17.397 total

The attached patch is a band-aid, but I'd like to at least propose something
like it for a possible release.

Basically, we know that "get_ancestry()" is going to need most of the graph, as
it is a whole-history operation. So this tells the index that it needs to
buffer everything. (Sort of like when a DB realizes it should do a sequential
scan versus an index scan.)

In talking with Robert, he has a valid point that we are trying to make the
Bazaar codebase *never* require a full-table scan. So having this in bzr.dev
won't prod all of us to go fix the bug. I would still like to propose that we
merge it into a final release (1.0) even if we don't put it into the release
candidates.

I would probably go one step further, and say we should merge it into bzr.dev
now, and then remove it after we have our next release. I know we want to be
exposed to the locations where the current codebase is performing poorly. I
don't know that I want to be bothered by it every time I'm trying to get work
done. (I feel like I haven't been very productive this week, because around
every corner I run into another regression that I spend far too long tracking
down. Though I suppose I could just note it and submit a bug, without doing any
investigation, but that isn't usually how it works for me.)

I think we could do something like this for "get_revision_graph()" as well, and
all the other functions that return all of history. Those functions *should* be
deprecated. And we should write all of our functions to not use them. But until
we get there, if we are going to make a Bazaar release with packs as the
default format then I think we should put in some quick regression fixes like this.

(I know Robert feels that just introducing packs as the default won't cause
people to automatically upgrade their big old repositories. *I* think saying
"we have a new repository format, and it is the default" will cause people to
go: "Oh, I should use that"... "Oh, why are they switching to this, these
commands have suddenly gotten 2x slower, they must not know what they are doing.")

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHT011JdeBCYSNAAMRAreHAJ44QD2zeivos//IWNdt2o4QnuY5BgCfTSZw
age6gF50vu+QjaUo3JPJ4sI=
=/XI3
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: buffer_all.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20071129/bd333296/attachment.diff 


More information about the bazaar mailing list