bzr-revnocache can now cache the merge graph
John Arbash Meinel
john at arbash-meinel.com
Wed Jan 21 20:36:45 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ian Clatworthy wrote:
> The trunk of bzr-revnocache now lets you configure what
> gets cached: "just" the rev-id=>revno map or the full
> (sorted) merge graph. If you're interested in helping
> to test out the latter, grab the latest code and edit
> the top of __init__.py. (FWIW, you can also disable
> caching altogether if you want as well.)
>
> Note that the trigger for storing the merge graph is
> currently still Branch.get_revision_id_to_revno_map().
> There is no Branch.iter_merge_revisions() API that
> I can override yet. A MERGE/RFC is in the review queue
> for this.
>
> As a next step, I'd like to try out caching "mergelines"
> in addition to or instead of the full merge graph. I
> expect a mergeline record to look something like:
>
> mergerevno depth length last_rev_id join_rev_id [first_rev_id split_rev_id]
>
> See my comment in https://bugs.launchpad.net/bzr/+bug/125773
> for an explanation of these terms. If I'm wasting my time going
> down this track, please let me know!
I'm not entirely sure I understand what you are caching. An example
might help. This is the last 3 revisions of bzr.dev:
revno: 3949
revision-id: pqm at pqm.ubuntu.com-20090120210300-641tutf1rkdn8a3n
revno: 3943.4.5
revision-id: john at arbash-meinel.com-20090120201745-gkvsfuhc9tf7m1bm
revno: 3943.4.4
revision-id: john at arbash-meinel.com-20090116223613-sywet9fbx0589a3z
revno: 3943.4.3
revision-id: john at arbash-meinel.com-20090116223224-hspiby4drt1ng7uy
revno: 3943.4.2
revision-id: john at arbash-meinel.com-20090116222734-9mgzsmtcvv9qr2mt
revno: 3943.4.1
revision-id: john at arbash-meinel.com-20090116221614-k8su03l5d22tq6iv
- ------------------------------------------------------------
revno: 3948
revision-id: pqm at pqm.ubuntu.com-20090120044335-pwr2rshr1yu6vzti
revno: 3946.1.2
revision-id:ian.clatworthy at canonical.com-20090120040338-yxg0r2ab45lqtks9
- ------------------------------------------------------------
revno: 3947
revision-id: pqm at pqm.ubuntu.com-20090120032136-alahvfk4g7y8iczn
revno: 3946.1.1
revision-id:
ian.clatworthy at canonical.com-20090120021235-n8bd5kzz3s624aq5
revno: 3943.3.4
revision-id: amanic at gmail.com-20090118014815-hmowk5wx102lwquz
revno: 3943.3.3
revision-id: amanic at gmail.com-20090118011839-qanqndkfqrkek7e1
revno: 3943.3.2
revision-id: amanic at gmail.com-20090118010721-2s62eobbgk51qqn9
revno: 3943.3.1
revision-id: amanic at gmail.com-20090118004202-nh1b5bpr91ty2rxq
So what would end up in the cache, something like:
3943.4 1 5 john at arbash-meinel.com-20090120201745-gkvsfuhc9tf7m1bm
john at arbash-meinel.com-20090116221614-k8su03l5d22tq6iv
I guess my question is "mergerevno" the revno this 'line' was merged
into (3949), or is it the prefix on the revisions themselves (3943.4)
What is last, join, first and split?
Also, notice that for 3947, we have a double merge, but both are at
depth 1. We basically have this graph (time flows down):
A
|\
| B
: |
| C
| \
D |
|\ /
| E
|/
F
A, D, F are the mainline of bzr, B, C was done as amanic, and E is ian's
branch of bzr.dev which then merges amanic's work and submits it.
I believe the depth of B & C are still 1, because E didn't have any
other commits. Contrast that with:
A
|\
| B
: |
| C
| \
D |
|\ |
| E |
| |/
| F
|/
G
The 'merge sort' way of graphing this is actually:
A
|\
D \
|\ \
| E |
| | |
| | B
| | |
| | C
| |/
| F
|/
G
>
> At a minimum, I expect this cache will:
>
> * be a small fraction of the other caches size wise
> and therefore be orders of magnitude faster to load
I think this could be a good mix of "cache enough to be useful" but
still make use of the parent_map to avoid redundant information. It is
unfortunate that it is difficult to use this to speed up revision_id =>
dotted_revno.
I suppose if you read the whole cache, and track what revision_ids are
present, you might be able to start at a revision, and then walk back
until you find a revision in the cache. Which should give you one end of
the map.
>
> * speed up lookups of dotted_revno -> rev_id because
> of this.
>
> I also wonder whether we can reconstitute the sorted merge
> graph and/or rev_id=>revno map from *just* this information
> efficiently? Thoughts? Sample code to help me write
>
> def iter_merge_lines(branch, merge_graph_iter)?
>
> Perhaps that API together with something like
>
> def mergeline_revision_info(branch, mergerevno)
>
> would be useful in other parts of bzrlib or plugins?
> I suspect knowing where each mergeline got joined into
> the graph (i.e. a child pointer) must be useful in
> some graph walking algorithms, yes?
>
> In summary, bzr-revnocache continues to grow as a
> proof-of-concept. I think it could nicely complement
> John's lazy-revno calculation branch which focusses
> on efficiently going the other direction, i.e.
> rev-id to revno conversion. As branch is extended
> with APIs like iter_merge_revisions(), performance
> improvements ought to gradually ripple though the
> various commands as the code starts using them (and
> users install this plugin).
>
> Ian C.
>
>
I think this is worth exploring. I'm not sure that it is exactly the way
we want to go, but it might.
Also, I think you need to consider things like the emacs' repository,
which has about 95k revisions along the mainline. Would one of the
"mergelines" be the mainline itself? Doesn't it seem like an mid-sized
cap on the distance would be warranted? (Include every 500th revision
along a line, etc.)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkl3h10ACgkQJdeBCYSNAAOSVACePKPckeGkPTVOP8UpF9+VMVxJ
WZsAoLzKCq1AYPWzz/7/0iVExeABIIIa
=+TQ5
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list