[MERGE/RFC] log refactoring
John Arbash Meinel
john at arbash-meinel.com
Wed Jan 14 22:58:26 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ian Clatworthy wrote:
> This patch refactors the implementation of log so
> that --short logging iterates most of the time,
> rather than being O(history) as often as it currently is.
>
> I'm yet to measure the performance gains, if any.
> I'll do that soon. As long as performance is as
> least as good as it currently is, I think this is
> a win in term of clearer code anyhow.
>
> A separate patch is coming to make --short logging
> the default.
>
> Ian C.
BB:comment
I haven't followed all of the changes close enough to decide if this is
"tweak" or "resubmit" yet, so I'll start with "comment".
To start with, this seems mostly like a refactoring with an attempt to
change this code:
if ( not generate_merge_revisions
and start_revision is end_revision is None
and direction == 'reverse'
and specific_fileid is None):
return _linear_view_revisions(branch)
Into a different generator, which can handle when "start_revision" and
"end_revision" is not None. Is that a reasonable summary?
Also, the handling of _NonMainlineRevisionLimit means that if you do:
a) bzr log --short -r -10..NON_MAINLINE
it will fail right away with:
'Selected log formatter only supports mainline revisions.'
b) bzr log --short -r NON_MAINLINE..-1
it will log the entire mainline, and then at the end give the same error
message. This seems a bit broken.
=== modified file 'bzrlib/branch.py'
- --- bzrlib/branch.py 2009-01-08 19:15:20 +0000
+++ bzrlib/branch.py 2009-01-14 07:17:52 +0000
@@ -467,6 +467,21 @@
else:
return (0, _mod_revision.NULL_REVISION)
+ def _get_mainline_revno(self, revision_id):
+ """Get the mainline revision number for a revision-id, if any.
+
+ :return: the mainline revision number or 0 if not on the mainline.
+ """
+ last_revno, last_revision_id = self.last_revision_info()
+ revno = last_revno
+ for rev_id in self.repository.iter_reverse_revision_history(
+ last_revision_id):
+ if rev_id == revision_id:
+ return revno
+ revno -= 1
+ else:
+ return 0
+
^- This looks like a duplicate of Branch.revision_id_to_revno, only
without any of the caching. And even if it wasn't, I wouldn't recommend
to have
+ if generate_single_revision:
+ if generate_merge_revisions:
+ generate_single_revision = False
+ else:
+ revno = branch._get_mainline_revno(start_rev_id)
^- You have a log.py function calling a 'private' member of Branch,
which I would only do if absolutely necessary. But here it definitely
can just be replaced with 'revision_id_to_revno'. Which also has the
advantage of re-using the mainline history cache that we built up when
doing RevisionSpec_revno.in_history(b). (So we don't hit the repository
a second time.)
It is a little bit hard to tell from the diff, but is it correct to say
that "_create_log_revision_iterator()" is just pulling out the call to
calculate_view_revisions and make_log_rev_iterator and putting it into a
separate function?
Further, it feels like we should be using a function on Branch rather
than repo.iter_reverse_revision_history(). Just because we know that we
have a (partial) revision_history() cache as part of Branch. The
repository should actually have things cached in memory by now, but it
at least *seems* better. I don't think this is worth a lot of time, though.
+def _mainline_view_revisions(branch, revision_limits):
+ """Calculate the mainline revisions to view, newest to oldest.
+
+ :param revision_limits: a tuple as returned by _get_revision_limits()
+ :return: An iterator of (revision_id, dotted_revno, merge_depth)
tuples.
+ :raises _NonMainlineRevisionLimit: if a start_rev_id and/or end_rev_id
+ is specified and not found
+ """
+ br_revno, br_rev_id, start_revno, start_rev_id, end_revno,
end_rev_id = \
+ revision_limits
repo = branch.repository
- - revision_ids = repo.iter_reverse_revision_history(start_revision_id)
- - for num, revision_id in enumerate(revision_ids):
- - yield revision_id, str(start_revno - num), 0
+ cur_revno = br_revno
+ rev_iter = repo.iter_reverse_revision_history(br_rev_id)
+ if start_rev_id is None and end_rev_id is None:
+ for revision_id in rev_iter:
+ yield revision_id, str(cur_revno), 0
+ cur_revno -= 1
+ else:
+ found_start = start_rev_id is None
+ found_end = end_rev_id is None
+ for revision_id in rev_iter:
+ if cur_revno < start_revno:
+ break
+ if cur_revno <= end_revno:
+ if not found_start and revision_id == start_rev_id:
+ found_start = True
+ if not found_end and revision_id == end_rev_id:
+ found_end = True
+ yield revision_id, str(cur_revno), 0
+ cur_revno -= 1
+ else:
+ if not found_start or not found_end:
+ raise _NonMainlineRevisionLimit()
^- I think if no start and end are given, then iterating as you have
done is a good thing to do, because we haven't loaded anything yet.
I'm not sure that iterating when we have endpoints is necessary. If
someone does "bzr log --short -r XXXX..YYYY" then we would have already
iterated in the repository to lookup XXXX and YYYY, so everything is
cached, and we don't win anything by iterating here. And we lose a
little bit because we have to raise _NonMainlineRevisionLimit() after
iterating for a while, rather than at the beginning.
(That said, I agree that it would be nice to not have to worry about
mainline or non-mainline revisions.)
But as I commented earlier, it looks like it will give the error after
it has already displayed the entire mainline. Which is pretty sub-optimal.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAklubhIACgkQJdeBCYSNAAPocQCff/Ld/z6FEmm4/tZPpTvYYBzK
xtAAoMgZb+9e3jIOZT/QZf/z0NzyzcF4
=8PjT
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list