[MERGE] log performance

Fri Jun 23 13:09:27 BST 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 22 Jun 2006, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
>> +    def make_heavily_merged_tree(self, directory_name='.'):
>> +        """Create a tree with an egregious number of commits.
>> +        
>> +        No files change are included.
>> +        """
> 
> "egregious" means "outstandingly bad" but I don't think it's really
> right here - having 250 cross merges is just large, not egregious.

I'll change it.  I copied and pasted it from another tree_building function.

> It might (?) be good to add another sentence about what the structure of
> merges are, in case anyone cares: two branches, one repeatedly merges
> single commits from the other.

Okay.

> 

> These won't unwind properly in some cases; if you're going to go to the
> trouble of cleaning up (which is good) you might as well do it properly,

Okay, will do.  Since policy is that tree.lock takes out a branch and
repository lock, I can simplify things anyhow.

>> === modified file bzrlib/log.py
>> --- bzrlib/log.py
>> +++ bzrlib/log.py
>> @@ -206,35 +206,25 @@
>>      cut_revs = which_revs[(start_revision-1):(end_revision)]
>>      if not cut_revs:
>>          return
>> +
>> +    # convert the revision history to a dictionary:
>> +    rev_nos = dict([(k, v) for v, k in cut_revs])
>> +
> 
> You can actually just say rev_nos = dict(cut_revs) if cut_revs is a
> sequence of pairs, and that will avoid building a list.

Unfortunately, this sequence of pairs has the wrong ordering (value, key)

>> +    if getattr(lf, 'show_merge', False) is False:
>> +        no_merges = True 
>>      else:
>> -        raise ValueError('invalid direction %r' % direction)
>> -
>> -    revision_history = branch.revision_history()
>> -
>> -    # convert the revision history to a dictionary:
>> -    rev_nos = {}
>> -    for index, rev_id in cut_revs:
>> -        rev_nos[rev_id] = index
>> +        no_merges = False
>> +    view_revisions = list(get_view_revisions(mainline_revs, rev_nos, branch,
>> +                          direction, no_merges=no_merges))
>>  
>>      def iter_revisions():
>> -        revision_ids = [r for s, r, m, e in merge_sorted_revisions]
>> +        revision_ids = [r for r, n, d in view_revisions]
>>          num = 9
>>          while revision_ids:
>>              revisions = branch.repository.get_revisions(revision_ids[:num])
>> @@ -247,8 +237,8 @@
>>          for revision in revisions:
>>              yield revision
>>      # now we just print all the revisions
>> -    for ((sequence, rev_id, merge_depth, end_of_merge), rev) in \
>> -        izip(merge_sorted_revisions, iter_revisions()):
>> +    for ((rev_id, revno, merge_depth), rev) in \
>> +         izip(view_revisions, iter_revisions()):
>>  
>>          if searchRE:
>>              if not searchRE.search(rev.message):
>> @@ -267,11 +257,41 @@
>>                  # although we calculated it, throw it away without display
>>                  delta = None
>>  
>> -            lf.show(rev_nos[rev_id], rev, delta)
>> -        elif hasattr(lf, 'show_merge'):
>> +            lf.show(revno, rev, delta)
>> +        else:
>>              lf.show_merge(rev, merge_depth)
> 
> There's some nice reductions here.  I'm not 100% comfortable with the
> hasattr -- I generally like to reserve it for cases where you're really
> doing "dynamic" lookups, though it's fuzzy in Python and I won't veto
> it.  But is it the presence of the attribute or its value that matters?

Actually, I was replacing hasattr (which was already in the code) with
getattr.  My understanding is that hasattr is badly, badly broken and
should not be used.

I'm using getattr to do introspection.  The log formatter that shows
merges has a show_merge() method.  The other log formatters don't have
that method, because they don't show merges.  So if the attribute is
present, we'll show merges.

> If the second, as I'd expect, shouldn't the second call be getattr too?

The hasattr call was deleted by the patch.

> Rather than an inverted-sense boolean why not make it just
> include_merges=True?

Sounds good.

> 
>> === modified file bzrlib/tests/test_log.py
>> @@ -301,3 +305,38 @@
>>          logfile.seek(0)
>>          log_contents = logfile.read()
>>          self.assertEqualDiff(log_contents, '1: Line-Log-Formatte... 2005-11-23 add a\n')
>> +
>> +    def test_get_view_revisions(self):

> This test is a bit long, therefore harder to read and see just what it's
> testing.  Could you perhaps split out the setup of building the
> histories and have each assertion within a separate test?

Sure, I'll split it up.  But I think assertions like the following
belong together, because they're testing what happens when you vary one
parameter.

self.assertEqual(revisions, [('1', 1, 0), ('2', 2, 0), ('3', 3, 0)])
self.assertEqual(revisions, revisions2)

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEm9n30F+nu1YWqI0RAuFUAJwP3mRmgzdfQV8VO26ZiFkQ9PAMwACeLwUB
mbB8PfG2KwC4YNg5nmuB5W0=
=aAua
-----END PGP SIGNATURE-----