Rev 2456: (Kent Gibson, John Arbash Meinel) Improve 'bzr log file' to both be significantly faster, and more correct. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Wed Apr 25 15:37:37 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2456
revision-id: pqm at pqm.ubuntu.com-20070425143732-c4aa489eylhhfnzg
parent: pqm at pqm.ubuntu.com-20070425065022-rsmpi4x2q1gn8536
parent: john at arbash-meinel.com-20070423192314-y2x7sr7yupacqglk
committer: Canonical.com Patch Queue Manager<pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2007-04-25 15:37:32 +0100
message:
  (Kent Gibson, John Arbash Meinel) Improve 'bzr log file' to both be significantly faster, and more correct.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/log.py                  log.py-20050505065812-c40ce11702fe5fb1
  bzrlib/tests/blackbox/test_log.py test_log.py-20060112090212-78f6ea560c868e24
  bzrlib/tests/test_log.py       testlog.py-20050728115707-1a514809d7d49309
    ------------------------------------------------------------
    revno: 2359.1.11
    merged: john at arbash-meinel.com-20070423192314-y2x7sr7yupacqglk
    parent: john at arbash-meinel.com-20070423180919-22nnm0baju1kab6x
    parent: pqm at pqm.ubuntu.com-20070423075015-340ajk1vo3pzxheu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: log_ancestry
    timestamp: Mon 2007-04-23 14:23:14 -0500
    message:
      [merge] bzr.dev 2446
    ------------------------------------------------------------
    revno: 2359.1.10
    merged: john at arbash-meinel.com-20070423180919-22nnm0baju1kab6x
    parent: john at arbash-meinel.com-20070423175330-cwzggk38xbsf4wyy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: log_ancestry
    timestamp: Mon 2007-04-23 13:09:19 -0500
    message:
      Add my name to the NEWS entry
    ------------------------------------------------------------
    revno: 2359.1.9
    merged: john at arbash-meinel.com-20070423175330-cwzggk38xbsf4wyy
    parent: john at arbash-meinel.com-20070418223921-0h2g7salf1scfd2a
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: log_ancestry
    timestamp: Mon 2007-04-23 12:53:30 -0500
    message:
      Only generate a new set when we need to. Drops 'bzr log NEWS' time from 22s => 8s
    ------------------------------------------------------------
    revno: 2359.1.8
    merged: john at arbash-meinel.com-20070418223921-0h2g7salf1scfd2a
    parent: john at arbash-meinel.com-20070418223759-kek0qqqdv3wddcae
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: log_ancestry
    timestamp: Wed 2007-04-18 17:39:21 -0500
    message:
      doc
    ------------------------------------------------------------
    revno: 2359.1.7
    merged: john at arbash-meinel.com-20070418223759-kek0qqqdv3wddcae
    parent: john at arbash-meinel.com-20070418221454-zkozu1yyny4zx3rv
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: log_ancestry
    timestamp: Wed 2007-04-18 17:37:59 -0500
    message:
      Create a direct test for _get_revisions_touching_file_id
    ------------------------------------------------------------
    revno: 2359.1.6
    merged: john at arbash-meinel.com-20070418221454-zkozu1yyny4zx3rv
    parent: john at arbash-meinel.com-20070418215032-38i9ynsx6fkqaiyf
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: log_ancestry
    timestamp: Wed 2007-04-18 17:14:54 -0500
    message:
      Create a helper tree which has a semi-interesting history.
      This allows us to test which files are actually modified.
    ------------------------------------------------------------
    revno: 2359.1.5
    merged: john at arbash-meinel.com-20070418215032-38i9ynsx6fkqaiyf
    parent: john at arbash-meinel.com-20070418214759-xmfpvwok0v6ci38l
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: log_ancestry
    timestamp: Wed 2007-04-18 16:50:32 -0500
    message:
      change some variable names to make the function a bit clearer.
    ------------------------------------------------------------
    revno: 2359.1.4
    merged: john at arbash-meinel.com-20070418214759-xmfpvwok0v6ci38l
    parent: warthog618 at gmail.com-20070410160523-wuwa272pl243buhf
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: log_ancestry
    timestamp: Wed 2007-04-18 16:47:59 -0500
    message:
      Refactor the specific revisions for file id into a helper function.
    ------------------------------------------------------------
    revno: 2359.1.3
    merged: warthog618 at gmail.com-20070410160523-wuwa272pl243buhf
    parent: warthog618 at gmail.com-20070410151824-9qnth9n0uiypca71
    parent: pqm at pqm.ubuntu.com-20070410074302-cf6b95587a1058cd
    committer: Kent Gibson <warthog618 at gmail.com>
    branch nick: bzr.kg
    timestamp: Wed 2007-04-11 00:05:23 +0800
    message:
      merge bzr.dev
    ------------------------------------------------------------
    revno: 2359.1.2
    merged: warthog618 at gmail.com-20070410151824-9qnth9n0uiypca71
    parent: warthog618 at gmail.com-20070315161504-lrltutbkgs7ksago
    committer: Kent Gibson <warthog618 at gmail.com>
    branch nick: bzr.kg
    timestamp: Tue 2007-04-10 23:18:24 +0800
    message:
      add logging of merge revisions
    ------------------------------------------------------------
    revno: 2359.1.1
    merged: warthog618 at gmail.com-20070315161504-lrltutbkgs7ksago
    parent: pqm at pqm.ubuntu.com-20070314155044-47c0c6257a6c6717
    committer: Kent Gibson <warthog618 at gmail.com>
    branch nick: bzr.kg
    timestamp: Fri 2007-03-16 01:15:04 +0900
    message:
      Fix ``bzr log <file>`` so it only logs the revisions that changed the file, and does it faster.
=== modified file 'NEWS'
--- a/NEWS	2007-04-25 06:50:22 +0000
+++ b/NEWS	2007-04-25 14:37:32 +0000
@@ -174,6 +174,10 @@
     * Don't produce encoding error when adding duplicate files.
       (Aaron Bentley)
 
+    * Fix ``bzr log <file>`` so it only logs the revisions that changed
+      the file, and does it faster.
+      (Kent Gibson, John Arbash Meinel, #51980, #69477)
+ 
     * Fix ``InterDirstateTre._iter_changes`` to handle when we come across
       an empty versioned directory, which now has files in it.
       (John Arbash Meinel, #104257)

=== modified file 'bzrlib/log.py'
--- a/bzrlib/log.py	2007-04-19 19:28:39 +0000
+++ b/bzrlib/log.py	2007-04-23 19:23:14 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2005 Canonical Ltd
+# Copyright (C) 2005, 2006, 2007 Canonical Ltd
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -61,7 +61,10 @@
 import bzrlib.errors as errors
 from bzrlib.symbol_versioning import deprecated_method, zero_eleven
 from bzrlib.trace import mutter
-from bzrlib.tsort import merge_sort
+from bzrlib.tsort import(
+    merge_sort,
+    topo_sort,
+    )
 
 
 def find_touching_revisions(branch, file_id):
@@ -221,8 +224,15 @@
         symbol_versioning.warn('LogFormatters should provide show_merge_revno '
             'instead of show_merge since bzr 0.11.',
             DeprecationWarning, stacklevel=3)
-    view_revisions = list(get_view_revisions(mainline_revs, rev_nos, branch,
-                          direction, include_merges=include_merges))
+    view_revs_iter = get_view_revisions(mainline_revs, rev_nos, branch,
+                          direction, include_merges=include_merges)
+    if specific_fileid:
+        view_revisions = _get_revisions_touching_file_id(branch,
+                                                         specific_fileid,
+                                                         mainline_revs,
+                                                         view_revs_iter)
+    else:
+        view_revisions = list(view_revs_iter)
 
     use_tags = getattr(lf, 'supports_tags', False)
     if use_tags:
@@ -239,7 +249,7 @@
         while revision_ids:
             cur_deltas = {}
             revisions = repository.get_revisions(revision_ids[:num])
-            if verbose or specific_fileid:
+            if verbose:
                 delta_revisions = [r for r in revisions if
                                    r.revision_id in zeros]
                 deltas = repository.get_deltas_for_revisions(delta_revisions)
@@ -247,11 +257,11 @@
                                         delta_revisions), deltas))
             for revision in revisions:
                 # The delta value will be None unless
-                # 1. verbose or specific_fileid is specified, and
+                # 1. verbose is specified, and
                 # 2. the revision is a mainline revision
                 yield revision, cur_deltas.get(revision.revision_id)
             revision_ids  = revision_ids[num:]
-            num = int(num * 1.5)
+            num = min(int(num * 1.5), 200)
             
     # now we just print all the revisions
     for ((rev_id, revno, merge_depth), (rev, delta)) in \
@@ -262,16 +272,6 @@
                 continue
 
         if merge_depth == 0:
-            # a mainline revision.
-
-            if specific_fileid:
-                if not delta.touches_file_id(specific_fileid):
-                    continue
-
-            if not verbose:
-                # although we calculated it, throw it away without display
-                delta = None
-
             if use_tags:
                 lf.show(revno, rev, delta, rev_tag_dict.get(rev_id))
             else:
@@ -287,6 +287,64 @@
                     lf.show_merge_revno(rev, merge_depth, revno)
 
 
+def _get_revisions_touching_file_id(branch, file_id, mainline_revisions,
+                                    view_revs_iter):
+    """Return the list of revision ids which touch a given file id.
+
+    This includes the revisions which directly change the file id,
+    and the revisions which merge these changes. So if the
+    revision graph is::
+        A
+        |\
+        B C
+        |/
+        D
+
+    And 'C' changes a file, then both C and D will be returned.
+
+    This will also can be restricted based on a subset of the mainline.
+
+    :return: A list of (revision_id, dotted_revno, merge_depth) tuples.
+    """
+    # find all the revisions that change the specific file
+    file_weave = branch.repository.weave_store.get_weave(file_id,
+                branch.repository.get_transaction())
+    weave_modifed_revisions = set(file_weave.versions())
+    # build the ancestry of each revision in the graph
+    # - only listing the ancestors that change the specific file.
+    rev_graph = branch.repository.get_revision_graph(mainline_revisions[-1])
+    sorted_rev_list = topo_sort(rev_graph)
+    ancestry = {}
+    for rev in sorted_rev_list:
+        parents = rev_graph[rev]
+        if rev not in weave_modifed_revisions and len(parents) == 1:
+            # We will not be adding anything new, so just use a reference to
+            # the parent ancestry.
+            rev_ancestry = ancestry[parents[0]]
+        else:
+            rev_ancestry = set()
+            if rev in weave_modifed_revisions:
+                rev_ancestry.add(rev)
+            for parent in parents:
+                rev_ancestry = rev_ancestry.union(ancestry[parent])
+        ancestry[rev] = rev_ancestry
+
+    def is_merging_rev(r):
+        parents = rev_graph[r]
+        if len(parents) > 1:
+            leftparent = parents[0]
+            for rightparent in parents[1:]:
+                if not ancestry[leftparent].issuperset(
+                        ancestry[rightparent]):
+                    return True
+        return False
+
+    # filter from the view the revisions that did not change or merge 
+    # the specific file
+    return [(r, n, d) for r, n, d in view_revs_iter
+            if r in weave_modifed_revisions or is_merging_rev(r)]
+
+
 def get_view_revisions(mainline_revs, rev_nos, branch, direction,
                        include_merges=True):
     """Produce an iterator of revisions to show

=== modified file 'bzrlib/tests/blackbox/test_log.py'
--- a/bzrlib/tests/blackbox/test_log.py	2007-04-19 19:28:39 +0000
+++ b/bzrlib/tests/blackbox/test_log.py	2007-04-23 19:23:14 +0000
@@ -334,3 +334,39 @@
         tree.commit('revision 1')
         tree.bzrdir.destroy_workingtree()
         self.run_bzr('log', 'tree/file')
+
+    def test_log_file(self):
+        """The log for a particular file should only list revs for that file"""
+        tree = self.make_branch_and_tree('parent')
+        self.build_tree(['parent/file1', 'parent/file2', 'parent/file3'])
+        tree.add('file1')
+        tree.commit('add file1')
+        tree.add('file2')
+        tree.commit('add file2')
+        tree.add('file3')
+        tree.commit('add file3')
+        self.run_bzr('branch', 'parent', 'child')
+        print >> file('child/file2', 'wb'), 'hello'
+        self.run_bzr('commit', '-m', 'branch 1', 'child')
+        os.chdir('parent')
+        self.run_bzr('merge', '../child')
+        self.run_bzr('commit', '-m', 'merge child branch')
+        
+        log = self.run_bzr('log', 'file1')[0]
+        self.assertContainsRe(log, 'revno: 1\n')
+        self.assertNotContainsRe(log, 'revno: 2\n')
+        self.assertNotContainsRe(log, 'revno: 3\n')
+        self.assertNotContainsRe(log, 'revno: 3.1.1\n')
+        self.assertNotContainsRe(log, 'revno: 4\n')
+        log = self.run_bzr('log', 'file2')[0]
+        self.assertNotContainsRe(log, 'revno: 1\n')
+        self.assertContainsRe(log, 'revno: 2\n')
+        self.assertNotContainsRe(log, 'revno: 3\n')
+        self.assertContainsRe(log, 'revno: 3.1.1\n')
+        self.assertContainsRe(log, 'revno: 4\n')
+        log = self.run_bzr('log', 'file3')[0]
+        self.assertNotContainsRe(log, 'revno: 1\n')
+        self.assertNotContainsRe(log, 'revno: 2\n')
+        self.assertContainsRe(log, 'revno: 3\n')
+        self.assertNotContainsRe(log, 'revno: 3.1.1\n')
+        self.assertNotContainsRe(log, 'revno: 4\n')

=== modified file 'bzrlib/tests/test_log.py'
--- a/bzrlib/tests/test_log.py	2006-11-03 18:35:29 +0000
+++ b/bzrlib/tests/test_log.py	2007-04-18 22:37:59 +0000
@@ -1,6 +1,4 @@
-# Copyright (C) 2005 Canonical Ltd
-# -*- coding: utf-8 -*-
-# vim: encoding=utf-8
+# Copyright (C) 2005, 2006, 2007 Canonical Ltd
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -19,6 +17,7 @@
 import os
 from cStringIO import StringIO
 
+from bzrlib import log
 from bzrlib.tests import BzrTestBase, TestCaseWithTransport
 from bzrlib.log import (show_log, 
                         get_view_revisions, 
@@ -400,3 +399,135 @@
         self.assertEqual([('1', '1', 0), ('2', '2', 0), ('3c', '3', 0),
             ('4b', '4', 0)],
             revisions)
+
+
+class TestGetRevisionsTouchingFileID(TestCaseWithTransport):
+
+    def create_tree_with_single_merge(self):
+        """Create a branch with a moderate layout.
+
+        The revision graph looks like:
+
+           A
+           |\
+           B C
+           |/
+           D
+
+        In this graph, A introduced files f1 and f2 and f3.
+        B modifies f1 and f3, and C modifies f2 and f3.
+        D merges the changes from B and C and resolves the conflict for f3.
+        """
+        # TODO: jam 20070218 This seems like it could really be done
+        #       with make_branch_and_memory_tree() if we could just
+        #       create the content of those files.
+        # TODO: jam 20070218 Another alternative is that we would really
+        #       like to only create this tree 1 time for all tests that
+        #       use it. Since 'log' only uses the tree in a readonly
+        #       fashion, it seems a shame to regenerate an identical
+        #       tree for each test.
+        tree = self.make_branch_and_tree('tree')
+        tree.lock_write()
+        self.addCleanup(tree.unlock)
+
+        self.build_tree_contents([('tree/f1', 'A\n'),
+                                  ('tree/f2', 'A\n'),
+                                  ('tree/f3', 'A\n'),
+                                 ])
+        tree.add(['f1', 'f2', 'f3'], ['f1-id', 'f2-id', 'f3-id'])
+        tree.commit('A', rev_id='A')
+
+        self.build_tree_contents([('tree/f2', 'A\nC\n'),
+                                  ('tree/f3', 'A\nC\n'),
+                                 ])
+        tree.commit('C', rev_id='C')
+        # Revert back to A to build the other history.
+        tree.set_last_revision('A')
+        tree.branch.set_last_revision_info(1, 'A')
+        self.build_tree_contents([('tree/f1', 'A\nB\n'),
+                                  ('tree/f2', 'A\n'),
+                                  ('tree/f3', 'A\nB\n'),
+                                 ])
+        tree.commit('B', rev_id='B')
+        tree.set_parent_ids(['B', 'C'])
+        self.build_tree_contents([('tree/f1', 'A\nB\n'),
+                                  ('tree/f2', 'A\nC\n'),
+                                  ('tree/f3', 'A\nB\nC\n'),
+                                 ])
+        tree.commit('D', rev_id='D')
+
+        # Switch to a read lock for this tree.
+        # We still have addCleanup(unlock)
+        tree.unlock()
+        tree.lock_read()
+        return tree
+
+    def test_tree_with_single_merge(self):
+        """Make sure the tree layout is correct."""
+        tree = self.create_tree_with_single_merge()
+        rev_A_tree = tree.branch.repository.revision_tree('A')
+        rev_B_tree = tree.branch.repository.revision_tree('B')
+
+        f1_changed = (u'f1', 'f1-id', 'file', True, False)
+        f2_changed = (u'f2', 'f2-id', 'file', True, False)
+        f3_changed = (u'f3', 'f3-id', 'file', True, False)
+
+        delta = rev_B_tree.changes_from(rev_A_tree)
+        self.assertEqual([f1_changed, f3_changed], delta.modified)
+        self.assertEqual([], delta.renamed)
+        self.assertEqual([], delta.added)
+        self.assertEqual([], delta.removed)
+
+        rev_C_tree = tree.branch.repository.revision_tree('C')
+        delta = rev_C_tree.changes_from(rev_A_tree)
+        self.assertEqual([f2_changed, f3_changed], delta.modified)
+        self.assertEqual([], delta.renamed)
+        self.assertEqual([], delta.added)
+        self.assertEqual([], delta.removed)
+
+        rev_D_tree = tree.branch.repository.revision_tree('D')
+        delta = rev_D_tree.changes_from(rev_B_tree)
+        self.assertEqual([f2_changed, f3_changed], delta.modified)
+        self.assertEqual([], delta.renamed)
+        self.assertEqual([], delta.added)
+        self.assertEqual([], delta.removed)
+
+        delta = rev_D_tree.changes_from(rev_C_tree)
+        self.assertEqual([f1_changed, f3_changed], delta.modified)
+        self.assertEqual([], delta.renamed)
+        self.assertEqual([], delta.added)
+        self.assertEqual([], delta.removed)
+
+    def assertAllRevisionsForFileID(self, tree, file_id, revisions):
+        """Make sure _get_revisions_touching_file_id returns the right values.
+
+        Get the return value from _get_revisions_touching_file_id and make
+        sure they are correct.
+        """
+        # The api for _get_revisions_touching_file_id is a little crazy,
+        # So we do the setup here.
+        mainline = tree.branch.revision_history()
+        mainline.insert(0, None)
+        revnos = dict((rev, idx+1) for idx, rev in enumerate(mainline))
+        view_revs_iter = log.get_view_revisions(mainline, revnos, tree.branch,
+                                                'reverse', True)
+        actual_revs = log._get_revisions_touching_file_id(tree.branch, file_id,
+                                                          mainline,
+                                                          view_revs_iter)
+        self.assertEqual(revisions, [r for r, revno, depth in actual_revs])
+
+    def test_file_id_f1(self):
+        tree = self.create_tree_with_single_merge()
+        # f1 should be marked as modified by revisions A and B
+        self.assertAllRevisionsForFileID(tree, 'f1-id', ['B', 'A'])
+
+    def test_file_id_f2(self):
+        tree = self.create_tree_with_single_merge()
+        # f2 should be marked as modified by revisions A, C, and D
+        # because D merged the changes from C.
+        self.assertAllRevisionsForFileID(tree, 'f2-id', ['D', 'C', 'A'])
+
+    def test_file_id_f3(self):
+        tree = self.create_tree_with_single_merge()
+        # f3 should be marked as modified by revisions A, B, C, and D
+        self.assertAllRevisionsForFileID(tree, 'f2-id', ['D', 'C', 'A'])




More information about the bazaar-commits mailing list