Rev 2456: (Kent Gibson, John Arbash Meinel) Improve 'bzr log file' to both be significantly faster, and more correct. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Wed Apr 25 15:37:37 BST 2007
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 2456
revision-id: pqm at pqm.ubuntu.com-20070425143732-c4aa489eylhhfnzg
parent: pqm at pqm.ubuntu.com-20070425065022-rsmpi4x2q1gn8536
parent: john at arbash-meinel.com-20070423192314-y2x7sr7yupacqglk
committer: Canonical.com Patch Queue Manager<pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2007-04-25 15:37:32 +0100
message:
(Kent Gibson, John Arbash Meinel) Improve 'bzr log file' to both be significantly faster, and more correct.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
bzrlib/tests/blackbox/test_log.py test_log.py-20060112090212-78f6ea560c868e24
bzrlib/tests/test_log.py testlog.py-20050728115707-1a514809d7d49309
------------------------------------------------------------
revno: 2359.1.11
merged: john at arbash-meinel.com-20070423192314-y2x7sr7yupacqglk
parent: john at arbash-meinel.com-20070423180919-22nnm0baju1kab6x
parent: pqm at pqm.ubuntu.com-20070423075015-340ajk1vo3pzxheu
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: log_ancestry
timestamp: Mon 2007-04-23 14:23:14 -0500
message:
[merge] bzr.dev 2446
------------------------------------------------------------
revno: 2359.1.10
merged: john at arbash-meinel.com-20070423180919-22nnm0baju1kab6x
parent: john at arbash-meinel.com-20070423175330-cwzggk38xbsf4wyy
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: log_ancestry
timestamp: Mon 2007-04-23 13:09:19 -0500
message:
Add my name to the NEWS entry
------------------------------------------------------------
revno: 2359.1.9
merged: john at arbash-meinel.com-20070423175330-cwzggk38xbsf4wyy
parent: john at arbash-meinel.com-20070418223921-0h2g7salf1scfd2a
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: log_ancestry
timestamp: Mon 2007-04-23 12:53:30 -0500
message:
Only generate a new set when we need to. Drops 'bzr log NEWS' time from 22s => 8s
------------------------------------------------------------
revno: 2359.1.8
merged: john at arbash-meinel.com-20070418223921-0h2g7salf1scfd2a
parent: john at arbash-meinel.com-20070418223759-kek0qqqdv3wddcae
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: log_ancestry
timestamp: Wed 2007-04-18 17:39:21 -0500
message:
doc
------------------------------------------------------------
revno: 2359.1.7
merged: john at arbash-meinel.com-20070418223759-kek0qqqdv3wddcae
parent: john at arbash-meinel.com-20070418221454-zkozu1yyny4zx3rv
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: log_ancestry
timestamp: Wed 2007-04-18 17:37:59 -0500
message:
Create a direct test for _get_revisions_touching_file_id
------------------------------------------------------------
revno: 2359.1.6
merged: john at arbash-meinel.com-20070418221454-zkozu1yyny4zx3rv
parent: john at arbash-meinel.com-20070418215032-38i9ynsx6fkqaiyf
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: log_ancestry
timestamp: Wed 2007-04-18 17:14:54 -0500
message:
Create a helper tree which has a semi-interesting history.
This allows us to test which files are actually modified.
------------------------------------------------------------
revno: 2359.1.5
merged: john at arbash-meinel.com-20070418215032-38i9ynsx6fkqaiyf
parent: john at arbash-meinel.com-20070418214759-xmfpvwok0v6ci38l
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: log_ancestry
timestamp: Wed 2007-04-18 16:50:32 -0500
message:
change some variable names to make the function a bit clearer.
------------------------------------------------------------
revno: 2359.1.4
merged: john at arbash-meinel.com-20070418214759-xmfpvwok0v6ci38l
parent: warthog618 at gmail.com-20070410160523-wuwa272pl243buhf
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: log_ancestry
timestamp: Wed 2007-04-18 16:47:59 -0500
message:
Refactor the specific revisions for file id into a helper function.
------------------------------------------------------------
revno: 2359.1.3
merged: warthog618 at gmail.com-20070410160523-wuwa272pl243buhf
parent: warthog618 at gmail.com-20070410151824-9qnth9n0uiypca71
parent: pqm at pqm.ubuntu.com-20070410074302-cf6b95587a1058cd
committer: Kent Gibson <warthog618 at gmail.com>
branch nick: bzr.kg
timestamp: Wed 2007-04-11 00:05:23 +0800
message:
merge bzr.dev
------------------------------------------------------------
revno: 2359.1.2
merged: warthog618 at gmail.com-20070410151824-9qnth9n0uiypca71
parent: warthog618 at gmail.com-20070315161504-lrltutbkgs7ksago
committer: Kent Gibson <warthog618 at gmail.com>
branch nick: bzr.kg
timestamp: Tue 2007-04-10 23:18:24 +0800
message:
add logging of merge revisions
------------------------------------------------------------
revno: 2359.1.1
merged: warthog618 at gmail.com-20070315161504-lrltutbkgs7ksago
parent: pqm at pqm.ubuntu.com-20070314155044-47c0c6257a6c6717
committer: Kent Gibson <warthog618 at gmail.com>
branch nick: bzr.kg
timestamp: Fri 2007-03-16 01:15:04 +0900
message:
Fix ``bzr log <file>`` so it only logs the revisions that changed the file, and does it faster.
=== modified file 'NEWS'
--- a/NEWS 2007-04-25 06:50:22 +0000
+++ b/NEWS 2007-04-25 14:37:32 +0000
@@ -174,6 +174,10 @@
* Don't produce encoding error when adding duplicate files.
(Aaron Bentley)
+ * Fix ``bzr log <file>`` so it only logs the revisions that changed
+ the file, and does it faster.
+ (Kent Gibson, John Arbash Meinel, #51980, #69477)
+
* Fix ``InterDirstateTre._iter_changes`` to handle when we come across
an empty versioned directory, which now has files in it.
(John Arbash Meinel, #104257)
=== modified file 'bzrlib/log.py'
--- a/bzrlib/log.py 2007-04-19 19:28:39 +0000
+++ b/bzrlib/log.py 2007-04-23 19:23:14 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2005 Canonical Ltd
+# Copyright (C) 2005, 2006, 2007 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -61,7 +61,10 @@
import bzrlib.errors as errors
from bzrlib.symbol_versioning import deprecated_method, zero_eleven
from bzrlib.trace import mutter
-from bzrlib.tsort import merge_sort
+from bzrlib.tsort import(
+ merge_sort,
+ topo_sort,
+ )
def find_touching_revisions(branch, file_id):
@@ -221,8 +224,15 @@
symbol_versioning.warn('LogFormatters should provide show_merge_revno '
'instead of show_merge since bzr 0.11.',
DeprecationWarning, stacklevel=3)
- view_revisions = list(get_view_revisions(mainline_revs, rev_nos, branch,
- direction, include_merges=include_merges))
+ view_revs_iter = get_view_revisions(mainline_revs, rev_nos, branch,
+ direction, include_merges=include_merges)
+ if specific_fileid:
+ view_revisions = _get_revisions_touching_file_id(branch,
+ specific_fileid,
+ mainline_revs,
+ view_revs_iter)
+ else:
+ view_revisions = list(view_revs_iter)
use_tags = getattr(lf, 'supports_tags', False)
if use_tags:
@@ -239,7 +249,7 @@
while revision_ids:
cur_deltas = {}
revisions = repository.get_revisions(revision_ids[:num])
- if verbose or specific_fileid:
+ if verbose:
delta_revisions = [r for r in revisions if
r.revision_id in zeros]
deltas = repository.get_deltas_for_revisions(delta_revisions)
@@ -247,11 +257,11 @@
delta_revisions), deltas))
for revision in revisions:
# The delta value will be None unless
- # 1. verbose or specific_fileid is specified, and
+ # 1. verbose is specified, and
# 2. the revision is a mainline revision
yield revision, cur_deltas.get(revision.revision_id)
revision_ids = revision_ids[num:]
- num = int(num * 1.5)
+ num = min(int(num * 1.5), 200)
# now we just print all the revisions
for ((rev_id, revno, merge_depth), (rev, delta)) in \
@@ -262,16 +272,6 @@
continue
if merge_depth == 0:
- # a mainline revision.
-
- if specific_fileid:
- if not delta.touches_file_id(specific_fileid):
- continue
-
- if not verbose:
- # although we calculated it, throw it away without display
- delta = None
-
if use_tags:
lf.show(revno, rev, delta, rev_tag_dict.get(rev_id))
else:
@@ -287,6 +287,64 @@
lf.show_merge_revno(rev, merge_depth, revno)
+def _get_revisions_touching_file_id(branch, file_id, mainline_revisions,
+ view_revs_iter):
+ """Return the list of revision ids which touch a given file id.
+
+ This includes the revisions which directly change the file id,
+ and the revisions which merge these changes. So if the
+ revision graph is::
+ A
+ |\
+ B C
+ |/
+ D
+
+ And 'C' changes a file, then both C and D will be returned.
+
+ This will also can be restricted based on a subset of the mainline.
+
+ :return: A list of (revision_id, dotted_revno, merge_depth) tuples.
+ """
+ # find all the revisions that change the specific file
+ file_weave = branch.repository.weave_store.get_weave(file_id,
+ branch.repository.get_transaction())
+ weave_modifed_revisions = set(file_weave.versions())
+ # build the ancestry of each revision in the graph
+ # - only listing the ancestors that change the specific file.
+ rev_graph = branch.repository.get_revision_graph(mainline_revisions[-1])
+ sorted_rev_list = topo_sort(rev_graph)
+ ancestry = {}
+ for rev in sorted_rev_list:
+ parents = rev_graph[rev]
+ if rev not in weave_modifed_revisions and len(parents) == 1:
+ # We will not be adding anything new, so just use a reference to
+ # the parent ancestry.
+ rev_ancestry = ancestry[parents[0]]
+ else:
+ rev_ancestry = set()
+ if rev in weave_modifed_revisions:
+ rev_ancestry.add(rev)
+ for parent in parents:
+ rev_ancestry = rev_ancestry.union(ancestry[parent])
+ ancestry[rev] = rev_ancestry
+
+ def is_merging_rev(r):
+ parents = rev_graph[r]
+ if len(parents) > 1:
+ leftparent = parents[0]
+ for rightparent in parents[1:]:
+ if not ancestry[leftparent].issuperset(
+ ancestry[rightparent]):
+ return True
+ return False
+
+ # filter from the view the revisions that did not change or merge
+ # the specific file
+ return [(r, n, d) for r, n, d in view_revs_iter
+ if r in weave_modifed_revisions or is_merging_rev(r)]
+
+
def get_view_revisions(mainline_revs, rev_nos, branch, direction,
include_merges=True):
"""Produce an iterator of revisions to show
=== modified file 'bzrlib/tests/blackbox/test_log.py'
--- a/bzrlib/tests/blackbox/test_log.py 2007-04-19 19:28:39 +0000
+++ b/bzrlib/tests/blackbox/test_log.py 2007-04-23 19:23:14 +0000
@@ -334,3 +334,39 @@
tree.commit('revision 1')
tree.bzrdir.destroy_workingtree()
self.run_bzr('log', 'tree/file')
+
+ def test_log_file(self):
+ """The log for a particular file should only list revs for that file"""
+ tree = self.make_branch_and_tree('parent')
+ self.build_tree(['parent/file1', 'parent/file2', 'parent/file3'])
+ tree.add('file1')
+ tree.commit('add file1')
+ tree.add('file2')
+ tree.commit('add file2')
+ tree.add('file3')
+ tree.commit('add file3')
+ self.run_bzr('branch', 'parent', 'child')
+ print >> file('child/file2', 'wb'), 'hello'
+ self.run_bzr('commit', '-m', 'branch 1', 'child')
+ os.chdir('parent')
+ self.run_bzr('merge', '../child')
+ self.run_bzr('commit', '-m', 'merge child branch')
+
+ log = self.run_bzr('log', 'file1')[0]
+ self.assertContainsRe(log, 'revno: 1\n')
+ self.assertNotContainsRe(log, 'revno: 2\n')
+ self.assertNotContainsRe(log, 'revno: 3\n')
+ self.assertNotContainsRe(log, 'revno: 3.1.1\n')
+ self.assertNotContainsRe(log, 'revno: 4\n')
+ log = self.run_bzr('log', 'file2')[0]
+ self.assertNotContainsRe(log, 'revno: 1\n')
+ self.assertContainsRe(log, 'revno: 2\n')
+ self.assertNotContainsRe(log, 'revno: 3\n')
+ self.assertContainsRe(log, 'revno: 3.1.1\n')
+ self.assertContainsRe(log, 'revno: 4\n')
+ log = self.run_bzr('log', 'file3')[0]
+ self.assertNotContainsRe(log, 'revno: 1\n')
+ self.assertNotContainsRe(log, 'revno: 2\n')
+ self.assertContainsRe(log, 'revno: 3\n')
+ self.assertNotContainsRe(log, 'revno: 3.1.1\n')
+ self.assertNotContainsRe(log, 'revno: 4\n')
=== modified file 'bzrlib/tests/test_log.py'
--- a/bzrlib/tests/test_log.py 2006-11-03 18:35:29 +0000
+++ b/bzrlib/tests/test_log.py 2007-04-18 22:37:59 +0000
@@ -1,6 +1,4 @@
-# Copyright (C) 2005 Canonical Ltd
-# -*- coding: utf-8 -*-
-# vim: encoding=utf-8
+# Copyright (C) 2005, 2006, 2007 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -19,6 +17,7 @@
import os
from cStringIO import StringIO
+from bzrlib import log
from bzrlib.tests import BzrTestBase, TestCaseWithTransport
from bzrlib.log import (show_log,
get_view_revisions,
@@ -400,3 +399,135 @@
self.assertEqual([('1', '1', 0), ('2', '2', 0), ('3c', '3', 0),
('4b', '4', 0)],
revisions)
+
+
+class TestGetRevisionsTouchingFileID(TestCaseWithTransport):
+
+ def create_tree_with_single_merge(self):
+ """Create a branch with a moderate layout.
+
+ The revision graph looks like:
+
+ A
+ |\
+ B C
+ |/
+ D
+
+ In this graph, A introduced files f1 and f2 and f3.
+ B modifies f1 and f3, and C modifies f2 and f3.
+ D merges the changes from B and C and resolves the conflict for f3.
+ """
+ # TODO: jam 20070218 This seems like it could really be done
+ # with make_branch_and_memory_tree() if we could just
+ # create the content of those files.
+ # TODO: jam 20070218 Another alternative is that we would really
+ # like to only create this tree 1 time for all tests that
+ # use it. Since 'log' only uses the tree in a readonly
+ # fashion, it seems a shame to regenerate an identical
+ # tree for each test.
+ tree = self.make_branch_and_tree('tree')
+ tree.lock_write()
+ self.addCleanup(tree.unlock)
+
+ self.build_tree_contents([('tree/f1', 'A\n'),
+ ('tree/f2', 'A\n'),
+ ('tree/f3', 'A\n'),
+ ])
+ tree.add(['f1', 'f2', 'f3'], ['f1-id', 'f2-id', 'f3-id'])
+ tree.commit('A', rev_id='A')
+
+ self.build_tree_contents([('tree/f2', 'A\nC\n'),
+ ('tree/f3', 'A\nC\n'),
+ ])
+ tree.commit('C', rev_id='C')
+ # Revert back to A to build the other history.
+ tree.set_last_revision('A')
+ tree.branch.set_last_revision_info(1, 'A')
+ self.build_tree_contents([('tree/f1', 'A\nB\n'),
+ ('tree/f2', 'A\n'),
+ ('tree/f3', 'A\nB\n'),
+ ])
+ tree.commit('B', rev_id='B')
+ tree.set_parent_ids(['B', 'C'])
+ self.build_tree_contents([('tree/f1', 'A\nB\n'),
+ ('tree/f2', 'A\nC\n'),
+ ('tree/f3', 'A\nB\nC\n'),
+ ])
+ tree.commit('D', rev_id='D')
+
+ # Switch to a read lock for this tree.
+ # We still have addCleanup(unlock)
+ tree.unlock()
+ tree.lock_read()
+ return tree
+
+ def test_tree_with_single_merge(self):
+ """Make sure the tree layout is correct."""
+ tree = self.create_tree_with_single_merge()
+ rev_A_tree = tree.branch.repository.revision_tree('A')
+ rev_B_tree = tree.branch.repository.revision_tree('B')
+
+ f1_changed = (u'f1', 'f1-id', 'file', True, False)
+ f2_changed = (u'f2', 'f2-id', 'file', True, False)
+ f3_changed = (u'f3', 'f3-id', 'file', True, False)
+
+ delta = rev_B_tree.changes_from(rev_A_tree)
+ self.assertEqual([f1_changed, f3_changed], delta.modified)
+ self.assertEqual([], delta.renamed)
+ self.assertEqual([], delta.added)
+ self.assertEqual([], delta.removed)
+
+ rev_C_tree = tree.branch.repository.revision_tree('C')
+ delta = rev_C_tree.changes_from(rev_A_tree)
+ self.assertEqual([f2_changed, f3_changed], delta.modified)
+ self.assertEqual([], delta.renamed)
+ self.assertEqual([], delta.added)
+ self.assertEqual([], delta.removed)
+
+ rev_D_tree = tree.branch.repository.revision_tree('D')
+ delta = rev_D_tree.changes_from(rev_B_tree)
+ self.assertEqual([f2_changed, f3_changed], delta.modified)
+ self.assertEqual([], delta.renamed)
+ self.assertEqual([], delta.added)
+ self.assertEqual([], delta.removed)
+
+ delta = rev_D_tree.changes_from(rev_C_tree)
+ self.assertEqual([f1_changed, f3_changed], delta.modified)
+ self.assertEqual([], delta.renamed)
+ self.assertEqual([], delta.added)
+ self.assertEqual([], delta.removed)
+
+ def assertAllRevisionsForFileID(self, tree, file_id, revisions):
+ """Make sure _get_revisions_touching_file_id returns the right values.
+
+ Get the return value from _get_revisions_touching_file_id and make
+ sure they are correct.
+ """
+ # The api for _get_revisions_touching_file_id is a little crazy,
+ # So we do the setup here.
+ mainline = tree.branch.revision_history()
+ mainline.insert(0, None)
+ revnos = dict((rev, idx+1) for idx, rev in enumerate(mainline))
+ view_revs_iter = log.get_view_revisions(mainline, revnos, tree.branch,
+ 'reverse', True)
+ actual_revs = log._get_revisions_touching_file_id(tree.branch, file_id,
+ mainline,
+ view_revs_iter)
+ self.assertEqual(revisions, [r for r, revno, depth in actual_revs])
+
+ def test_file_id_f1(self):
+ tree = self.create_tree_with_single_merge()
+ # f1 should be marked as modified by revisions A and B
+ self.assertAllRevisionsForFileID(tree, 'f1-id', ['B', 'A'])
+
+ def test_file_id_f2(self):
+ tree = self.create_tree_with_single_merge()
+ # f2 should be marked as modified by revisions A, C, and D
+ # because D merged the changes from C.
+ self.assertAllRevisionsForFileID(tree, 'f2-id', ['D', 'C', 'A'])
+
+ def test_file_id_f3(self):
+ tree = self.create_tree_with_single_merge()
+ # f3 should be marked as modified by revisions A, B, C, and D
+ self.assertAllRevisionsForFileID(tree, 'f2-id', ['D', 'C', 'A'])
More information about the bazaar-commits
mailing list