Rev 3735: (jam) 'bzr log file' now shows more relevant info, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Thu Sep 25 02:21:50 BST 2008
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 3735
revision-id: pqm at pqm.ubuntu.com-20080925012144-k71s2olv2fpy771x
parent: pqm at pqm.ubuntu.com-20080924081540-ecfvp6xq4x9gs81n
parent: john at arbash-meinel.com-20080922200907-0uv1skmq5jo1skvn
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2008-09-25 02:21:44 +0100
message:
(jam) 'bzr log file' now shows more relevant info,
and does so in a more efficient manner.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
bzrlib/tests/test_log.py testlog.py-20050728115707-1a514809d7d49309
------------------------------------------------------------
revno: 3711.3.23
revision-id: john at arbash-meinel.com-20080922200907-0uv1skmq5jo1skvn
parent: john at arbash-meinel.com-20080922132935-klw5fk22s4s1imrx
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Mon 2008-09-22 15:09:07 -0500
message:
Documentation and cleanup.
Improve the docstring for _filter_revisions_touching_file_id to indicate
what revisions will and won't be returned.
Other general suggestions from Ian.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
bzrlib/tests/test_log.py testlog.py-20050728115707-1a514809d7d49309
------------------------------------------------------------
revno: 3711.3.22
revision-id: john at arbash-meinel.com-20080922132935-klw5fk22s4s1imrx
parent: john at arbash-meinel.com-20080921144837-wi61tf7gr4jfwl5d
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Mon 2008-09-22 08:29:35 -0500
message:
Remove a hangover 'assert'
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.21
revision-id: john at arbash-meinel.com-20080921144837-wi61tf7gr4jfwl5d
parent: john at arbash-meinel.com-20080921141555-r6npeijzl5ic1r6r
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Sun 2008-09-21 09:48:37 -0500
message:
Fix GraphIndex to properly generate _nodes_by_keys on demand.
modified:
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
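The fix in this commit makes GraphIndex generate its `_nodes_by_key` mapping on demand instead of eagerly during `_buffer_all`. A minimal sketch of that lazy-build pattern (the class and names here are illustrative, not the actual bzrlib API):

```python
class LazyNodesByKey:
    """Build the nested nodes-by-key mapping only on first use."""

    def __init__(self, nodes):
        self._nodes = nodes          # {key_tuple: value}
        self._nodes_by_key = None    # built on demand

    def get_nodes_by_key(self):
        if self._nodes_by_key is None:
            nodes_by_key = {}
            for key, value in self._nodes.items():
                key_dict = nodes_by_key
                # For a key (foo, bar, baz), create
                # nodes_by_key[foo][bar][baz] = (key, value)
                for subkey in key[:-1]:
                    key_dict = key_dict.setdefault(subkey, {})
                key_dict[key[-1]] = (key, value)
            self._nodes_by_key = nodes_by_key
        return self._nodes_by_key
```

Callers that never do prefix lookups never pay for building the nested dict, which is where the time and memory savings in the later commits come from.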
------------------------------------------------------------
revno: 3711.3.20
revision-id: john at arbash-meinel.com-20080921141555-r6npeijzl5ic1r6r
parent: john at arbash-meinel.com-20080921140124-kvoi3sdig2owjukf
parent: pqm at pqm.ubuntu.com-20080921012105-ote1u11mokjim9ir
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Sun 2008-09-21 09:15:55 -0500
message:
merge bzr.dev and clean up NEWS
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/builtins.py builtins.py-20050830033751-fc01482b9ca23183
bzrlib/directory_service.py directory_service.py-20080305221044-vr2mkvlsk8jypa2y-1
bzrlib/help_topics/__init__.py help_topics.py-20060920210027-rnim90q9e0bwxvy4-1
bzrlib/revisionspec.py revisionspec.py-20050907152633-17567659fd5c0ddb
bzrlib/smart/medium.py medium.py-20061103051856-rgu2huy59fkz902q-1
bzrlib/smart/server.py server.py-20061110062051-chzu10y32vx8gvur-1
bzrlib/tests/blackbox/test_branch.py test_branch.py-20060524161337-noms9gmcwqqrfi8y-1
bzrlib/tests/blackbox/test_remove_tree.py test_remove_tree.py-20061110192919-5j3xjciiaqbs2dvo-1
bzrlib/tests/blackbox/test_switch.py test_switch.py-20071122111948-0c5en6uz92bwl76h-1
bzrlib/tests/test_directory_service.py test_directory_servi-20080305221044-vr2mkvlsk8jypa2y-2
bzrlib/tests/test_transport_implementations.py test_transport_implementations.py-20051227111451-f97c5c7d5c49fce7
bzrlib/tests/tree_implementations/test_inv.py test_inv.py-20070312023226-0cdvk5uwhutis9vg-1
bzrlib/tests/tree_implementations/test_tree.py test_tree.py-20061215160206-usu7lwcj8aq2n3br-1
bzrlib/transform.py transform.py-20060105172343-dd99e54394d91687
bzrlib/transport/local.py local_transport.py-20050711165921-9b1f142bfe480c24
bzrlib/tree.py tree.py-20050309040759-9d5f2496be663e77
------------------------------------------------------------
revno: 3711.3.19
revision-id: john at arbash-meinel.com-20080921140124-kvoi3sdig2owjukf
parent: john at arbash-meinel.com-20080921135843-wnwpal216u3zt2ni
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Sun 2008-09-21 09:01:24 -0500
message:
Add a TODO discussing how our index requests should evolve.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.18
revision-id: john at arbash-meinel.com-20080921135843-wnwpal216u3zt2ni
parent: john at arbash-meinel.com-20080919035127-okaoylx3tcrtn65w
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Sun 2008-09-21 08:58:43 -0500
message:
Update NEWS based on the current algorithm.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
------------------------------------------------------------
revno: 3711.3.17
revision-id: john at arbash-meinel.com-20080919035127-okaoylx3tcrtn65w
parent: john at arbash-meinel.com-20080919035029-aibb9jq5waxjjs7c
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 22:51:27 -0500
message:
Fix the 'test_log' tests
modified:
bzrlib/tests/test_log.py testlog.py-20050728115707-1a514809d7d49309
------------------------------------------------------------
revno: 3711.3.16
revision-id: john at arbash-meinel.com-20080919035029-aibb9jq5waxjjs7c
parent: john at arbash-meinel.com-20080919034224-laq19nw522j3c6ge
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 22:50:29 -0500
message:
Doc update.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.15
revision-id: john at arbash-meinel.com-20080919034224-laq19nw522j3c6ge
parent: john at arbash-meinel.com-20080919032139-pdcuw9ryqatpqdn9
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 22:42:24 -0500
message:
Work around GraphIndex inefficiencies by requesting keys 1000 at a time.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
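The batching described above, querying the index for a fixed-size chunk of keys at a time rather than one key or all keys at once, can be sketched as follows. `get_parent_map` here is a stand-in for the index lookup; the helper name is illustrative:

```python
def lookup_in_chunks(get_parent_map, text_keys, chunk_size=1000):
    """Query an index in fixed-size batches of keys.

    Batching amortizes per-request overhead without forcing the
    index to bisect for every key individually, and without
    buffering the whole index for one giant request.
    """
    found = set()
    for start in range(0, len(text_keys), chunk_size):
        batch = text_keys[start:start + chunk_size]
        # Keep only the revision_id half of each (file_id, revision_id)
        # key that the index reports as present.
        found.update(k[1] for k in get_parent_map(batch))
    return found
```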
------------------------------------------------------------
revno: 3711.3.14
revision-id: john at arbash-meinel.com-20080919032139-pdcuw9ryqatpqdn9
parent: john at arbash-meinel.com-20080919013023-31adhm4mt3obrjst
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 22:21:39 -0500
message:
Change the per-file log algorithm dramatically.
Now we use the merge_sort information to decide when to show
another entry. It seems this changes what is actually logged.
This is because sometimes a branch merges the trunk. And in that
case, it will be bringing in the changes to a file, which the
feature branch did not have before.
However, for people using 'bzr log file' I believe those are
uninteresting nodes.
Memory consumption is good (240MB, mostly from text_index._buffer_all),
speed is good (28s, also mostly in _buffer_all).
Changing the code to request one key at a time saves the
buffer all, but at the expense of lots of bisects. In some
cases it is faster and saves memory (23s, 88MB), but on
bzr.dev it is slower.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
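The merge_sort-based filtering this commit introduces keeps a stack of the pending revision at each merge depth and, whenever it reaches a revision that touched the file, emits that revision plus every not-yet-emitted merge on the stack that carried the change toward the mainline. A simplified sketch of the idea (mirroring the shape of the bzrlib change, not the exact code):

```python
def filter_touching(view_revisions, modified_text_revisions):
    """Keep revisions that changed the file, plus the merges that
    brought each change toward the mainline.

    view_revisions is a list of (rev_id, dotted_revno, merge_depth)
    tuples in reverse merge_sort order.
    """
    result = []
    # Stack of the pending revision at each depth; entries become
    # None once they have been added to the result.
    current_merge_stack = [None]
    for info in view_revisions:
        rev_id, revno, depth = info
        if depth == len(current_merge_stack):
            current_merge_stack.append(info)
        else:
            del current_merge_stack[depth + 1:]
            current_merge_stack[-1] = info
        if rev_id in modified_text_revisions:
            # Emit this revision and every un-emitted merge above it.
            for idx, node in enumerate(current_merge_stack):
                if node is not None:
                    result.append(node)
                    current_merge_stack[idx] = None
    return result
```

Merges on a side branch that never reach the stack above a modifying revision are simply skipped, which is what drops the "uninteresting nodes" the message refers to.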
------------------------------------------------------------
revno: 3711.3.13
revision-id: john at arbash-meinel.com-20080919013023-31adhm4mt3obrjst
parent: john at arbash-meinel.com-20080918204343-0h7rbblut820u6i8
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 20:30:23 -0500
message:
Shave off another 5s by not building 'node_by_key'
modified:
bzrlib/index.py index.py-20070712131115-lolkarso50vjr64s-1
------------------------------------------------------------
revno: 3711.3.12
revision-id: john at arbash-meinel.com-20080918204343-0h7rbblut820u6i8
parent: john at arbash-meinel.com-20080918200531-woiund7811i2kgyy
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 15:43:43 -0500
message:
Document the advantage of using a revision_id set, instead of a (file_id, revision_id) one.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.11
revision-id: john at arbash-meinel.com-20080918200531-woiund7811i2kgyy
parent: john at arbash-meinel.com-20080918200316-v76fi3cotgu3qfun
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 15:05:31 -0500
message:
Clean out the debugging code.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.10
revision-id: john at arbash-meinel.com-20080918200316-v76fi3cotgu3qfun
parent: john at arbash-meinel.com-20080918193947-0nbfx6f2yu3zvi3k
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 15:03:16 -0500
message:
Document some other alternatives, but in the end, just go with the memory consuming fast 'frozenset()'
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.9
revision-id: john at arbash-meinel.com-20080918193947-0nbfx6f2yu3zvi3k
parent: john at arbash-meinel.com-20080918193505-d1jihbz38y6ooty3
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 14:39:47 -0500
message:
Switch back to using a single dict.
The algorithm is simpler, and though I remember differently, the
numbers show that it performs identically.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.8
revision-id: john at arbash-meinel.com-20080918193505-d1jihbz38y6ooty3
parent: john at arbash-meinel.com-20080918191751-wk23agjzhkiw2wo5
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 14:35:05 -0500
message:
It seems the specific layout of dicts doesn't matter.
Using 2 dicts, 1 dict and a list, 1 dict all had identical memory
and CPU numbers. The only thing that matters is:
a) We can't use mixed mode with a single dictionary, because
it gets very confused and performs poorly.
b) If you always use tuples, you cut memory from 410MB => 272MB, but
performance jumps 42s => 50s
c) 'mixed' mode is in between, with 400MB and 43s. However, it is
pretty much just the 'frozenset' performance.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.7
revision-id: john at arbash-meinel.com-20080918191751-wk23agjzhkiw2wo5
parent: john at arbash-meinel.com-20080918190406-r75lzwnmjjth94h6
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 14:17:51 -0500
message:
Instead of using a dictionary for cached ancestries,
use a list. This also lets us use integer offsets in our mapping,
rather than revision ids.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
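The switch described here, from a mapping keyed by long revision-id strings to a list addressed by small integer offsets, can be sketched as (names are illustrative, not the actual bzrlib ones):

```python
def build_offset_view(rev_ids, ancestry_by_rev):
    """Store cached ancestries in a list indexed by integer offset.

    Integer offsets are cheaper to store and compare than the
    revision-id strings they replace.
    """
    offset_of = dict((rev_id, idx) for idx, rev_id in enumerate(rev_ids))
    cache = [ancestry_by_rev[rev_id] for rev_id in rev_ids]
    return offset_of, cache
```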
------------------------------------------------------------
revno: 3711.3.6
revision-id: john at arbash-meinel.com-20080918190406-r75lzwnmjjth94h6
parent: john at arbash-meinel.com-20080918182621-g2w4klj590g2zptb
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 14:04:06 -0500
message:
Changing back to using a single dict saves memory,
at the cost of CPU.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.5
revision-id: john at arbash-meinel.com-20080918182621-g2w4klj590g2zptb
parent: john at arbash-meinel.com-20080918182100-n1gwuzs3ajy27kdc
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 13:26:21 -0500
message:
a bit more debug info
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.4
revision-id: john at arbash-meinel.com-20080918182100-n1gwuzs3ajy27kdc
parent: john at arbash-meinel.com-20080918164436-tlhfuf010iz9gf5e
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 13:21:00 -0500
message:
Significantly faster, but consuming more memory.
We are back down to 42.6s, but we now consume 409MB.
The current form keeps the value list separate from the
revision pointers. That way we don't have to re-do the
computation when we hit the same nodes repeatedly.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.3
revision-id: john at arbash-meinel.com-20080918164436-tlhfuf010iz9gf5e
parent: john at arbash-meinel.com-20080918162150-vi5xcsfwrgaql0it
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 11:44:36 -0500
message:
A bit of a Frankenstein now, but it drops memory dramatically.
Now we build up new nodes into a list, and cache them as tuples.
If we end up computing the frozenset() for a node, we then cache it
as such.
Memory consumption is now only 101MB, though processing time is 53s.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
------------------------------------------------------------
revno: 3711.3.2
revision-id: john at arbash-meinel.com-20080918162150-vi5xcsfwrgaql0it
parent: john at arbash-meinel.com-20080918161558-745a1vuxechgcq8h
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 11:21:50 -0500
message:
Trade off time versus memory consumption.
[frozen]set() objects always waste a bit of memory, in order
to avoid hash collisions. So if we cache the ancestry by
using tuples instead of sets, we can save a bit of memory.
For mysql, this is 272MB versus 380MB. However, it costs
CPU time to regenerate all of these sets 56.7s versus 44.2s.
modified:
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
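The set-versus-tuple memory gap this commit trades on comes from the over-allocated hash table inside a [frozen]set. It can be observed directly with `sys.getsizeof`; the absolute numbers vary by interpreter, so this is illustrative only:

```python
import sys

# 1000 fake revision ids, cached two different ways.
elements = tuple('rev-%d' % i for i in range(1000))
ancestry_as_tuple = tuple(elements)
ancestry_as_set = frozenset(elements)

# A frozenset over-allocates its hash table to keep collisions rare,
# so it is noticeably larger than a tuple of the same elements; the
# tuple, in turn, costs CPU whenever it must be turned back into a
# set for subset checks.
tuple_size = sys.getsizeof(ancestry_as_tuple)
set_size = sys.getsizeof(ancestry_as_set)
```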
------------------------------------------------------------
revno: 3711.3.1
revision-id: john at arbash-meinel.com-20080918161558-745a1vuxechgcq8h
parent: pqm at pqm.ubuntu.com-20080917230446-p0wippqwikt511sp
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lighter_log_file
timestamp: Thu 2008-09-18 11:15:58 -0500
message:
Change the per-file ancestry code in log.py
The old code would create a new set() object whenever there
was more than one parent. However, many of those cases did
not introduce any new per-file revisions, so those sets were
wasted memory.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/log.py log.py-20050505065812-c40ce11702fe5fb1
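The reference-sharing optimization this first commit describes, reusing the parent's ancestry object when a revision introduces no new per-file changes, can be sketched as follows (a simplified model, not the actual bzrlib code):

```python
def build_file_ancestry(parent_map, modified, topo_order):
    """Compute per-file ancestry, sharing the parent's set whenever a
    revision adds nothing new (single parent, file untouched)."""
    ancestry = {}
    for rev in topo_order:
        parents = parent_map[rev]
        if rev not in modified and len(parents) == 1:
            # Nothing new here: reuse the parent's ancestry object
            # rather than allocating a copy.
            ancestry[rev] = ancestry[parents[0]]
            continue
        rev_ancestry = set()
        if rev in modified:
            rev_ancestry.add(rev)
        for parent in parents:
            # Missing parents (ghosts) contribute an empty ancestry.
            rev_ancestry |= ancestry.get(parent, set())
        ancestry[rev] = rev_ancestry
    return ancestry
```

On a long chain of revisions that never touch the file, every node points at the same set object, so memory stays flat instead of growing with history length.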
=== modified file 'NEWS'
--- a/NEWS 2008-09-24 06:52:03 +0000
+++ b/NEWS 2008-09-25 01:21:44 +0000
@@ -9,6 +9,12 @@
CHANGES:
+ * ``bzr log file`` has been changed. It now uses a different method
+ for determining which revisions to show as merging the changes to
+ the file. It now only shows revisions which merged the change
+ towards your mainline. This simplifies the output, makes it faster,
+ and reduces memory consumption. (John Arbash Meinel)
+
FEATURES
IMPROVEMENTS:
=== modified file 'bzrlib/index.py'
--- a/bzrlib/index.py 2008-09-03 22:30:56 +0000
+++ b/bzrlib/index.py 2008-09-21 14:48:37 +0000
@@ -366,7 +366,7 @@
self._keys_by_offset = {}
# ready-to-return key:value or key:value, node_ref_lists
self._nodes = {}
- self._nodes_by_key = {}
+ self._nodes_by_key = None
trailers = 0
pos = stream.tell()
lines = stream.read().split('\n')
@@ -381,26 +381,30 @@
else:
node_value = value
self._nodes[key] = node_value
- if self._key_length > 1:
- # TODO: We may want to do this lazily, but if we are calling
- # _buffer_all, we are likely to be doing
- # iter_entries_prefix
- key_dict = self._nodes_by_key
- if self.node_ref_lists:
- key_value = key, node_value[0], node_value[1]
- else:
- key_value = key, node_value
- # For a key of (foo, bar, baz) create
- # _nodes_by_key[foo][bar][baz] = key_value
- for subkey in key[:-1]:
- key_dict = key_dict.setdefault(subkey, {})
- key_dict[key[-1]] = key_value
# cache the keys for quick set intersections
self._keys = set(self._nodes)
if trailers != 1:
# there must be one line - the empty trailer line.
raise errors.BadIndexData(self)
+ def _get_nodes_by_key(self):
+ if self._nodes_by_key is None:
+ nodes_by_key = {}
+ if self.node_ref_lists:
+ for key, (value, references) in self._nodes.iteritems():
+ key_dict = nodes_by_key
+ for subkey in key[:-1]:
+ key_dict = key_dict.setdefault(subkey, {})
+ key_dict[key[-1]] = key, value, references
+ else:
+ for key, value in self._nodes.iteritems():
+ key_dict = nodes_by_key
+ for subkey in key[:-1]:
+ key_dict = key_dict.setdefault(subkey, {})
+ key_dict[key[-1]] = key, value
+ self._nodes_by_key = nodes_by_key
+ return self._nodes_by_key
+
def iter_all_entries(self):
"""Iterate over all keys within the index.
@@ -593,6 +597,7 @@
else:
yield self, key, self._nodes[key]
return
+ nodes_by_key = self._get_nodes_by_key()
for key in keys:
# sanity check
if key[0] is None:
@@ -600,7 +605,7 @@
if len(key) != self._key_length:
raise errors.BadIndexKey(key)
# find what it refers to:
- key_dict = self._nodes_by_key
+ key_dict = nodes_by_key
elements = list(key)
# find the subdict whose contents should be returned.
try:
@@ -973,7 +978,7 @@
raise errors.BadIndexData(self)
# keys are tuples. Each element is a string that may occur many
# times, so we intern them to save space. AB, RC, 200807
- key = tuple(intern(element) for element in elements[:self._key_length])
+ key = tuple([intern(element) for element in elements[:self._key_length]])
if first_key is None:
first_key = key
absent, references, value = elements[-3:]
=== modified file 'bzrlib/log.py'
--- a/bzrlib/log.py 2008-09-11 17:13:46 +0000
+++ b/bzrlib/log.py 2008-09-22 20:09:07 +0000
@@ -268,8 +268,8 @@
if specific_fileid:
view_revisions = _filter_revisions_touching_file_id(branch,
specific_fileid,
- mainline_revs,
- view_revisions)
+ view_revisions,
+ direction)
# rebase merge_depth - unless there are no revisions or
# either the first or last revision have merge_depth = 0.
@@ -536,75 +536,88 @@
return view_revisions
-def _filter_revisions_touching_file_id(branch, file_id, mainline_revisions,
- view_revs_iter):
- """Return the list of revision ids which touch a given file id.
+def _filter_revisions_touching_file_id(branch, file_id, view_revisions,
+ direction):
+ r"""Return the list of revision ids which touch a given file id.
The function filters view_revisions and returns a subset.
This includes the revisions which directly change the file id,
and the revisions which merge these changes. So if the
revision graph is::
- A
- |\
- B C
+ A-.
+ |\ \
+ B C E
+ |/ /
+ D |
+ |\|
+ | F
|/
- D
-
- And 'C' changes a file, then both C and D will be returned.
-
- This will also can be restricted based on a subset of the mainline.
-
+ G
+
+ And 'C' changes a file, then both C and D will be returned. F will not be
+ returned even though it brings the changes to C into the branch starting
+ with E. (Note that if we were using F as the tip instead of G, then we
+ would see C, D, F.)
+
+ This will also be restricted based on a subset of the mainline.
+
+ :param branch: The branch where we can get text revision information.
+ :param file_id: Filter out revisions that do not touch file_id.
+ :param view_revisions: A list of (revision_id, dotted_revno, merge_depth)
+ tuples. This is the list of revisions which will be filtered. It is
+ assumed that view_revisions is in merge_sort order (either forward or
+ reverse).
+ :param direction: The direction of view_revisions. See also
+ reverse_by_depth, and get_view_revisions
:return: A list of (revision_id, dotted_revno, merge_depth) tuples.
"""
- # find all the revisions that change the specific file
- # build the ancestry of each revision in the graph
- # - only listing the ancestors that change the specific file.
- graph = branch.repository.get_graph()
- # This asks for all mainline revisions, which means we only have to spider
- # sideways, rather than depth history. That said, its still size-of-history
- # and should be addressed.
- # mainline_revisions always includes an extra revision at the beginning, so
- # don't request it.
- parent_map = dict(((key, value) for key, value in
- graph.iter_ancestry(mainline_revisions[1:]) if value is not None))
- sorted_rev_list = tsort.topo_sort(parent_map.items())
- text_keys = [(file_id, rev_id) for rev_id in sorted_rev_list]
- modified_text_versions = branch.repository.texts.get_parent_map(text_keys)
- ancestry = {}
- for rev in sorted_rev_list:
- text_key = (file_id, rev)
- parents = parent_map[rev]
- if text_key not in modified_text_versions and len(parents) == 1:
- # We will not be adding anything new, so just use a reference to
- # the parent ancestry.
- rev_ancestry = ancestry[parents[0]]
+ # Lookup all possible text keys to determine which ones actually modified
+ # the file.
+ text_keys = [(file_id, rev_id) for rev_id, revno, depth in view_revisions]
+ # Looking up keys in batches of 1000 can cut the time in half, as well as
+ # memory consumption. GraphIndex *does* like to look for a few keys in
+ # parallel, it just doesn't like looking for *lots* of keys in parallel.
+ # TODO: This code needs to be re-evaluated periodically as we tune the
+ # indexing layer. We might consider passing in hints as to the known
+ # access pattern (sparse/clustered, high success rate/low success
+ # rate). This particular access is clustered with a low success rate.
+ get_parent_map = branch.repository.texts.get_parent_map
+ modified_text_revisions = set()
+ chunk_size = 1000
+ for start in xrange(0, len(text_keys), chunk_size):
+ next_keys = text_keys[start:start + chunk_size]
+ # Only keep the revision_id portion of the key
+ modified_text_revisions.update(
+ [k[1] for k in get_parent_map(next_keys)])
+ del text_keys, next_keys
+
+ result = []
+ if direction == 'forward':
+ # TODO: The algorithm for finding 'merges' of file changes expects
+ # 'reverse' order (the default from 'merge_sort()'). Instead of
+ # forcing this, we could just use the reverse_by_depth order.
+ view_revisions = reverse_by_depth(view_revisions)
+ # Track what revisions will merge the current revision, replace entries
+ # with 'None' when they have been added to result
+ current_merge_stack = [None]
+ for info in view_revisions:
+ rev_id, revno, depth = info
+ if depth == len(current_merge_stack):
+ current_merge_stack.append(info)
else:
- rev_ancestry = set()
- if text_key in modified_text_versions:
- rev_ancestry.add(rev)
- for parent in parents:
- if parent not in ancestry:
- # parent is a Ghost, which won't be present in
- # sorted_rev_list, but we may access it later, so create an
- # empty node for it
- ancestry[parent] = set()
- rev_ancestry = rev_ancestry.union(ancestry[parent])
- ancestry[rev] = rev_ancestry
-
- def is_merging_rev(r):
- parents = parent_map[r]
- if len(parents) > 1:
- leftparent = parents[0]
- for rightparent in parents[1:]:
- if not ancestry[leftparent].issuperset(
- ancestry[rightparent]):
- return True
- return False
-
- # filter from the view the revisions that did not change or merge
- # the specific file
- return [(r, n, d) for r, n, d in view_revs_iter
- if (file_id, r) in modified_text_versions or is_merging_rev(r)]
+ del current_merge_stack[depth + 1:]
+ current_merge_stack[-1] = info
+
+ if rev_id in modified_text_revisions:
+ # This needs to be logged, along with the extra revisions
+ for idx in xrange(len(current_merge_stack)):
+ node = current_merge_stack[idx]
+ if node is not None:
+ result.append(node)
+ current_merge_stack[idx] = None
+ if direction == 'forward':
+ result = reverse_by_depth(result)
+ return result
def get_view_revisions(mainline_revs, rev_nos, branch, direction,
=== modified file 'bzrlib/tests/test_log.py'
--- a/bzrlib/tests/test_log.py 2008-08-21 04:12:02 +0000
+++ b/bzrlib/tests/test_log.py 2008-09-22 20:09:07 +0000
@@ -986,10 +986,10 @@
view_revs_iter = log.get_view_revisions(mainline, revnos, tree.branch,
'reverse', True)
actual_revs = log._filter_revisions_touching_file_id(
- tree.branch,
+ tree.branch,
file_id,
- mainline,
- list(view_revs_iter))
+ list(view_revs_iter),
+ 'reverse')
self.assertEqual(revisions, [r for r, revno, depth in actual_revs])
def test_file_id_f1(self):