Rev 5847: (jameinel) Make 'bzr revert' much faster in large trees, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue May 10 10:28:32 UTC 2011


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 5847 [merge]
revision-id: pqm at pqm.ubuntu.com-20110510102823-vf4qlngmjhgg6538
parent: pqm at pqm.ubuntu.com-20110510093033-8u89g79mvfozt4wl
parent: john at arbash-meinel.com-20110510093435-4mqysvnrs9e9s226
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2011-05-10 10:28:23 +0000
message:
  (jameinel) Make 'bzr revert' much faster in large trees,
   bug #759096 (John A Meinel)
modified:
  bzrlib/transform.py            transform.py-20060105172343-dd99e54394d91687
  doc/en/release-notes/bzr-2.4.txt bzr2.4.txt-20110114053217-k7ym9jfz243fddjm-1
  doc/en/whats-new/whats-new-in-2.4.txt whatsnewin2.4.txt-20110114044330-nipk1og7j729fy89-1
=== modified file 'bzrlib/transform.py'
--- a/bzrlib/transform.py	2011-04-29 23:11:10 +0000
+++ b/bzrlib/transform.py	2011-05-10 09:34:35 +0000
@@ -2857,7 +2857,11 @@
                  backups, merge_modified, basis_tree=None):
     if basis_tree is not None:
         basis_tree.lock_read()
-    change_list = target_tree.iter_changes(working_tree,
+    # Ask the working tree for its changes relative to the target, rather
+    # than the target for its changes relative to the working tree: WT4 has
+    # an optimizer for comparing itself to a target, but none for the
+    # reverse.
+    change_list = working_tree.iter_changes(target_tree,
         specific_files=specific_files, pb=pb)
     if target_tree.get_root_id() is None:
         skip_root = True
@@ -2867,13 +2871,19 @@
         deferred_files = []
         for id_num, (file_id, path, changed_content, versioned, parent, name,
                 kind, executable) in enumerate(change_list):
-            if skip_root and file_id[0] is not None and parent[0] is None:
+            target_path, wt_path = path
+            target_versioned, wt_versioned = versioned
+            target_parent, wt_parent = parent
+            target_name, wt_name = name
+            target_kind, wt_kind = kind
+            target_executable, wt_executable = executable
+            if skip_root and wt_parent is None:
                 continue
             trans_id = tt.trans_id_file_id(file_id)
             mode_id = None
             if changed_content:
                 keep_content = False
-                if kind[0] == 'file' and (backups or kind[1] is None):
+                if wt_kind == 'file' and (backups or target_kind is None):
                     wt_sha1 = working_tree.get_file_sha1(file_id)
                     if merge_modified.get(file_id) != wt_sha1:
                         # acquire the basis tree lazily to prevent the
@@ -2885,34 +2895,34 @@
                         if file_id in basis_tree:
                             if wt_sha1 != basis_tree.get_file_sha1(file_id):
                                 keep_content = True
-                        elif kind[1] is None and not versioned[1]:
+                        elif target_kind is None and not target_versioned:
                             keep_content = True
-                if kind[0] is not None:
+                if wt_kind is not None:
                     if not keep_content:
                         tt.delete_contents(trans_id)
-                    elif kind[1] is not None:
-                        parent_trans_id = tt.trans_id_file_id(parent[0])
+                    elif target_kind is not None:
+                        parent_trans_id = tt.trans_id_file_id(wt_parent)
                         backup_name = tt._available_backup_name(
-                            name[0], parent_trans_id)
+                            wt_name, parent_trans_id)
                         tt.adjust_path(backup_name, parent_trans_id, trans_id)
-                        new_trans_id = tt.create_path(name[0], parent_trans_id)
-                        if versioned == (True, True):
+                        new_trans_id = tt.create_path(wt_name, parent_trans_id)
+                        if wt_versioned and target_versioned:
                             tt.unversion_file(trans_id)
                             tt.version_file(file_id, new_trans_id)
                         # New contents should have the same unix perms as old
                         # contents
                         mode_id = trans_id
                         trans_id = new_trans_id
-                if kind[1] in ('directory', 'tree-reference'):
+                if target_kind in ('directory', 'tree-reference'):
                     tt.create_directory(trans_id)
-                    if kind[1] == 'tree-reference':
+                    if target_kind == 'tree-reference':
                         revision = target_tree.get_reference_revision(file_id,
-                                                                      path[1])
+                                                                      target_path)
                         tt.set_tree_reference(revision, trans_id)
-                elif kind[1] == 'symlink':
+                elif target_kind == 'symlink':
                     tt.create_symlink(target_tree.get_symlink_target(file_id),
                                       trans_id)
-                elif kind[1] == 'file':
+                elif target_kind == 'file':
                     deferred_files.append((file_id, (trans_id, mode_id)))
                     if basis_tree is None:
                         basis_tree = working_tree.basis_tree()
@@ -2926,26 +2936,26 @@
                         merge_modified[file_id] = new_sha1
 
                     # preserve the execute bit when backing up
-                    if keep_content and executable[0] == executable[1]:
-                        tt.set_executability(executable[1], trans_id)
-                elif kind[1] is not None:
-                    raise AssertionError(kind[1])
-            if versioned == (False, True):
+                    if keep_content and wt_executable == target_executable:
+                        tt.set_executability(target_executable, trans_id)
+                elif target_kind is not None:
+                    raise AssertionError(target_kind)
+            if not wt_versioned and target_versioned:
                 tt.version_file(file_id, trans_id)
-            if versioned == (True, False):
+            if wt_versioned and not target_versioned:
                 tt.unversion_file(trans_id)
-            if (name[1] is not None and
-                (name[0] != name[1] or parent[0] != parent[1])):
-                if name[1] == '' and parent[1] is None:
+            if (target_name is not None and
+                (wt_name != target_name or wt_parent != target_parent)):
+                if target_name == '' and target_parent is None:
                     parent_trans = ROOT_PARENT
                 else:
-                    parent_trans = tt.trans_id_file_id(parent[1])
-                if parent[0] is None and versioned[0]:
-                    tt.adjust_root_path(name[1], parent_trans)
+                    parent_trans = tt.trans_id_file_id(target_parent)
+                if wt_parent is None and wt_versioned:
+                    tt.adjust_root_path(target_name, parent_trans)
                 else:
-                    tt.adjust_path(name[1], parent_trans, trans_id)
-            if executable[0] != executable[1] and kind[1] == "file":
-                tt.set_executability(executable[1], trans_id)
+                    tt.adjust_path(target_name, parent_trans, trans_id)
+            if wt_executable != target_executable and target_kind == "file":
+                tt.set_executability(target_executable, trans_id)
         if working_tree.supports_content_filtering():
             for index, ((trans_id, mode_id), bytes) in enumerate(
                 target_tree.iter_files_bytes(deferred_files)):

=== modified file 'doc/en/release-notes/bzr-2.4.txt'
--- a/doc/en/release-notes/bzr-2.4.txt	2011-05-10 09:30:33 +0000
+++ b/doc/en/release-notes/bzr-2.4.txt	2011-05-10 09:34:35 +0000
@@ -136,8 +136,13 @@
    or memory usage, or better results.
 
 * ``bzr merge`` in large trees is now significantly faster. On a 70k entry
-  tree, the time went from ~3min down to 30s.
-  (John Arbash Meinel, #759091)
+  tree, the time went from ~3min down to 30s. This also affects ``bzr pull``
+  and ``bzr update`` since they use the same merge logic to update the
+  WorkingTree.  (John Arbash Meinel, #759091)
+
+* ``bzr revert`` now properly uses ``bzr status``'s optimized
+  ``iter_changes``. This can be a significant performance difference (33s
+  to 5s on large trees). (John Arbash Meinel, #759096)
 
 * Resolve ``lp:FOO`` urls locally rather than doing an XMLRPC request if
   the user has done ``bzr launchpad-login``. The bzr+ssh URLs were already
@@ -198,6 +203,12 @@
   this might be required for "installing" extra dependencies for some plugins.
   (Alexander Belchenko, #743256)
 
+* ``transform.revert()`` has been updated to use
+  ``wt.iter_changes(basis_tree)`` rather than
+  ``basis_tree.iter_changes(wt)``. This allows the optimized code path to
+  kick in, improving ``bzr revert`` times significantly (33s to 4s on
+  large trees, 0.7s to 0.3s on small trees.) (John Arbash Meinel, #759096)
+
 * ``TreeTransform.create_file/new_file`` can now take an optional ``sha1``
   parameter. If supplied, when the transform is applied, it will then call
   ``self._tree._observed_sha1`` for those files. This lets us update the

=== modified file 'doc/en/whats-new/whats-new-in-2.4.txt'
--- a/doc/en/whats-new/whats-new-in-2.4.txt	2011-05-09 02:13:40 +0000
+++ b/doc/en/whats-new/whats-new-in-2.4.txt	2011-05-10 09:34:35 +0000
@@ -54,6 +54,22 @@
 format.  Refer to ``bzr help changelog_merge`` for documentation on how to
 enable it and what it can do.
 
+Faster operations on Large Trees
+********************************
+
+Many bzr commands used to hit pathological behavior on large trees
+(>10k files): they read the inventory data in random order, which caused
+cache thrashing. Various other tweaks have been applied based on feedback
+from large trees. A possibly incomplete list of timings for commands run
+on a 70k file tree::
+
+    bzr-2.3 bzr-2.4 action
+    3m39s   1m03s   bzr co --lightweight
+      38s      6s   bzr revert
+    4m47s     27s   bzr merge
+    4m58s     32s   bzr up
+    
+
 Faster stacked branches
 ***********************
 




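The timing table added to whats-new-in-2.4.txt above can be turned into speedup factors with a few lines of arithmetic. This is only a sketch: the durations are copied from that table, while ``parse_duration`` and ``speedups`` are hypothetical helper names introduced here for illustration.

```python
import re


def parse_duration(text):
    """Convert a string such as '3m39s' or '38s' into whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


# (bzr-2.3 time, bzr-2.4 time, action), copied from the whats-new table.
TIMINGS = [
    ("3m39s", "1m03s", "bzr co --lightweight"),
    ("38s", "6s", "bzr revert"),
    ("4m47s", "27s", "bzr merge"),
    ("4m58s", "32s", "bzr up"),
]


def speedups(rows):
    """Map each action to old_time / new_time."""
    return {
        action: parse_duration(old) / parse_duration(new)
        for old, new, action in rows
    }


result = speedups(TIMINGS)
for action, factor in result.items():
    print(f"{action}: {factor:.1f}x faster")
```

The release notes quote slightly different ``bzr revert`` figures (33s to 5s, and 33s to 4s) than the table does, so the factor for that row depends on which measurement is used.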