[MERGE] Fix one cause of poor commit performance with thousands of deleted paths
Martin Pool
mbp at canonical.com
Fri Oct 26 06:09:14 BST 2007
On 25 Oct 2007, Robert Collins <robertc at robertcollins.net> wrote:
bb:tweak
> @@ -703,7 +705,7 @@
>
> report_changes = self.reporter.is_verbose()
> deleted_ids = []
> - deleted_paths = set()
> + deleted_paths = {}
Can you please add a comment about what it contains, it's not simply a
map of deleted paths.
> # XXX: Note that entries may have the wrong kind because the entry does
> # not reflect the status on disk.
> work_inv = self.work_tree.inventory
> @@ -717,16 +719,36 @@
> if kind == 'directory':
> self._next_progress_entry()
> # Skip files that have been deleted from the working tree.
> - # The deleted files/directories are also recorded so they
> - # can be explicitly unversioned later. Note that when a
> - # filter of specific files is given, we must only skip/record
> - # deleted files matching that filter.
> - if is_inside_any(deleted_paths, path):
> - continue
> + # The deleted path ids are also recorded so they can be explicitly
> + # unversioned later.
> + if deleted_paths:
> + path_segments = splitpath(path)
> + deleted_dict = deleted_paths
> + for segment in path_segments:
> + deleted_dict = deleted_dict.get(segment, None)
> + if deleted_dict is None:
> + # We took a path not present in the dict.
> + break
> + if not deleted_dict:
> + # We've reached an empty child dir in the dict, so are now
> + # a sub-path.
> + break
Those two branches seem like they could be combined - with the comment
still distinguishing the different cases.
> + else:
> + deleted_dict = None
> + if deleted_dict is not None:
> + # the path has a deleted parent, do not add it.
> + continue
> content_summary = self.work_tree.path_content_summary(path)
> + # Note that when a filter of specific files is given, we must only
> + # skip/record deleted files matching that filter.
> if not specific_files or is_inside_any(specific_files, path):
> if content_summary[0] == 'missing':
> - deleted_paths.add(path)
> + if not deleted_paths:
> + # path won't have been split yet.
> + path_segments = splitpath(path)
I'd suggest you instead set path_segments=None at the top of the for loop,
and check for that here.
I agree it'd be better to split it out, but this method is still
acceptably simple with this patch.
More information about the bazaar
mailing list