[MERGE] Implement guess-renames
John Arbash Meinel
john at arbash-meinel.com
Mon Mar 23 16:02:58 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Aaron Bentley wrote:
> Hi all,
>
> This patch implements a guess-renames command. Assume some other
> process has renamed some of your files. This will guess what renames
> were performed and update bzr accordingly.
>
This seems like it would fit well as a plugin, though if you have reason
to make this a core command, I certainly wouldn't refuse. (The only
argument against is the standard "proliferation of commands" one.)
+ for num, (file_id, contents) in enumerate(
+ tree.iter_files_bytes(desired_files)):
+ task.update('Calculating hashes', num, len(file_ids))
+ s = StringIO()
+ s.writelines(contents)
+ s.seek(0)
+ self.add_edge_hashes(s.readlines(), file_id)
^- I think you can use "osutils.chunks_to_lines()" to do this a lot more
efficiently than writing into a StringIO and the splitting. (Assuming
that contents is 'mostly lines' and you want to make sure that it is
'exactly lines'.)
...
+ task.update('Determining hash hits', num, len(paths))
+ my_file = self.tree.get_file(None, path=path)
+ try:
+ hits = self.hitcounts(my_file.readlines())
+ finally:
+ my_file.close()
It seems more reasonable to use: self.tree.get_file_lines()
or if that isn't officially a Tree api, then doing:
my_file = self.tree.get_file(...)
try:
my_lines = my_file.readlines()
finally:
my_file.close()
hits = self.hitcounts(my_lines)
Mostly because it seems silly to keep the file open for the whole
duration of "hitcounts()" rather than just long enough to read its content.
...
+ @staticmethod
+ def _match_hits(hit_list):
+ """Using a hit list, determin a path-to-fileid map.
^- 'determine'
+
+ The hit list is a list of (count, path, file_id), where count is a
+ (possibly float) number, with higher numbers indicating stronger
+ matches.
+ """
+ seen_file_ids = set()
+ seen_paths = set()
+ path_map = {}
+ for count, path, file_id in sorted(hit_list, reverse=True):
+ if path in seen_paths or file_id in seen_file_ids:
+ continue
+ path_map[path] = file_id
+ seen_paths.add(path)
+ seen_file_ids.add(file_id)
+ return path_map
^- You don't need to keep a "seen_paths" set, as you can just say:
if path in path_map or file_id in seen_file_ids:
Because a dict lookup is identical to a set lookup, and every path that
you see gets put into the path_map. (This should save a reasonable
amount of memory.)
...
+ def guess_renames(klass, tree):
+ """Guess which files to rename, and perform the rename.
+
+ We assume that unversioned files and missing files indicate that
+ versioned files have been renamed outside of Bazaar.
+ """
+ required_parents = {}
+ task = ui_factory.nested_progress_bar()
+ try:
+ pp = progress.ProgressPhase('Guessing renames', 4, task)
+ basis = tree.basis_tree()
+ basis.lock_read()
+ try:
^- You don't lock "tree" in this function, but you do lock
tree.basis_tree(). Are you assuming that "tree" is already locked? If
so, you should probably add that statement in the docstring. (You need a
":param tree:" anyway.
...
+ def _update_tree(self, required_parents, matches):
+ self.tree.add(required_parents)
+ reversed = dict((v, k) for k, v in matches.iteritems())
+ child_to_parent = sorted(
+ matches.values(), key=lambda x: reversed[x], reverse=True)
+ self.tree.unversion(child_to_parent)
+ paths_forward = sorted(matches.keys())
+ file_ids_forward = [matches[p] for p in paths_forward]
+ self.tree.add(paths_forward, file_ids_forward)
^- Doing this via 'unversion' followed by 'add' doesn't seem quite right
versus doing it via "rename" operations. (As a possible problem, imagine
you have a directory where some children are known, and some are not. If
you then unversion() the directory, I think it would remove all known
children, and you would only be adding back the previously unknown
children.)
I don't have a correctness understanding versus the rest of your
changes, just a question about doing it this way.
=== added file 'bzrlib/tests/blackbox/test_guess_renames.py'
- --- bzrlib/tests/blackbox/test_guess_renames.py 1970-01-01 00:00:00 +0000
+++ bzrlib/tests/blackbox/test_guess_renames.py 2009-03-12 06:39:43 +0000
@@ -0,0 +1,22 @@
+# Copyright (C) 2008 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+from bzrlib import tests
+
+class TestGuessRenames(tests.TestCaseWithTransport):
+
^- you have an extra <space> between from and bzrlib. And you are
missing a vertical space before class.
Overall, I think the functionality and algorithms are good. I'm not sure
if it should be a core command or not...
BB:tweak
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAknHsrIACgkQJdeBCYSNAAOgYACg0CTklrAJwD3h2S/R/byeT8dg
5g8AnAtdjADoT2uWKKv92wCLu33W47+K
=zyvl
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list