[MERGE] Implement guess-renames
Ali Sabil
ali.sabil at gmail.com
Mon Mar 23 16:11:38 GMT 2009
Hi,
I am wondering what is the difference between this command, and the
command provided by the automv plugin ?
Cheers,
--
Ali
On Mon, Mar 23, 2009 at 5:02 PM, John Arbash Meinel
<john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Aaron Bentley wrote:
>> Hi all,
>>
>> This patch implements a guess-renames command. Assume some other
>> process has renamed some of your files. This will guess what renames
>> were performed and update bzr accordingly.
>>
>
> This seems like it would fit well as a plugin, though if you have reason
> to make this a core command, I certainly wouldn't refuse. (The only
> argument against is the standard "proliferation of commands" one.)
>
> + for num, (file_id, contents) in enumerate(
> + tree.iter_files_bytes(desired_files)):
> + task.update('Calculating hashes', num, len(file_ids))
> + s = StringIO()
> + s.writelines(contents)
> + s.seek(0)
> + self.add_edge_hashes(s.readlines(), file_id)
>
> ^- I think you can use "osutils.chunks_to_lines()" to do this a lot more
> efficiently than writing into a StringIO and the splitting. (Assuming
> that contents is 'mostly lines' and you want to make sure that it is
> 'exactly lines'.)
>
> ...
>
> + task.update('Determining hash hits', num, len(paths))
> + my_file = self.tree.get_file(None, path=path)
> + try:
> + hits = self.hitcounts(my_file.readlines())
> + finally:
> + my_file.close()
>
> It seems more reasonable to use: self.tree.get_file_lines()
> or if that isn't officially a Tree api, then doing:
>
> my_file = self.tree.get_file(...)
> try:
> my_lines = my_file.readlines()
> finally:
> my_file.close()
> hits = self.hitcounts(my_lines)
>
> Mostly because it seems silly to keep the file open for the whole
> duration of "hitcounts()" rather than just long enough to read its content.
>
> ...
>
> + @staticmethod
> + def _match_hits(hit_list):
> + """Using a hit list, determin a path-to-fileid map.
> ^- 'determine'
>
> +
> + The hit list is a list of (count, path, file_id), where count is a
> + (possibly float) number, with higher numbers indicating stronger
> + matches.
> + """
> + seen_file_ids = set()
> + seen_paths = set()
> + path_map = {}
> + for count, path, file_id in sorted(hit_list, reverse=True):
> + if path in seen_paths or file_id in seen_file_ids:
> + continue
> + path_map[path] = file_id
> + seen_paths.add(path)
> + seen_file_ids.add(file_id)
> + return path_map
>
>
> ^- You don't need to keep a "seen_paths" set, as you can just say:
>
> if path in path_map or file_id in seen_file_ids:
>
> Because a dict lookup is identical to a set lookup, and every path that
> you see gets put into the path_map. (This should save a reasonable
> amount of memory.)
>
> ...
>
> + def guess_renames(klass, tree):
> + """Guess which files to rename, and perform the rename.
> +
> + We assume that unversioned files and missing files indicate that
> + versioned files have been renamed outside of Bazaar.
> + """
> + required_parents = {}
> + task = ui_factory.nested_progress_bar()
> + try:
> + pp = progress.ProgressPhase('Guessing renames', 4, task)
> + basis = tree.basis_tree()
> + basis.lock_read()
> + try:
>
> ^- You don't lock "tree" in this function, but you do lock
> tree.basis_tree(). Are you assuming that "tree" is already locked? If
> so, you should probably add that statement in the docstring. (You need a
> ":param tree:" anyway.
>
> ...
>
> + def _update_tree(self, required_parents, matches):
> + self.tree.add(required_parents)
> + reversed = dict((v, k) for k, v in matches.iteritems())
> + child_to_parent = sorted(
> + matches.values(), key=lambda x: reversed[x], reverse=True)
> + self.tree.unversion(child_to_parent)
> + paths_forward = sorted(matches.keys())
> + file_ids_forward = [matches[p] for p in paths_forward]
> + self.tree.add(paths_forward, file_ids_forward)
>
> ^- Doing this via 'unversion' followed by 'add' doesn't seem quite right
> versus doing it via "rename" operations. (As a possible problem, imagine
> you have a directory where some children are known, and some are not. If
> you then unversion() the directory, I think it would remove all known
> children, and you would only be adding back the previously unknown
> children.)
>
> I don't have a correctness understanding versus the rest of your
> changes, just a question about doing it this way.
>
>
> === added file 'bzrlib/tests/blackbox/test_guess_renames.py'
> - --- bzrlib/tests/blackbox/test_guess_renames.py 1970-01-01 00:00:00 +0000
> +++ bzrlib/tests/blackbox/test_guess_renames.py 2009-03-12 06:39:43 +0000
> @@ -0,0 +1,22 @@
> +# Copyright (C) 2008 Canonical Ltd
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> +
> +from bzrlib import tests
> +
> +class TestGuessRenames(tests.TestCaseWithTransport):
> +
>
> ^- you have an extra <space> between from and bzrlib. And you are
> missing a vertical space before class.
>
>
>
> Overall, I think the functionality and algorithms are good. I'm not sure
> if it should be a core command or not...
>
> BB:tweak
>
> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAknHsrIACgkQJdeBCYSNAAOgYACg0CTklrAJwD3h2S/R/byeT8dg
> 5g8AnAtdjADoT2uWKKv92wCLu33W47+K
> =zyvl
> -----END PGP SIGNATURE-----
>
>
More information about the bazaar
mailing list