[MERGE] Implement guess-renames

Ali Sabil ali.sabil at gmail.com
Mon Mar 23 16:11:38 GMT 2009


Hi,

I am wondering what is the difference between this command, and the
command provided by the automv plugin ?

Cheers,

--
Ali

On Mon, Mar 23, 2009 at 5:02 PM, John Arbash Meinel
<john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Aaron Bentley wrote:
>> Hi all,
>>
>> This patch implements a guess-renames command.  Assume some other
>> process has renamed some of your files.  This will guess what renames
>> were performed and update bzr accordingly.
>>
>
> This seems like it would fit well as a plugin, though if you have reason
> to make this a core command, I certainly wouldn't refuse. (The only
> argument against is the standard "proliferation of commands" one.)
>
> +            for num, (file_id, contents) in enumerate(
> +                tree.iter_files_bytes(desired_files)):
> +                task.update('Calculating hashes', num, len(file_ids))
> +                s = StringIO()
> +                s.writelines(contents)
> +                s.seek(0)
> +                self.add_edge_hashes(s.readlines(), file_id)
>
> ^- I think you can use "osutils.chunks_to_lines()" to do this a lot more
> efficiently than writing into a StringIO and the splitting. (Assuming
> that contents is 'mostly lines' and you want to make sure that it is
> 'exactly lines'.)
>
> ...
>
> +                task.update('Determining hash hits', num, len(paths))
> +                my_file = self.tree.get_file(None, path=path)
> +                try:
> +                    hits = self.hitcounts(my_file.readlines())
> +                finally:
> +                    my_file.close()
>
> It seems more reasonable to use: self.tree.get_file_lines()
> or if that isn't officially a Tree api, then doing:
>
> my_file = self.tree.get_file(...)
> try:
>  my_lines = my_file.readlines()
> finally:
>  my_file.close()
> hits = self.hitcounts(my_lines)
>
> Mostly because it seems silly to keep the file open for the whole
> duration of "hitcounts()" rather than just long enough to read its content.
>
> ...
>
> +    @staticmethod
> +    def _match_hits(hit_list):
> +        """Using a hit list, determin a path-to-fileid map.
> ^- 'determine'
>
> +
> +        The hit list is a list of (count, path, file_id), where count is a
> +        (possibly float) number, with higher numbers indicating stronger
> +        matches.
> +        """
> +        seen_file_ids = set()
> +        seen_paths = set()
> +        path_map = {}
> +        for count, path, file_id in sorted(hit_list, reverse=True):
> +            if path in seen_paths or file_id in seen_file_ids:
> +                continue
> +            path_map[path] = file_id
> +            seen_paths.add(path)
> +            seen_file_ids.add(file_id)
> +        return path_map
>
>
> ^- You don't need to keep a "seen_paths" set, as you can just say:
>
> if path in path_map or file_id in seen_file_ids:
>
> Because a dict lookup is identical to a set lookup, and every path that
> you see gets put into the path_map. (This should save a reasonable
> amount of memory.)
>
> ...
>
> +    def guess_renames(klass, tree):
> +        """Guess which files to rename, and perform the rename.
> +
> +        We assume that unversioned files and missing files indicate that
> +        versioned files have been renamed outside of Bazaar.
> +        """
> +        required_parents = {}
> +        task = ui_factory.nested_progress_bar()
> +        try:
> +            pp = progress.ProgressPhase('Guessing renames', 4, task)
> +            basis = tree.basis_tree()
> +            basis.lock_read()
> +            try:
>
> ^- You don't lock "tree" in this function, but you do lock
> tree.basis_tree(). Are you assuming that "tree" is already locked? If
> so, you should probably add that statement in the docstring. (You need a
> ":param tree:" anyway.
>
> ...
>
> +    def _update_tree(self, required_parents, matches):
> +        self.tree.add(required_parents)
> +        reversed = dict((v, k) for k, v in matches.iteritems())
> +        child_to_parent = sorted(
> +            matches.values(), key=lambda x: reversed[x], reverse=True)
> +        self.tree.unversion(child_to_parent)
> +        paths_forward = sorted(matches.keys())
> +        file_ids_forward = [matches[p] for p in paths_forward]
> +        self.tree.add(paths_forward, file_ids_forward)
>
> ^- Doing this via 'unversion' followed by 'add' doesn't seem quite right
> versus doing it via "rename" operations. (As a possible problem, imagine
> you have a directory where some children are known, and some are not. If
> you then unversion() the directory, I think it would remove all known
> children, and you would only be adding back the previously unknown
> children.)
>
> I don't have a correctness understanding versus the rest of your
> changes, just a question about doing it this way.
>
>
> === added file 'bzrlib/tests/blackbox/test_guess_renames.py'
> - --- bzrlib/tests/blackbox/test_guess_renames.py       1970-01-01 00:00:00 +0000
> +++ bzrlib/tests/blackbox/test_guess_renames.py 2009-03-12 06:39:43 +0000
> @@ -0,0 +1,22 @@
> +# Copyright (C) 2008 Canonical Ltd
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> +
> +from  bzrlib import tests
> +
> +class TestGuessRenames(tests.TestCaseWithTransport):
> +
>
> ^- you have an extra <space> between from and bzrlib. And you are
> missing a vertical space before class.
>
>
>
> Overall, I think the functionality and algorithms are good. I'm not sure
> if it should be a core command or not...
>
>  BB:tweak
>
> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAknHsrIACgkQJdeBCYSNAAOgYACg0CTklrAJwD3h2S/R/byeT8dg
> 5g8AnAtdjADoT2uWKKv92wCLu33W47+K
> =zyvl
> -----END PGP SIGNATURE-----
>
>



More information about the bazaar mailing list