[MERGE] Implement guess-renames
Aaron Bentley
aaron at aaronbentley.com
Mon Mar 23 15:05:00 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi all,
This patch implements a guess-renames command. Suppose some other
process has renamed some of your files. The command guesses which
renames were performed and updates bzr accordingly.
The main use-case for this is when importing from tarballs, but renames
from mv, nautilus, or random other utilities are also supported. It is
completely acceptable to modify the content of the renamed file, so long
as it retains a significant resemblance to the last-committed version.
The algorithm for files uses edges (sequential pairs of lines) for
matching, so that single matching lines do not trigger a match--
matching *sequences* of lines are required.
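To illustrate why edges are stricter than single lines, here is a minimal
sketch. It is not taken from the patch; the helper name iter_edges and the
sample data are made up for illustration. Two files containing the same
lines in a different order share every line but no edge.

```python
def iter_edges(lines):
    """Yield every edge (pair of adjacent lines) in a list of lines.

    Hypothetical helper for illustration; not the name used in the patch.
    """
    for first, second in zip(lines, lines[1:]):
        yield (first, second)


old = ["a\n", "b\n", "c\n"]
new = ["c\n", "b\n", "a\n"]  # same lines, reversed order

# Line-based comparison sees three common lines...
shared_lines = set(old) & set(new)
# ...but edge-based comparison sees nothing in common.
shared_edges = set(iter_edges(old)) & set(iter_edges(new))
```

A file with fewer than two lines produces no edges at all under this
definition.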
There is no handling of symlinks. They are ignored.
The algorithm for files is:
1. For each missing file, hash every edge, and associate the hash value
with the file-id
2. For each unknown file (or child of an unknown directory), hash every
edge, and find out which missing files also had that edge. Assign a
score to every file that matched, where each hit increases the score,
but hits are weighted according to how many files contain them. (This
way, the edges in the standard GPL preamble get a low weighting).
3. Generate an overall list of scores, and generate a mapping starting
with the highest-scored items, and ending with the lowest-scored.
This algorithm does not work for empty files, because they have no
edges. They are simply ignored.
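The three steps above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the code in the patch: the
function names are invented, file contents are passed in as plain lists of
lines, the weight of a hit is assumed to be 1/(number of missing files
containing that edge), and the final mapping is assumed to use each path
and each file-id at most once.

```python
from collections import defaultdict


def iter_edges(lines):
    """Return the pairs of adjacent lines (edges) of a file."""
    return zip(lines, lines[1:])


def guess_file_renames(missing, unknown):
    """Guess renames. Hypothetical sketch, not the patch's implementation.

    missing: {file_id: list of lines in the last-committed version}
    unknown: {path: list of lines currently on disk}
    Returns {path: file_id}.
    """
    # Step 1: hash every edge of every missing file.
    edge_index = defaultdict(set)
    for file_id, lines in missing.items():
        for edge in iter_edges(lines):
            edge_index[hash(edge)].add(file_id)

    # Step 2: score (path, file_id) pairs. A hit on an edge found in many
    # missing files counts for less (assumed weight: 1 / number of files).
    scores = defaultdict(float)
    for path, lines in unknown.items():
        for edge in iter_edges(lines):
            candidates = edge_index.get(hash(edge), ())
            for file_id in candidates:
                scores[(path, file_id)] += 1.0 / len(candidates)

    # Step 3: build the mapping from the highest score down, skipping any
    # path or file-id that has already been assigned (an assumption).
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    result = {}
    used_ids = set()
    for (path, file_id), _score in ranked:
        if path in result or file_id in used_ids:
            continue
        result[path] = file_id
        used_ids.add(file_id)
    return result


missing = {
    "id-a": ["GPL\n", "preamble\n", "def a():\n", "    return 1\n"],
    "id-b": ["GPL\n", "preamble\n", "def b():\n", "    return 2\n"],
}
unknown = {
    "new_a.py": ["GPL\n", "preamble\n", "def a():\n", "    return 1\n",
                 "# extra\n"],
    "new_b.py": ["GPL\n", "preamble\n", "def b():\n", "    return 2\n"],
    "empty.py": [],
}
renames = guess_file_renames(missing, unknown)
```

In this example the shared "GPL"/"preamble" edge contributes only half a
point to each candidate, so the edges unique to each file decide the
match. empty.py has no edges and never receives a score.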
The algorithm for directories is similar in form, except that instead of
hits being on edges, they are based on file-ids. The directory version
recurses so that directories whose sole content is another directory are
correctly handled.
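One way to picture the directory pass is the sketch below. It is only a
guess at the shape of the approach, not the patch's code: the function
name, the argument layout, and the "one hit per matched child" scoring are
all assumptions. Each matched child votes for the pairing of its new
parent path with its old parent's file-id, and newly matched directories
are fed back in so that a directory containing only another directory can
still be matched.

```python
import posixpath
from collections import Counter


def guess_dir_renames(matched, old_parent, missing_dirs):
    """Guess directory renames. Hypothetical sketch for illustration.

    matched:      {new_path: file_id} for entries already matched
    old_parent:   {file_id: file_id of its parent in the committed tree}
    missing_dirs: set of file-ids of missing directories
    Returns {new_dir_path: dir_file_id}.
    """
    result = {}
    while matched:
        taken = set(result.values())
        # Each matched child is one hit for (new parent path, old parent id).
        hits = Counter()
        for path, file_id in matched.items():
            parent_id = old_parent.get(file_id)
            dir_path = posixpath.dirname(path)
            if not dir_path:
                continue  # child sits in the tree root; nothing to match
            if parent_id in missing_dirs and parent_id not in taken:
                hits[(dir_path, parent_id)] += 1

        # Assign from the most hits down, using each side at most once.
        newly_matched = {}
        for (dir_path, dir_id), _count in hits.most_common():
            if dir_path in result or dir_path in newly_matched:
                continue
            if dir_id in taken:
                continue
            newly_matched[dir_path] = dir_id
            taken.add(dir_id)

        result.update(newly_matched)
        # Recurse: matched directories may vote for their own parents.
        matched = newly_matched
    return result


# Old tree: src/pkg/{a.py,b.py}; new tree: lib/core/{a.py,b.py}.
file_matches = {"lib/core/a.py": "id-a", "lib/core/b.py": "id-b"}
old_parent = {"id-a": "dir-pkg", "id-b": "dir-pkg", "dir-pkg": "dir-src"}
dir_renames = guess_dir_renames(file_matches, old_parent,
                                {"dir-pkg", "dir-src"})
```

Here "src" held nothing but "pkg", so it is only matched to "lib" on the
second round, after "pkg" has been matched to "lib/core".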
This command was 100% correct when guessing renames for the Launchpad
source tree. To test, I first reverted to 300 revisions ago so that the
contents were not an exact match. I then renamed all .py files. This is
a torture test-- it is rare for that many files to be renamed, and there
are ~1600 Python files in the LP tree. Although 100% of its guesses
were correct, it did not guess empty __init__.py files, which is a bit
of a shame.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAknHpRkACgkQ0F+nu1YWqI37fwCaAzERFsBq+RgkGiEiOpEMyY+x
crEAnA9p28W/mpMfrQ6qYO6IjVggY7VY
=Psy5
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: guess-renames-3218.patch
Type: text/x-patch
Size: 44946 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090323/453188ff/attachment-0001.bin