[RFC] Refactor diffing

Robert Collins robertc at robertcollins.net
Thu Nov 22 08:54:40 GMT 2007


On Wed, 2007-11-21 at 23:47 -0500, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Here's an updated version.  A couple of things:
> 1. several parts of Bazaar treat binary file handling as part of the
> text diff protocol, so having a separate BinaryFileDiffer breaks
> compatibility, and doesn't seem like a great improvement.

Uhm, I was mainly thinking here about clarity of code structure; if its
better rolled into one, so be it. You're cutting the code :).

> 2. I'm not sure the tri-value return
> (unchanged/changed/could-not-handle) makes sense.  We already know
> whether the file is changed, and we wouldn't be diffing it if it was
> unchanged.  So I think this could just be True for "could handle" and
> False for "could not handle".

I was specifically thinking of file formats where the representation can
change but the value might not. For instance two open office documents,
which are gzipped, and thus have a datestamp in the gzip header but may
have the same decompressed contents.

> Thoughts?

It looks pretty clean to me. I think you should add docstrings to the
classes and methods to capture intent about how its structured. Other
than that the only thing missing is a mechanism for plugins to insert
new differ's and it will be really cool and extensible.

One thing that might make that easier to do is to require a standard
factory signature for all Differ's - specifically I'm thinking of the
label and path_encoding and internal_diff parameters, which currently
are only on TextDiffer.

So I'm thinking TreeDiffer.__init__ has something like:
differs = []
for differ_factory in TreeDiffer._differ_factories:
    differs.append(differ_factory(old_tree, new_tree, old_label,
        new_label, path_encoding, to_file, internal_diff)
differs.append(KindChangeDiff(differs))

and _differ_factories is statically initialised as
[SymlinkDiff, TextDiff]

Cheers,
Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071122/74d2011f/attachment.pgp 


More information about the bazaar mailing list