[RFC] Refactor diffing

Aaron Bentley aaron.bentley at utoronto.ca
Thu Nov 22 14:53:27 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
>> 2. I'm not sure the tri-value return
>> (unchanged/changed/could-not-handle) makes sense.  We already know
>> whether the file is changed, and we wouldn't be diffing it if it was
>> unchanged.  So I think this could just be True for "could handle" and
>> False for "could not handle".
> 
> I was specifically thinking of file formats where the representation can
> change but the value might not.

Ah.  Okay, I can support that.

> It looks pretty clean to me. I think you should add docstrings to the
> classes and methods to capture intent about how its structured. Other
> than that the only thing missing is a mechanism for plugins to insert
> new differ's and it will be really cool and extensible.

> One thing that might make that easier to do is to require a standard
> factory signature for all Differ's - specifically I'm thinking of the
> label and path_encoding and internal_diff parameters, which currently
> are only on TextDiffer.

That seems like a limiting approach; I deliberately moved those
parameters out of the rest of the code, because they only apply to a
particular diff output.

- - Forcing all other FileDiffers to accept parameters that are only
  relevant to unified diffs seems ugly
- - KindChangeDiffer needs totally different parameters
  (i.e. the differ list)
- - Other differ types might want other parameters.

So I'm thinking of something similar to, but not the same as, your idea.

Here are the goals I'm trying to satisfy:

1. Support "diff --using colordiff": allow the caller to supply one-time
instances, such that they take precedence over other file differs.
(These can take any desired parameters in their constructor)

2. Support "diff --diff-options -p": allow the caller to supply a text
differ that does not take precedence over other file differs.

3. Support per-type plugins:  allow plugins to register differs for
particular file types, such that they are automatically used by
TreeDiffer, and take precedence over the text differ, but not other
supplied differs.  (These must be constructible using a standardized
signature)

Here's how I'd do it:

TreeDiffer.__init__(self, old_tree_new_tree, text_differ,
                    extra_differs=None):
    self.differs = []
    if extra_differs is not None:
        self.differs.extend(extra_differs)
    for differ_factory in TreeDiffer._differ_factories:
        self.differs.append(differ_factory(old_tree, new_tree, to_file,
                            path_encoding)
    self.differs.extend(text_differ, KindChangeDiffer(self.differs))

Some additional questions:

Is the current grouping of files by change type (added, deleted,
modified, renamed...) desirable?   Personally, I'd rather see it sorted
in alphabetical order.  That would also allow us to use _iter_changes
directly, and reduce the number of similar loops in our code.

If we switch to using _iter_changes, what about using ChangeReporter?
We could still support the same output, I believe, plus we could also
support short-status output.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHRZfn0F+nu1YWqI0RAmkEAJ9Zm02RSp0ZPfkPu4V2NfkPIJO5YACeMQAv
O8yyKWthdtgmi54f8Psf6UQ=
=lHKx
-----END PGP SIGNATURE-----



More information about the bazaar mailing list