Pluggable diffing algorithms?

John Arbash Meinel john at arbash-meinel.com
Thu Dec 14 21:07:09 GMT 2006


Aaron Bentley wrote:
> Nicholas Allen wrote:
>> Hi,
> 
>> This is just more of an idea than anything (and clearly not a near term
>> feature) but it would be great if bzr had pluggable and intelligent
>> diffing/merging algorithms for various types of files.
> 
> Merge algorithms are reasonably pluggable already, since we support 3
> out of the box.  We don't distinguish based on filetype, but I doubt it
> would be hard to implement a merge algorithm that *did*.
> 
> Note that there's already a plugin that lets you invoke arbitrary
> merge algorithms: extmerge
> 
> http://erik.bagfors.nu/bzr-plugins/extmerge
> 
> Diffs are not currently as pluggable in the infrastructure, but there's
> already a plugin to use whatever differ you like: difftools
> 
> http://mysite.verizon.net/sward.dev/projects/bzr_difftools
> 
>> As docbook files are often stored along
>> with source code it would be great if bzr could diff and merge this
>> content in a more intelligent way to avoid conflicts.
> 
> I agree, but I'm skeptical that an XML merge would do a good job.  I
> think you'd need a DocBook-specific merge.
> 
> Aaron

Further, there is a small limitation in our storage format, which is
that it stores either full texts, or line deltas. So the diff algorithm
would still need to emit one of these.
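
To illustrate what "emit a line delta" means, here is a minimal sketch
using Python's standard difflib. The hunk layout and the names
line_delta and apply_delta are assumptions made for illustration; this
is not bzr's actual storage format. The point is only that whatever a
smarter differ decides, its result still has to reduce to "replace
these lines with those lines".

```python
import difflib


def line_delta(old_lines, new_lines):
    """Return hunks of (start, end, replacement_lines) against old_lines."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Covers replace, delete (empty replacement) and insert (i1 == i2).
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the new text from the old text plus a line delta."""
    result = list(old_lines)
    # Apply hunks from the end so earlier indices stay valid.
    for start, end, replacement in reversed(delta):
        result[start:end] = replacement
    return result
```

Storing old_lines plus the delta is enough to get new_lines back via
apply_delta(old_lines, line_delta(old_lines, new_lines)).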

As Aaron mentioned, what you really want is an intelligent merge
algorithm. For XML, that means understanding two levels: syntax and
semantics. A generic XML merge algorithm could probably handle some
level of syntax, such as knowing never to create:

<b>
<i>
</b>
</i>

That violates the general nesting rule. (It is really easy to create
that scenario with plain diff, and not even have it conflict.)
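
As a rough sketch of the syntactic level, a merge step could parse its
own output and report a conflict when the result is no longer
well-formed. The helper name is_well_formed is hypothetical, and this
only checks the finished text with Python's standard XML parser; it is
not a merge algorithm by itself.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for mismatched tags such as <b><i></b></i>.
        return False
    return True


# The crossed tags from above, wrapped in a single root element.
crossed = "<p><b><i></b></i></p>"
nested = "<p><b><i></i></b></p>"
```

A merge that produced the crossed text could then be flagged as a
conflict instead of being silently accepted.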

An even smarter algorithm might understand a DTD, and try to make sure
the basic syntax isn't violated, or at least report a conflict if it is.

But really, without that second (semantic) level, you aren't much better
off than with plain diff3.

One nice thing about bzr is that when it comes to merging/updating, it
generally looks at full texts. So while you might try something
fancier for diff, that only changes the storage. At merge time, you
still get 3 full texts to compare.
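
To show the shape of the input a merge algorithm gets, here is a
deliberately trivial three-way merge over whole texts. The function
name and the (text, conflicted) return value are assumptions for
illustration, not bzr's merge API. A real algorithm (diff3, or
something XML-aware) would work at a finer granularity, but it would
start from the same three full texts.

```python
def merge_whole_text(base, this, other):
    """Three-way merge at whole-file granularity.

    Returns (merged_text, conflicted).
    """
    if this == other:
        # Both sides agree (or neither changed anything).
        return this, False
    if this == base:
        # Only the other side changed.
        return other, False
    if other == base:
        # Only this side changed.
        return this, False
    # Both sides changed in different ways: keep ours, flag a conflict.
    return this, True
```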

John
=:->



