Can Bazaar call external XML diff/patch programs?
John Arbash Meinel
john at arbash-meinel.com
Fri Sep 21 22:06:59 BST 2007
David Clymer wrote:
> Please remember to include the list in your replies ;o)
>
> On Tue, 2007-09-18 at 10:44 -0700, Rocky Kahn wrote:
>> Thanks...Darcs was only temporarily down and I've been exploring it
>> since it came back up.
>>
>> I think Stephen Ward's diff is for viewing changes between revs as
>> opposed to generating the diff that gets stored in the revision
>> history. I'm interested in calculating the invertible delta myself
>> and working with a version control system that will let me use these
>> deltas instead of its native, line-based diffs. The academics
>> referred to in original email chose darcs because it was easy to
>> modify. Might Bazaar be similarly easy? It seems that Bazaar is more
>> robustly supported than darcs (more developer postings, etc).
Our current storage format is strictly line based. You could probably plug in a
different algorithm, but we would still only track what lines have changed.
This is also done because our annotations are line-based.
We've talked about splitting out the information. We'd like to use something
like xdelta, because it gives smaller diffs, but then we need somewhere to
store line-based annotations.
>>
>> Here's how more "atomic" diffs/patches can avoid unnecessary
>> conflicts:
>> Let's say the document is XML and each paragraph has its own
>> (sometimes very long) line in the file. Two users modify the same
>> paragraph in parallel. Since most version control systems use
>> line-based diff, these two changes will register as a conflict...even
>> though the changes don't overlap and therefore could be merged.
>>
>> original line: Elephants are white. Cereal is a grain.
>> user A's change: Elephants are grey. Cereal is a grain.
>> user B's change: Elephants are white. Cereal is for breakfast.
>> merged line: Elephants are grey. Cereal is for breakfast.
>>
>> If instead I calculate a character-based diff, conflicts would occur
>> less frequently. This is all assuming that Bazaar uses only the
>> standard, line-based diff...does it? Is there anyone in particular
>> who would know where the "hooks" in the code are to change out the
>> diff/patch routines?
True, but remember, the point is not to get "no conflicts". The goal of merge
is to get a result that is as close as possible to what a human being would
produce, so that the final result passes the test suite, is meaningful, etc.
For most source code, concepts are encapsulated per line. (Almost all style
guides say you should do one thing per line, and stuff like:
foo(); bar(); baz()
should be avoided.)
So in all those cases, while you could have
foo(bar=baz, biz=bing)
changed to
foo(bar=new_var, biz=bing)
and
foo(bar=baz, biz=another_var)
get merged to:
foo(bar=new_var, biz=another_var)
it is pretty common that one change needs to know about the other.
In fact, one major problem with a lot of merge algorithms is the "accidentally
clean" merge, which is actually worse than a spurious conflict, because then
(for example) the user isn't told that their fix for checking superuser has
been removed, and now any user can corrupt their system.
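
To make the trade-off above concrete, here is a minimal sketch of a naive
three-way merge built on the standard library's difflib. It is not Bazaar's
merge algorithm; the helper names edits() and merge3() are hypothetical and
exist only for this illustration. Given lists of lines it reports a conflict
for the foo() example; given plain strings (character granularity) it merges
the same change silently.

```python
import difflib


def edits(base, other):
    """Return (start, end, replacement) for each changed region of base."""
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return [
        (i1, i2, other[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge3(base, a, b):
    """Naive three-way merge of two edits to base.

    Works on any sliceable sequence (a list of lines or a string).
    Returns None when an edit from a overlaps or touches an edit from b.
    """
    edits_a, edits_b = edits(base, a), edits(base, b)
    for start_a, end_a, _ in edits_a:
        for start_b, end_b, _ in edits_b:
            if start_a <= end_b and start_b <= end_a:
                return None  # both sides changed the same region
    merged = base
    # Apply from the end so earlier offsets stay valid.
    for start, end, replacement in sorted(edits_a + edits_b, reverse=True):
        merged = merged[:start] + replacement + merged[end:]
    return merged


base = "foo(bar=baz, biz=bing)"
side_a = "foo(bar=new_var, biz=bing)"
side_b = "foo(bar=baz, biz=another_var)"

# Line granularity: both sides touched the only line, so this is a conflict.
line_result = merge3([base], [side_a], [side_b])

# Character granularity: the edits are disjoint, so the merge is "clean".
char_result = merge3(base, side_a, side_b)
```

Here line_result is None (a conflict the user gets to look at), while
char_result is the combined call with both arguments changed. Nothing in the
character-level result signals that the two changes may depend on each other,
which is exactly the "accidentally clean" case.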
>
> I'm sure there are many here that can answer this better than I.
> However, if you want to modify the sequence matching behavior of bazaar,
> you could probably modify or replace
> bzrlib.patiencediff.PatienceSequenceMatcher
>
> I don't know how hard it would be to provide the functionality you want
> while keeping compatibility with the PatienceSequenceMatcher interface
> though. If you have to break away from that, it could be fairly
> complex.
>
> IANA bazaar expert.
>
> -davidc
>
PatienceSequenceMatcher provides the same interface as difflib.SequenceMatcher,
which really only needs to know what regions of text match. However, it is
line-based, and there is not much you can do to change that.
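
As a rough illustration of that interface, the sketch below uses only the
standard library's difflib.SequenceMatcher, not bzrlib. The helper name
changed_regions() is hypothetical. The stdlib matcher compares any two
sequences of hashable items, so the granularity depends on what you pass in:
a list of lines or a plain string. This says nothing about what Bazaar's
storage or annotation code accepts, which is line-based as described above.

```python
import difflib

original = "Elephants are white. Cereal is a grain.\n"
changed = "Elephants are grey. Cereal is a grain.\n"


def changed_regions(a, b):
    """Return the opcodes from a to b that are not 'equal'."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# Line granularity: the sequences are lists of whole lines.
line_ops = changed_regions(original.splitlines(True), changed.splitlines(True))

# Character granularity: the sequences are the strings themselves.
char_ops = changed_regions(original, changed)
```

At line granularity the whole line is reported as replaced; at character
granularity the reported regions stay inside the word that changed.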
What we would *like* is to have the internal diffs and annotations be
'hunk-based' where hunks may be multi-line, or sub-line. But at the moment, we
are pretty focussed on line-based.
John
=:->