Longest Common Subsequences code

John Arbash Meinel john at arbash-meinel.com
Thu Nov 2 15:03:26 GMT 2006


Cheuksan Edward Wang wrote:
> 
> 
> On 11/2/06, *John Arbash Meinel* <john at arbash-meinel.com
> <mailto:john at arbash-meinel.com>> wrote:
> 
> 
>     1) Delta compression of full-texts. Minimal diffs would make this
>     better, but I would guess only marginally so. ATM we are gzip
>     compressing the final hunks anyway. The most important thing is that a
>     change of 2 lines to a 100Kline source file should be ~2-lines, not
>     100Klines. And I think all algorithms will give us that.
> 
> 
> Unfortunately, this problem can theoretically happen with patience diff.
> If people do run into it, we might need to use something else.
> 
> Cheuksan Edward Wang

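The scenario I can come up with is a 100K-line file in which every line
is duplicated somewhere, and you change the first and last lines.
Patience diff anchors on lines that are unique in both texts, so with no
unique lines it has nothing to anchor on, and what should have been a
2-line change would indeed come out much too long.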
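
To make the scenario concrete, here is a rough sketch of just the
anchor-selection idea (lines that occur exactly once in both texts). It
is not the actual patience-diff code; the helper name
unique_common_lines and the tiny 10-line "file" are made up for
illustration, and a real implementation has extra fallbacks that this
ignores.

```python
from collections import Counter


def unique_common_lines(a, b):
    """Return the lines of `a` that occur exactly once in `a` and once in `b`.

    Hypothetical helper: it models only the "unique line" anchor step,
    not the full patience-diff algorithm.
    """
    count_a = Counter(a)
    count_b = Counter(b)
    return [line for line in a
            if count_a[line] == 1 and count_b.get(line, 0) == 1]


# Small stand-in for the 100K-line file: every line appears twice.
half = ["line %d\n" % i for i in range(5)]
old_dup = half + half
new_dup = list(old_dup)
new_dup[0] = "changed first\n"
new_dup[-1] = "changed last\n"

# Same edit applied to a file whose lines are all distinct.
old_uniq = list(half)
new_uniq = list(old_uniq)
new_uniq[0] = "changed first\n"
new_uniq[-1] = "changed last\n"

anchors_dup = unique_common_lines(old_dup, new_dup)
anchors_uniq = unique_common_lines(old_uniq, new_uniq)

print(len(anchors_dup), len(anchors_uniq))
```

With every line duplicated, no line qualifies as an anchor, whereas the
same two-line edit on the all-distinct file leaves every untouched line
available as one.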

I have the feeling patience-diff could be updated to avoid that sort of
pathological behavior without needing a major overhaul. But for now, I'm
content with what we have.

If you have a good feeling about doing things differently, I'll
certainly listen.

John
=:->

