Diffing commits of big files is slow

Robert Collins robert.collins at canonical.com
Mon Jun 22 02:00:32 BST 2009


On Mon, 2009-06-22 at 10:50 +1000, Martin Pool wrote:
> 2009/6/22 Robert Collins <robert.collins at canonical.com>:
> > On Mon, 2009-06-22 at 01:51 +0200, Daniel Clemente wrote:
> >> In a pack-0.92 branch with latest Bazaar (1.17dev), I replaced a 40 MB video with a newer version (30 MB).
> >>
> >>   I am intrigued as to why the following operations are so slow:
> >
> > bzr doesn't know that the files are binary until it extracts them and
> > examines the content. It knows they are different before extracting, but
> > that doesn't help a lot.
> 
> I guess it could shortcut this case by streaming the file out and
> looking at just the start of it.  If the working-copy file is obviously
> binary, then in theory bzr shouldn't need to read the repository's copy
> at all.

Agreed, with a caveat: I want to be able to hook in custom differs to
handle things including binary files (like OpenOffice documents, which
are zips).
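
A minimal sketch of what such a hook could look like, so that a custom
differ gets first refusal before any "it's binary, skip it" shortcut is
taken. The names register_differ and find_differ are hypothetical and are
not part of bzr's API; the glob-based matching is also an assumption.

```python
import fnmatch

# Hypothetical registry of (glob pattern, differ callable) pairs.
# A differ here is any callable taking (old_bytes, new_bytes).
_custom_differs = []


def register_differ(pattern, differ):
    """Register a differ for paths matching a glob pattern."""
    _custom_differs.append((pattern, differ))


def find_differ(path):
    """Return the first registered differ whose pattern matches, else None."""
    for pattern, differ in _custom_differs:
        if fnmatch.fnmatch(path, pattern):
            return differ
    return None
```

A caller would ask find_differ(path) first and only fall back to binary
detection when it returns None, so zip-based formats can still get a
meaningful diff.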

> 1- matching a user rule saying *.ogg (or this file-id or whatever) is
> binary and shouldn't be diffed;
> <https://bugs.edge.launchpad.net/bzr/+bug/218128> -- commonly duped
> 
> 2- streaming extraction of the content and detecting from the first
> bit of it that it's binary and shouldn't be diffed
> <https://bugs.edge.launchpad.net/bzr/+bug/390418> -- would be nice

At least for our repository format, decompressing content is
all-or-nothing; that said, if we fragment large files as I've proposed,
we can offer better interfaces.
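
The two options quoted above can be sketched as follows. This is only an
illustration under stated assumptions: the rule list, the 8192-byte sniff
size, and the "contains a NUL byte" test are all assumed here, and none of
these function names come from bzr.

```python
import fnmatch
import io

# Hypothetical user rules naming files that should never be text-diffed.
BINARY_GLOBS = ["*.ogg", "*.avi"]

# Assumed number of leading bytes to inspect.
SNIFF_BYTES = 8192


def matches_binary_rule(path, globs=BINARY_GLOBS):
    """Option 1: a user rule says this path is binary."""
    return any(fnmatch.fnmatch(path, g) for g in globs)


def looks_binary(stream, sniff_bytes=SNIFF_BYTES):
    """Option 2: read only the start of the stream and look for a NUL byte."""
    head = stream.read(sniff_bytes)
    return b"\0" in head


def should_skip_text_diff(path, open_working_file):
    """Decide without touching the repository copy.

    open_working_file is a callable returning a binary file object for
    the working-copy version of the file.
    """
    if matches_binary_rule(path):
        return True
    with open_working_file() as f:
        return looks_binary(f)
```

For example, should_skip_text_diff("clip.ogg", lambda: open("clip.ogg", "rb"))
is decided by the rule alone and never opens the file. Sniffing the
repository's copy the same way would only pay off with an interface that
can stream the beginning of a text rather than decompress all of it.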

-Rob