Making diff fast (was Re: Some notes on distributed SCM)

Chris Mason mason at
Mon Apr 11 02:38:37 BST 2005

On Sunday 10 April 2005 20:57, Martin Pool wrote:
> On Sun, 2005-04-10 at 20:36 -0400, Chris Mason wrote:
> > I seem to be in the minority of people who hate stats.  I think <other
> > SCMs> are just so slow that people think stat'ing the whole tree in
> > bzr/arch/git feels fast ;)  This is why quilt feels so fast; it really is
> > O(change) for everything.
> Right, so you have to tell quilt when you modify something, or it gets
> confused.  For people who want that we can certainly have such a mode,
> and if you hook it into your editor it shouldn't be a pain.  I guess if
> you're going to apply a patch you can just "bzr edit .".  Unlike bk, I
> think the default should certainly be to give people editable files.

I definitely agree here, which is why my patch is only an optimization for the
case where the user gives an explicit file list.
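To make the difference concrete, here is a minimal sketch of what restricting the check to a user-supplied file list buys. It assumes a cache mapping each tracked path to the (size, mtime) recorded at the last commit; the names signature, changed_files and that cache layout are hypothetical, not bzr's actual code.

```python
import os
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical cached metadata: path -> (size, mtime_ns) at last commit.
Signature = Tuple[int, int]

def signature(path: str) -> Optional[Signature]:
    """Return (size, mtime_ns) for path, or None if it no longer exists."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns)

def changed_files(cache: Dict[str, Signature],
                  file_list: Optional[Iterable[str]] = None) -> List[str]:
    """Return paths whose stat signature differs from the cache.

    With an explicit file_list, only those paths are stat'ed (O(change)).
    Without one, every cached path is stat'ed (O(tree)).
    """
    candidates = file_list if file_list is not None else cache.keys()
    return sorted(p for p in candidates if signature(p) != cache.get(p))
```

With a file list, the cost scales with the number of files named rather than the size of the tree, which is the whole point of the optimization.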

> Even better, I was thinking on Friday that perhaps there should be a
> command to directly load a patch.  It's inefficient to walk the whole
> tree after patch to find out what happened when patch kindly gives us a
> list of modified files already.

Yes, a patch or changeset has enough info that we shouldn't need to stat the 
entire tree when applying it.  For applying a patch, I had an old script for 
arch to parse the patch output and figure out which files were 
changed/deleted/added.  Let me know if you want it.

> To some extent I think it's a shortcoming of the kernel interface that
> stat should be so slow.  Even with a cold cache, 12000 stats shouldn't
> be reading *that* many blocks from disk, and one would hope at least
> some of the inode data is contiguous.  Maybe eventually we could get
> some kind of readdir-like call that also returns the stat information in
> one go...

My own trees are all hardlinked together and patched up from older kernel 
revs, so a given directory listing is unlikely to have inodes in any kind of 
seek-friendly order.  If you just unpack a big directory tree, the stats 
trigger far fewer seeks.

A simple readdir flag that says "I'm going to stat each of these dir entries" 
might help the FS be smarter...
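Until the kernel grows such a flag, one user-space heuristic is to sort the readdir results by inode number before stat'ing them. On ext2/ext3 the inode number roughly tracks where the inode sits in the on-disk inode tables, so this tends to turn scattered reads into a more sequential sweep. It is not guaranteed to help on other filesystems. Here is a sketch in Python. The function name is hypothetical, and DirEntry.inode() comes from the d_ino field that readdir already returns, so sorting costs no extra syscalls on POSIX.

```python
import os
from typing import List, Tuple

def stat_in_inode_order(path: str) -> List[Tuple[str, os.stat_result]]:
    """lstat every entry of `path`, visiting entries in ascending inode order.

    readdir() may return names in hash or creation order, which need not
    match inode placement on disk; sorting by inode number is a cheap
    heuristic for making the stat pass more seek-friendly.
    """
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.inode())
    # follow_symlinks=False gives lstat semantics: links are reported as links.
    result = [(e.name, e.stat(follow_symlinks=False)) for e in entries]
    result.sort(key=lambda pair: pair[0])  # hand results back in name order
    return result
```

The final sort only restores a predictable order for callers. The I/O benefit, if any, comes from the order in which the stat calls are issued.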

