Making diff fast (was Re: Some notes on distributed SCM)
Chris Mason
mason at suse.com
Mon Apr 11 02:38:37 BST 2005
On Sunday 10 April 2005 20:57, Martin Pool wrote:
> On Sun, 2005-04-10 at 20:36 -0400, Chris Mason wrote:
> > I seem to be in the minority of people who hate stats. I think <other
> > SCMs> are just so slow that people think stating the whole tree in
> > bzr/arch/git feels fast ;) This is why quilt feels so fast, it really is
> > O(change) for everything.
>
> Right, so you have to tell quilt when you modify something, or it gets
> confused. For people who want that we can certainly have such a mode,
> and if you hook it into your editor it shouldn't be a pain. I guess if
> you're going to apply a patch you can just "bzr edit .". Unlike bk I
> think the default should certainly be to give people editable files by
> default.
I definitely agree here, which is why my patch only optimizes the case where
the user gives an explicit file list.
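A minimal sketch of what the file-list case buys you, assuming a hypothetical stat cache that maps each versioned path to the `(size, mtime_ns)` recorded at the last commit. This is not the actual patch. Only the named paths are stat'ed, so the cost is O(files named) instead of O(tree). The names `StatCache` and `possibly_modified` are illustrative.

```python
import os
from typing import Dict, Iterable, List, Tuple

# Hypothetical stat cache: relative path -> (st_size, st_mtime_ns) at last commit.
StatCache = Dict[str, Tuple[int, int]]


def possibly_modified(root: str, cache: StatCache, paths: Iterable[str]) -> List[str]:
    """Return the paths from `paths` whose stat data differs from `cache`.

    Only the listed paths are stat'ed; the rest of the tree is never touched.
    A matching (size, mtime) is taken to mean "unchanged", so callers only
    need to diff the contents of the paths returned here.
    """
    changed = []
    for rel in paths:
        try:
            st = os.lstat(os.path.join(root, rel))
        except FileNotFoundError:
            # Gone from disk: report it only if it was previously tracked.
            if rel in cache:
                changed.append(rel)
            continue
        if cache.get(rel) != (st.st_size, st.st_mtime_ns):
            changed.append(rel)  # new file, or size/mtime moved
    return changed
```

The same check run over every versioned file is the "stat the whole tree" path; restricting `paths` to what the user typed is the whole optimization.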
>
> Even better, I was thinking on Friday that perhaps there should be a
> command to directly load a patch. It's inefficient to walk the whole
> tree after patch to find out what happened when patch kindly gives us a
> list of modified files already.
>
Yes, a patch or changeset has enough information that we shouldn't need to stat
the entire tree when applying it. For applying patches, I had an old arch
script that parsed the patch output to figure out which files were changed,
deleted or added. Let me know if you want it.
> To some extent I think it's a shortcoming of the kernel interface that
> stat should be so slow. Even with a cold cache, 12000 stats shouldn't
> be reading *that* many blocks from disk, and one would hope at least
> some of the inode data is contiguous. Maybe eventually we could get
> some kind of readdir-like call that also returns the stat information in
> one go...
My own trees are all hardlinked together and patched up from older kernel
revisions, so a given directory listing is unlikely to have its inodes in any
kind of seek-friendly order. If you just unpack a big directory tree, the stats
trigger far fewer seeks.
A simple readdir flag that says "I'm going to stat each of these dir entries"
might help the FS be smarter...
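Until something like that flag exists, one userspace approximation is to stat the entries in inode-number order instead of directory order. On filesystems such as ext3, an inode's number determines where it sits in the on-disk inode tables, so this order tends to replace random seeks with a forward sweep. That is a heuristic, not a guarantee, and how much it helps on hardlinked trees is untested here. On Unix, `os.scandir()` exposes the `d_ino` that readdir already returned, so the sort itself costs no extra stat calls. The function name is hypothetical.

```python
import os
from typing import Dict


def stat_dir_in_inode_order(path: str) -> Dict[str, os.stat_result]:
    """lstat every entry of `path`, visiting entries in ascending inode order."""
    with os.scandir(path) as it:
        # DirEntry.inode() comes from readdir's d_ino on Unix: no stat needed.
        entries = sorted(it, key=lambda e: e.inode())
    # follow_symlinks=False gives lstat semantics, as a tree walker wants.
    return {e.name: e.stat(follow_symlinks=False) for e in entries}
```

A kernel-side hint could do better, for example by batching the inode reads. This only changes the order in which userspace asks.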
-chris