Making diff fast (was Re: Some notes on distributed SCM)

Martin Pool mbp at sourcefrog.net
Mon Apr 11 00:32:40 BST 2005


On Sun, 2005-04-10 at 16:22 +1000, Benno wrote:
> On Sat Apr 09, 2005 at 21:28:22 -0700, Matt Mackall wrote:
> >(Martin and Daniel, this is slightly updated)
> >  commit                 O(changed files)
> >  revert                 O(changed files)
> >  diff                   O(changed files)
>    status                 O(changed files)
> 
> Just for completeness.
> 
> I think performing the above operations quickly is quite important.
> Currently this operation is quite slow:
> 
> i30pc60:/Users/benno/coding/bzr_linux_test% time ~/coding/bzr/revert_branch/bzr status
> ~/coding/bzr/revert_branch/bzr status  19.20s user 9.86s system 56% cpu 51.855 total

Yes, this is really awful and it is near the top of my list of things to
fix.  What I propose to do is put some stat information into the
working-inventory so that we do not need to read in and compute the
digest of all working files.  I had held off on this because of the risk of
missing changes, but now it seems reasonable.
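A rough sketch of the stat-cache idea, not the actual bzr code: the
entry layout and the names fingerprint, record and is_modified are
made up for illustration, and the choice of stat fields (size, mtime,
inode) is an assumption.  The risk of missing changes mentioned above
is still there: an edit that leaves all three fields unchanged would
go unnoticed.

```python
import hashlib
import os


def sha1_of(path):
    """Read the whole file and return its SHA-1 hex digest (the slow path)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def fingerprint(st):
    # Assumed set of fields; a real tool might also compare ctime or mode.
    return (st.st_size, st.st_mtime_ns, st.st_ino)


def record(path):
    """Build a cache entry for one working file."""
    st = os.lstat(path)
    return {"stat": fingerprint(st), "sha1": sha1_of(path)}


def is_modified(path, entry):
    """Return (modified, refreshed_entry) for one file.

    If the stat fingerprint matches the cached one, the file is assumed
    unchanged and is not read at all.  Otherwise it is rehashed and the
    digest decides.
    """
    st = os.lstat(path)
    if fingerprint(st) == entry["stat"]:
        return False, entry
    digest = sha1_of(path)
    refreshed = {"stat": fingerprint(st), "sha1": digest}
    return digest != entry["sha1"], refreshed
```

With something like this, status reads only the files whose stat
data has moved, so an untouched tree costs one lstat per file.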

I think statting could be quick enough that going further is not needed.
This is a bit less work than an ls -lR of the working tree, which is
about 250ms with a hot cache, and 9s with a cold cache.  This requires
only holding the directory and inodes in memory which shouldn't use too
much vm or be thrown out too easily.

Linus suggested statting the files in order by inode number, which is a
new idea to me but makes sense (at least on some filesystems).
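Roughly what inode-order statting might look like, as a sketch only.
The function name stat_in_inode_order is hypothetical, and whether
the sort actually helps depends on the filesystem laying out inodes
so that ascending order means fewer seeks.

```python
import os


def stat_in_inode_order(root):
    """Collect non-directory paths under root, then lstat them sorted by inode.

    The inode numbers come from the directory entries themselves, so
    the first pass normally needs no per-file stat call.
    """
    entries = []
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    entries.append((entry.inode(), entry.path))
    entries.sort()  # ascending inode number
    return [(path, os.lstat(path)) for _, path in entries]
```

The result can be fed straight into a comparison against cached stat
data; only the order of the lstat calls differs from a plain walk.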

> Some ways to improve this performance come to mind.
> 
> 1/ Caching the working tree stat data, and then being able
> to simply stat each file and compare the stat information.
> 
> Pros: Portable, simple to use.
> Cons: Still requires a full search of the tree which is slow.

Let's try this first and see how it works.

> 2/ Using dnotify or similar to have a notification of when files
> are changed, so that you explicitly know this information without a
> directory search.
> 
> Pros: Have available an exact set of files which are changed without
> doing a directory scan.
> Cons: Not so portable.

inotify would be better.  This could be a good thing to hack on on your
flight (if you can boot Linux on your laptop).

Bear in mind that the daemon and the inotify listener have some memory
overhead, which is memory that can't be used for keeping the tree in
cache.

> 3/ Create the working tree read only, and provide explicit
> commands eg: bzr edit, to modify a file.
> 
> Pros: portable, exact list of files changed.
> Cons: makes it harder for the user. Requires special tools for e.g: patch.

And some modifications may slip by.  

So as other people said, let's make #1 fast and then try the others.

-- 
Martin
