Making diff fast (was Re: Some notes on distributed SCM)

Aaron Bentley aaron.bentley at utoronto.ca
Sun Apr 10 23:17:20 BST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Benno wrote:
> 1/ Caching the working tree stat data, and then being able
> to simply stat each file and compare the stat information.
> 
> Pros: Portable, simple to use.
> Cons: Still requires a full search of the tree, which is slow.

This is what Arch does, and it's quite slow on large trees.  Robert
Collins has recently improved this in Baz, but it doesn't change the
fact that it's an O(versioned files) operation, rather than O(changed
files).
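
To make the cost concrete, here is a minimal sketch of the stat-cache
idea in Python. It is not Arch's, Baz's or bzr's actual code: the
function names and the choice of stat fields (size, mtime, inode) are
assumptions made purely for illustration. Note that the loop in
changed_files() calls stat() once per versioned file, however few of
them have changed.

```python
import os


def stat_signature(path):
    """Return the stat fields used here to detect a change (an assumed choice)."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ino)


def build_stat_cache(root, versioned_files):
    """Record a signature for every versioned file (hypothetical helper)."""
    return {rel: stat_signature(os.path.join(root, rel))
            for rel in versioned_files}


def changed_files(root, versioned_files, cache):
    """Return the files whose current stat data differs from the cache.

    This stats every versioned file, so the cost is O(versioned files)
    even when nothing has changed.
    """
    changed = []
    for rel in versioned_files:
        try:
            sig = stat_signature(os.path.join(root, rel))
        except FileNotFoundError:
            sig = None  # a deleted file counts as changed
        if cache.get(rel) != sig:
            changed.append(rel)
    return changed
```

A real implementation would also have to cope with files modified
within the timestamp granularity of the filesystem, which this sketch
ignores.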

> 2/ Using dnotify or similar to have a notification of when files
> are changed, so that you explicitly know this information without a
> directory search.
> 
> Pros: Have available an exact set of files which are changed without
> doing a directory scan.
> Cons: Not so portable.

inotify is probably more suitable, and I've actually discussed this a
bit with Robert Love.

> 3/ Create the working tree read only, and provide explicit
> commands eg: bzr edit, to modify a file.

The notion of 'consider all files unchanged until otherwise notified' is
slightly broader, and with it, you can implement 2 on top of 3.
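
A rough sketch of what I mean, in Python. Everything here is
hypothetical: the ModifiedList class, the one-path-per-line file
format, and the edit() and on_change_event() functions are invented
for illustration and are not existing bzr code. The point is that an
explicit 'edit' command and a change notifier (dnotify, inotify, or
whatever) both reduce to the same operation, namely adding a path to
the list of files no longer assumed unchanged.

```python
import os
import stat


class ModifiedList:
    """Files no longer assumed unchanged, stored one path per line."""

    def __init__(self, path):
        self.path = path

    def read(self):
        """Return the set of recorded paths; a missing file means none."""
        try:
            with open(self.path, encoding="utf-8") as f:
                return {line.rstrip("\n") for line in f if line.strip()}
        except FileNotFoundError:
            return set()

    def mark_modified(self, rel_path):
        """Record rel_path as possibly changed (no-op if already listed)."""
        if rel_path not in self.read():
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(rel_path + "\n")


def edit(tree_root, rel_path, modified):
    """Option 3: an explicit command makes the file writable and records it."""
    full = os.path.join(tree_root, rel_path)
    os.chmod(full, os.stat(full).st_mode | stat.S_IWUSR)
    modified.mark_modified(rel_path)


def on_change_event(rel_path, modified):
    """Option 2: a notifier callback would feed the very same list."""
    modified.mark_modified(rel_path)
```

With this in place, diff only needs to look at modified.read(), so the
work is proportional to the number of changed files rather than the
number of versioned files.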

> I propose being able to have multiple different working directory formats
> so the user can make the choice, probably with 1/ as the sane default.

I think you don't need an extra format to support 1 and 3.  Just a
per-tree configuration option, and an extra file for 3.

> This means having some file indicating the working directory. I think
> it might make sense to have a .bzr_wd, to store data about the working
> directory and leave .bzr as things for the repository itself. So .bzr_wd
> would store for example a stat cache for 1/ or a list of modified files
> in 3/, also you would probably move the "inventory" file from .bzr to
> .bzr_wd.

I dunno about that.  bzr is deliberately trying to avoid having both
'repository' and 'working directory' concepts.  The one is the other, so
working tree data seems suitable for .bzr (and is already in there, anyway).

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCWaXw0F+nu1YWqI0RAiHRAJ4se1q7Uas2ziaqxSBt5Wqq1txUkgCfZxkO
zHEL46IoCuSoMtq36bgQHBE=
=Joxn
-----END PGP SIGNATURE-----



