Bazaar NG performance on large repositories

John Arbash Meinel john at arbash-meinel.com
Mon Nov 6 17:27:09 GMT 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johan Rydberg wrote:
> John Arbash Meinel <john at arbash-meinel.com> writes:
> 
>>> Is that something that is hard to change? Would it not be possible to
>>> have a file somewhere (in .bzr) for each file under version control that
>>> lists the revisions that modified it? These files could be calculated
>>> once after an initial pull and then updated on each commit. Then a log
>>> for a single command would be very fast. The files could be updated the
>>> first time a log on a single file is done or always after pulling (but
>>> this would increase the pull time of course).
>> As I commented, most of this is already done. We don't record deletes in
>> that index, and we really should.
> 
> What is the plan for "bzr log DIR"?  
> 
> One possible solution would be to iterate through the children of the
> directory and collect all their revisions.
> 
> Another solution would be to record modifications in the
> directory-knit.
> 
> ~j
> 

There are a couple of things we are thinking of.

1) We already record modifications to the directory in the directory's
.knit file. However, the only way you can modify a directory is by
moving it (or renaming it, which is basically the same thing).

2) We want to start tracking the recursive set of changes. So if a file
or subdirectory changes, then the parent directory has a
'children_last_modified' flag which gets updated. This recurses to the
top, which means that the root entry will get a new entry for every
modification to the tree. This makes it cheap to do whole-tree
comparisons, because of the Binary Decision Tree tricks. (You can ignore
everything underneath any directory entry which has the same
children-last-modified flag).
We are thinking of using revision-ids rather than SHA hashes, because
they contain a little bit more information. (They can tell you about
resolved merges, etc., which a SHA hash cannot.)
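
As a rough illustration of (2), here is a minimal Python sketch. It is
not bzrlib code: the Entry class, record_change() and changed_paths()
are hypothetical names, and the only thing taken from the description
above is the idea of a 'children_last_modified' revision-id that is
pushed up to the root and used to skip unchanged subtrees. Deleted
entries are ignored to keep it short.

```python
class Entry:
    """Hypothetical inventory entry; not the real bzrlib class."""

    def __init__(self, name, revision, is_dir=False):
        self.name = name
        self.last_modified = revision
        self.is_dir = is_dir
        # Only directories carry the recursive marker.
        self.children_last_modified = revision if is_dir else None
        self.children = {}
        self.parent = None

    def add(self, child):
        child.parent = self
        self.children[child.name] = child
        return child


def record_change(entry, revision_id):
    """Mark entry as modified and update every ancestor up to the root."""
    entry.last_modified = revision_id
    parent = entry.parent
    while parent is not None:
        parent.children_last_modified = revision_id
        parent = parent.parent


def changed_paths(old, new, path=""):
    """Yield paths that differ between two trees, pruning subtrees whose
    children_last_modified marker is identical."""
    if old.last_modified != new.last_modified:
        yield path
    if not new.is_dir:
        return
    if old.children_last_modified == new.children_last_modified:
        return  # nothing underneath changed, skip the whole subtree
    for name, child in sorted(new.children.items()):
        child_path = f"{path}/{name}" if path else name
        old_child = old.children.get(name)
        if old_child is None:
            yield child_path  # entry added in the new tree
        else:
            yield from changed_paths(old_child, child, child_path)
```

After a change to one file, only the directories on the path from that
file to the root have a different marker, so the comparison never
descends into the sibling directories.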

3) For 'bzr log DIR', I'm not sure what the 'best' thing is. I think it
would be good to show the recursive set of changes (so if DIR/foo or
DIR/subdir/foo was modified, then you get that log entry).

We could do that right now, but to be efficient we really need (2)
first. We need it for other things anyway, though.
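
For reference, the "do it right now" version of (3) could look roughly
like the following sketch. It assumes a hypothetical in-memory index
mapping each path to the revisions that modified it, and a hypothetical
'mainline' list of revision-ids ordered oldest to newest; neither is
the actual on-disk format. It has to look at every path in the index,
which is why it is not efficient without (2).

```python
def log_for_directory(per_path_revisions, dir_path, mainline):
    """Return the revisions that touched dir_path or anything below it,
    newest first.

    per_path_revisions: dict of path -> iterable of revision-ids
                        (hypothetical index, one entry per versioned path)
    mainline:           list of revision-ids, oldest first (hypothetical)
    """
    directory = dir_path.rstrip("/")
    prefix = directory + "/"
    touched = set()
    for path, revisions in per_path_revisions.items():
        # Match the directory itself and everything underneath it, but
        # not siblings that merely share a name prefix.
        if path == directory or path.startswith(prefix):
            touched.update(revisions)
    return [rev for rev in reversed(mainline) if rev in touched]
```

This version matches by current path only, so it says nothing yet
about renames.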

4) In general we track files across renames for stuff like 'bzr log
file', rather than tracking a specific location. There are times when
you might want both, especially when you are doing something like 'bzr
log DIR'. Do you want to match any files that have ever been under that
dir, even across renames? Or do you only want to match things that were
in that location (for example, if you had an older directory in that
place, moved stuff out, and moved other stuff in)?

If you have files that move *into* that directory, do you want to
continue following them after they are no longer in that directory?
What about files that move into and then out of that directory, so they
are no longer in the directory, but at one time they were? Do you need
to go backwards and find that they were there, and then include all of
their revisions as well (including going forwards again)?
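
To make two of these choices concrete, here is a small sketch. The
Change record and both functions are hypothetical, and each change is
assumed to store the path the file had *after* that revision, so a
revision that moves a file out of DIR counts as outside it.

```python
from collections import namedtuple

# Hypothetical record: one per (revision, file) modification.
Change = namedtuple("Change", "revision file_id path")


def _in_dir(path, dir_path):
    return path.startswith(dir_path.rstrip("/") + "/")


def _unique(revisions):
    """Drop duplicate revision-ids while keeping the original order."""
    seen = set()
    return [r for r in revisions if not (r in seen or seen.add(r))]


def log_by_location(changes, dir_path):
    """Only revisions where the changed file was inside dir_path."""
    return _unique(c.revision for c in changes if _in_dir(c.path, dir_path))


def log_by_identity(changes, dir_path):
    """All revisions of any file that was ever inside dir_path,
    including those before it moved in and after it moved out."""
    ever_inside = {c.file_id for c in changes if _in_dir(c.path, dir_path)}
    return _unique(c.revision for c in changes if c.file_id in ever_inside)
```

For a file that is created elsewhere, moved into DIR, and later moved
out again, log_by_location() reports only the revisions while it lived
in DIR, while log_by_identity() reports its whole history.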


Anyway, there are probably some defaults that make a good start. But
potentially there are a *lot* of ways to slice the ancestry, and each
one probably has a use case. So we may just expose the one or two ways
that make the most sense on the command line. But we can try to make
sure the meta-info is recorded so that we can support all the ones that
people would want.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFT3BtJdeBCYSNAAMRAhgyAKDXEEWWyd73Z1FLIj2OmzqrlHUnRgCfaRda
sV0zrMikcyNKQ5hAu6ftQAA=
=bXjc
-----END PGP SIGNATURE-----



