[RFC] explicit selection of edited files

John Arbash Meinel john at arbash-meinel.com
Wed Oct 31 16:10:58 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 10/30/07, Robert Collins <robertc at robertcollins.net> wrote:
>> I've been thinking about doing a plugin for explicitly marking files to
>> edit, for making things like converters faster, as well as in obscenely
>> large tree situations.
> 
> For converters, I think they should just pass the full list to commit,
> and we should make sure that if a list is given, commit does not read
> anything else.  This may need special attention when e.g. a new
> directory is added with only some files inside wanting to be
> committed, so ci --no-recurse.  We might also need --files-from when
> there is a long list.

Well, from my experience with cvsps-import's MinimalTree, I can say that we are
already pretty good about not reading files that don't look like they have
changed. That implementation only keeps file texts in memory for entries that
are considered "modified". It doesn't exercise all of the commit.py code path,
but at least CommitBuilder is good about it.


> 
> But you could do this a bit more generally too, which would be a bit
> like the 'must add file versions explicitly' of git or bk.
> 
>> Seems to me the basics are:
>>  - we maintain a list of 'edited' files.
>>  - a command to add/remove to that list.
>>  - some commands like 'revert' will probably want to add to that list
>>  - all tree operations like commit/diff/st will want to use that list as
>> an automatic selected-paths criteria.
>>  - commit resets the list
>>  - revert to basis resets the list
> 
> Having an actual (rather than conceptual) list separate from the
> working inventory seems somewhat redundant.  (Though, of course, there
> is not quite enough space in the current working inventory to fit
> this.)
> 
> Saving the file versions to the repository at the time they're added
> may have better locality of reference.
> 
> revert to the basis revision would clear the list, and revert to some
> other revision would make the list point to the previous revisions.
> 
> I think 'the list is empty' is not the same as 'I'm not using this mode'.
> 

I think the mode could be interesting, though we've shown that you can stat an
entire large tree reasonably quickly. I suppose having the edit list would help
with a cold cache, and it would be a way to shave some time off.
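
To make the proposal concrete, here is a minimal Python sketch of the edit list
Robert outlines above. Everything in it is an assumption for illustration: the
EditList class and its method names are hypothetical and are not part of
bzrlib. It also encodes Martin's point that "the list is empty" differs from
"I'm not using this mode" by using None for the latter.

```python
class EditList:
    """Hypothetical tracker for explicitly 'edited' paths (not a bzrlib API)."""

    def __init__(self):
        # None means "mode not in use"; an empty set means
        # "mode in use, nothing marked as edited".
        self._paths = None

    def enable(self):
        """Turn the mode on without marking anything as edited."""
        if self._paths is None:
            self._paths = set()

    def add(self, *paths):
        """Mark paths as edited (this also turns the mode on)."""
        self.enable()
        self._paths.update(paths)

    def remove(self, *paths):
        """Unmark paths; a no-op when the mode is not in use."""
        if self._paths is not None:
            self._paths.difference_update(paths)

    def selected_paths(self, explicit=None):
        """Paths that commit/diff/st should examine.

        Paths given explicitly on the command line win. Returns None to
        mean "scan the whole tree" when the mode is not in use.
        """
        if explicit:
            return sorted(explicit)
        if self._paths is None:
            return None
        return sorted(self._paths)

    def reset(self):
        """Clear the list after commit or revert-to-basis; mode stays on."""
        if self._paths is not None:
            self._paths = set()
```

A command such as commit would call selected_paths() to get its automatic
selected-paths criteria and then reset() once it succeeds.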

Just for a cold/hot comparison, looking at a 107k entry tree, I get:

% time find . >/dev/null
real    0m20.059s
user    0m0.070s
sys     0m0.350s
% time find . >/dev/null
real    0m0.361s
user    0m0.120s
sys     0m0.240s

It is pretty obvious which is hot and which is cold: the first run (about 20s)
is the cold cache and the second (about 0.36s) is the hot one. This is a rather
fast machine, though.

Anyway, this is just to say that with a hot cache, the best-case savings are
<500ms. 1/2 a second is very tangible. I would guess it isn't worth the
overhead in the general case, but some users might really like to use it.
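
For anyone who wants to repeat the comparison against an explicit list of
paths, here is a rough Python sketch. The helper names (stat_tree, stat_paths,
timed) are made up for this mail and are not bzrlib functions; it only
measures the stat cost, not anything else a real commit or status would do.
Note that stat_tree does not count the root itself, unlike `find .`.

```python
import os
import time


def stat_tree(root):
    """lstat every entry below root; return the number of entries seen."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count


def stat_paths(root, relpaths):
    """lstat only an explicit list of paths; return those that still exist."""
    found = []
    for relpath in relpaths:
        try:
            os.lstat(os.path.join(root, relpath))
        except OSError:
            # Missing or unreadable path: skip it rather than fail.
            continue
        found.append(relpath)
    return found


def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Running timed(stat_tree, ".") and timed(stat_paths, ".", edited) on the same
tree, with edited being whatever list of paths you care about, gives a feel
for the best-case saving with both a cold and a hot cache.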

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHKKkSJdeBCYSNAAMRAkskAJwNrlrgqtqpBY3s7CXj5dp0G52ghQCfXlO6
xCHHP5AOM1xJI55bSuymBJQ=
=cwi+
-----END PGP SIGNATURE-----


