FreeBSD Ports statistics

John Arbash Meinel john at arbash-meinel.com
Sun Sep 3 23:00:16 BST 2006


Lele Gaifax wrote:

...

>> Yep, it worked, and I can see that it does update multiple files at once
>> now. So now 'cvs update' isn't as big of a deal, just 'bzr commit' time,
>> and some other time that I don't know what Tailor is doing. Right after
>> it prints the log message, it spends a couple seconds just sitting there
>> before it starts spawning 'cvs update'.
> 
> Uhm, I assume you don't have a "delay_before_apply" option... so this is
> strange. Thanks for reporting, I'll try to explain that too.
> 
> ciao, lele.

I'm pretty sure I know why. If you look at _applyChangeset in
repository/cvsps.py, the first thing it does is create a new CvsEntries
object.

And looking at cvs.py I can see that in the constructor of a CvsEntries
object, it reads the Entries file, and then recurses through all the
directories, stat'ing everything.

Which means two things:

1) It recurses in depth-first order, rather than grabbing all the
entries of each directory and then descending into each child. In our
testing, it is quite a bit more efficient to get a directory listing,
stat all the files/dirs, and then recurse into the next directory.

2) It has to do a complete search through the entire filesystem for
every changeset, rather than saving this information and updating it as
each changeset is applied.

So really the delay is just the 'find .' time, which I've measured at
something like 5-16s.
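For what it's worth, the "list each directory, stat its entries, then
recurse" ordering from (1) can be sketched like this (a hypothetical
illustration using os.walk, not Tailor's actual code; scan_tree is a
made-up name):

```python
import os

def scan_tree(root):
    """Stat every file under root, one directory at a time.

    os.walk yields each directory top-down, so we read one
    directory listing, stat all of its files, and only then
    descend into the child directories -- the ordering that
    proved faster in our testing.
    """
    info = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info[path] = os.stat(path)
    return info
```

Even with the better ordering, this still visits every file on every
changeset, which is why point (2) matters as well.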

Also, as I discovered earlier, something about what CVS does inflates
the 'find .' time, so that if I stop tailor, copy the whole directory
out of the way, and then copy it all back, the scan is faster. (In the
extreme case of a fresh cvs checkout of all 100K files, the difference
was 90s versus 4s.)

I'm not sure what kind of changes it would take for Tailor to not
require knowing the complete state of the tree.
From what I can tell, it doesn't really use much of that information.

I can see that it calls entries.getFileInfo(e.name), but that really
doesn't need to know about all the other files.

I would recommend making CvsEntries a lazy loader, so that it knows
whether it has read the CVS/Entries file (and/or run listdir() on the
directory). Until you actually request a file, it doesn't need to read
any of those things; but once it has read them, it doesn't need to read
them over and over again.
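The lazy-loading idea might look roughly like this (a minimal sketch
assuming a simplified CVS/Entries format; LazyEntries and its internals
are hypothetical, not the actual CvsEntries class):

```python
import os

class LazyEntries:
    """Sketch of a lazily-loaded CVS/Entries reader.

    Nothing is read at construction time; the first call to
    getFileInfo() parses CVS/Entries and caches the result, so
    repeated lookups never re-read the file.
    """

    def __init__(self, directory):
        self.directory = directory
        self._entries = None  # None means "not loaded yet"

    def _load(self):
        if self._entries is None:
            self._entries = {}
            path = os.path.join(self.directory, "CVS", "Entries")
            if os.path.exists(path):
                with open(path) as f:
                    for line in f:
                        # File lines look like /name/rev/timestamp/options/tag
                        parts = line.rstrip("\n").split("/")
                        if len(parts) >= 3 and parts[0] == "":
                            self._entries[parts[1]] = parts
        return self._entries

    def getFileInfo(self, name):
        # First call triggers the read; later calls hit the cache.
        return self._load().get(name)
```

With something like this, a changeset that touches 4 files only pays
for the directories those 4 files live in.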

I think this patch would do the trick. In my installation, I added some
debugging timings (would you be interested in those patches?).

Anyway, this change drops the entries = CvsEntries() step from 6-10s
down to 0.00s (though naturally that is because the work happens
later). But in my case, where each commit only modifies about 4 files
out of 20,000, it is much faster overall.

I don't know that all of my changes are necessary (for example, the
call to workingtree.lock_write() should be redundant, since commit()
should auto-lock the tree).

But I found some of the changes helpful.

John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cvs_timing.diff
Type: text/x-patch
Size: 9214 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060903/5331ffb4/attachment.bin 

