More freebsd ports statistics
John Arbash Meinel
john at arbash-meinel.com
Tue Sep 5 20:42:16 BST 2006
Well, I've left Tailor converting from CVS to bzr for a few days now,
and I'm up to almost 25K revisions. Here are a few things that I've
found so far.
1) As expected, inventory.knit is quickly becoming a bottleneck. We have
30K files right now, making 'basis-inventory' 7MB. Because of our knit
caching algorithm, that means that every 50 revisions, we get another
7MB hunk in our inventory.knit file.
Right now, the total size of .bzr/ is 821MB. But inventory.knit is 540MB
of that.
2) 'bzr commit' time seems to scale with the size of the inventory, but
not much with the length of history. That is why I posted my other
comments about different ways to make the inventory handling better.
(Now, not extracting 5 inventories when we only need 2 would be a huge
boon, but it wouldn't change how we *scale*, just the constant factor.)
3) I keep copying the entire tree to improve performance. I don't know
who is being bad here (though I know for sure CVS does some bad things,
bzr might do bad things too).
For example, in the current tree, 'find . > /dev/null' takes 22s, but
in a fresh copy it only takes 10s. (Oddly enough, the first find after a
copy is the fastest at 4s. I wonder if it has to do with dirty pages
that haven't been flushed to disk yet. But I have more than enough RAM
to cache this stuff.)
So updating an existing tree with cvs does mess up the disk-layout,
versus copying everything.
'bzr branch' seems to create a nicely laid-out tree, as the 'find .'
time is actually around 4s. We don't have any CVS directories there, so
we get a good speed boost from that alone. But I also believe this means
we are creating things in a reasonable order.
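The 'find .' comparison above can be repeated from Python if that is more
convenient. This is only a rough sketch: time_tree_walk is a name made up
for this example, and os.walk is not identical to find (it does not count
the root directory itself), so the numbers are only comparable between
runs of the same script.

```python
import os
import time


def time_tree_walk(root):
    """Walk `root` once and return (entries_seen, elapsed_seconds).

    Hypothetical helper for illustration; it is not part of bzr or Tailor.
    """
    start = time.perf_counter()
    entries = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        # Count every directory and file name reported under this directory.
        # The root itself is not counted, unlike 'find .'.
        entries += len(dirnames) + len(filenames)
    return entries, time.perf_counter() - start
```

Calling time_tree_walk() a few times on the updated tree and on a fresh
copy should show the same kind of gap, including the effect of a warm
cache on the second and later runs.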
4) The time to branch the repository is pretty slow.
'time cp -ar' takes around 4 minutes.
'time bzr branch ports ports-test' takes 29 minutes, with a peak memory
usage of >726MB. I believe this is because of
'find_file_ids_affected_by', which has to troll through all of the
inventory records. At least it doesn't have to do a full extraction and
a new diff for each one.
And now we cache all of the inventory records in memory when doing a
join(), because when doing it remotely, we don't want to download it
twice. This would actually be a use case for having the cache overflow
to disk if remote, or just be ignored if local. But because of our
earlier fixes, we at least don't have the giant memory jump when joining
the knits.
5) I did some testing of *extraction* time for the inventories. I found
a few things here.
a) I was wrong: we don't cache every 50 revisions, we cache every 26th
revision (the code says while count < 25 ==> create a new delta). That
is partly why the inventory.knit file is growing so huge.
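To make the "every 26th revision" arithmetic concrete, here is a small
sketch of that kind of policy. It is not the bzrlib code; plan_records
and MAX_DELTA_CHAIN are names invented for this example, and the only
thing taken from the real code is the "count < 25" condition quoted above.

```python
MAX_DELTA_CHAIN = 25  # assumption: mirrors the "count < 25" check


def plan_records(num_revisions, max_chain=MAX_DELTA_CHAIN):
    """Return one 'fulltext' or 'delta' label per stored revision."""
    kinds = []
    chain = 0  # deltas written since the last full text
    for _ in range(num_revisions):
        if kinds and chain < max_chain:
            kinds.append("delta")
            chain += 1
        else:
            # First record, or the delta chain is full: store a full text.
            kinds.append("fulltext")
            chain = 0
    return kinds


kinds = plan_records(25000)
fulltext_count = kinds.count("fulltext")
print(fulltext_count, "full texts out of", len(kinds), "records")
```

With 25 deltas allowed between full texts, one record in every 26 is a
full copy of the inventory, which is where the repeated multi-MB hunks
come from.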
b) The time to extract a single delta is 1-2ms. If we have around 1100
files in the inventory, it takes 15ms to extract a full text, and 22ms
to create a full text from a full text + 26 deltas. (So applying 26
small deltas only takes around 4ms).
With more files (4800), a full text takes 64ms, and to extract and apply
25 deltas takes 77ms (about 9ms difference).
With 12000 files, it takes approx 165ms to extract a full text, and
200-250 ms to extract and apply. The numbers seem to be getting a little
bit weird now, because I would expect the time to be monotonically
increasing, but it is varying quite a bit.
What I also don't understand exactly is that the 'get_delta()' time is
also increasing. When getting the first few deltas, it only takes 1-2ms,
but when we are in the 17000's, it takes 4-5ms for each one. (And if I
reverse the extraction, I get approximately the same time for extracting
each delta.)
To extract every delta and every full text for 25000 revisions takes 2
hours.
The last few entries have almost 30,000 inventory entries, at 7MB total
data size, and take approx 0.55s to extract (550ms).
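As a rough sanity check on the figures above, the full-text timings can
be turned into a cost per inventory entry. The entry counts and times
below are just the rounded numbers from this mail ("almost 30,000" is
taken as 30000), so treat the output as a ballpark, not a benchmark.

```python
# (inventory entries, full-text extraction time in ms), rounded from above
FULLTEXT_TIMINGS = [(1100, 15), (4800, 64), (12000, 165), (30000, 550)]


def microseconds_per_entry(entries, millis):
    """Average extraction cost per inventory entry, in microseconds."""
    return millis * 1000.0 / entries


for entries, millis in FULLTEXT_TIMINGS:
    cost = microseconds_per_entry(entries, millis)
    print(f"{entries:>6} entries: {millis:>4} ms total, {cost:.1f} us/entry")

# Average cost per revision for the 2 hour run over 25000 revisions.
TOTAL_SECONDS = 2 * 60 * 60
REVISIONS = 25000
avg_ms_per_revision = TOTAL_SECONDS * 1000.0 / REVISIONS
print(f"average per revision: {avg_ms_per_revision:.0f} ms")
```

On these numbers the per-entry cost stays in the same range up to 12000
entries and is noticeably higher for the largest inventories, so
extraction looks roughly linear in inventory size with something extra
going on at the top end.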
John
=:->