FreeBSD Ports statistics

Thu Aug 31 20:18:45 BST 2006

John Arbash Meinel wrote:
> I thought I would share a few of the FreeBSD ports statistics that I
> have been able to extract so far.
> 

Here are a few more that I have been able to glean.

1) The 'cvs rlog' output totals about 260MB. It contains 731,732 file
changes, as counted with:
  grep -c "^revision [[:digit:]]" ../log.txt
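The grep above counts every line that starts with "revision" followed by a digit, which can also match a line inside a commit message. A slightly stricter count can be sketched in Python. It assumes the usual cvs rlog layout, where each revision entry begins with "revision <number>" directly after a separator line of 28 dashes. It also reads the dump one line at a time, so the 260MB never has to sit in memory at once. count_rlog_revisions is a hypothetical helper, not part of cvs or Tailor.

```python
import re
from typing import Iterable

# Assumption: `cvs rlog` separates revision entries with a line of exactly
# 28 dashes, and each entry begins with "revision <number>".
REVISION_SEPARATOR = "-" * 28
REVISION_LINE = re.compile(r"revision [0-9]")


def count_rlog_revisions(lines: Iterable[str]) -> int:
    """Count revision entries in cvs rlog output, reading one line at a time."""
    count = 0
    previous = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Only count "revision N" lines that directly follow a separator, so a
        # commit message line such as "revision 2 of the patch" is ignored.
        if previous == REVISION_SEPARATOR and REVISION_LINE.match(line):
            count += 1
        previous = line
    return count
```

Passing an open file object, e.g. count_rlog_revisions(open("../log.txt", encoding="latin-1")), streams the dump. latin-1 is used there only because it decodes any byte sequence without raising.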

Tailor thinks this is ~160K changesets, or an average of 4.5 file
changes per revision. In 4 hours I converted 2K revisions (about 8 per
minute), which puts the expected ETA at 320 hours, or about 13 days.
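The back-of-the-envelope extrapolation can be written out as a small sketch. It assumes the conversion rate observed in the first 4 hours stays constant. conversion_eta is a hypothetical helper. The exact ratio of file changes to changesets works out to about 4.57.

```python
from typing import Tuple


def conversion_eta(changesets: int, revisions_done: int,
                   hours_elapsed: float) -> Tuple[float, float, float]:
    """Return (revisions per minute, total hours, total days) for the whole
    conversion, assuming the rate observed so far stays constant."""
    per_minute = revisions_done / (hours_elapsed * 60)
    total_hours = changesets / revisions_done * hours_elapsed
    return per_minute, total_hours, total_hours / 24


file_changes, changesets = 731_732, 160_000
print(f"{file_changes / changesets:.2f} file changes per changeset")  # 4.57
per_minute, hours, days = conversion_eta(changesets, 2_000, 4.0)
print(f"{per_minute:.1f}/min, {hours:.0f} hours, {days:.1f} days")  # 8.3/min, 320 hours, 13.3 days
```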

Unfortunately, I would expect it to slow down as the tree gets bigger
and bigger, so converting the entire repository with Tailor will
probably take close to a month.

2) Tailor has a memory consumption bug. When I first ran Tailor, it did
a full 'cvs rlog' dump and, I assume, parsed that into changesets. It
then started converting and consumed 1GB of RAM. After stopping and
restarting it, it is now consuming only 120MB of RAM. My best guess is
that it parses the rlog contents and then keeps the raw Unicode string
in memory. That string is never written to the state file, because it
is no longer needed, so a restarted Tailor never loads it again.

3) Tailor is fairly slow, but not terribly so. It has taken about 4
hours to create the first 2K revisions (approximately 6000 file-level
changes).

At this point, I'm pretty sure that all of this is 'cvs' time. Tailor
spawns 'cvs update -d -r 1.2 filename' for *every* file content change;
it isn't able to combine multiple updates. And since RCS stores older
trunk revisions as reverse deltas (back patches) from the head,
checking out an early version requires applying every back patch
between the head and that version, so it is O(n) in the number of
revisions.
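The back-patch argument can be made concrete with a simplified model of a single file's trunk. The model ignores delta sizes, branches (which RCS stores as forward deltas), and any caching cvs might do. Under it, checking out revision 1.k when the head is 1.n applies n - k reverse deltas. Replaying all n revisions oldest-first, one separate update each, costs n(n-1)/2 patches in total. Both function names are hypothetical.

```python
def backpatches_needed(head_rev: int, target_rev: int) -> int:
    """Reverse deltas RCS applies to check out trunk revision 1.<target_rev>.

    Model: 1.<head_rev> is stored as a full text and each older trunk
    revision as a reverse delta against its successor.
    """
    if not 1 <= target_rev <= head_rev:
        raise ValueError("target_rev must be between 1 and head_rev")
    return head_rev - target_rev


def replay_cost(head_rev: int) -> int:
    """Total back patches when every trunk revision of one file is checked
    out separately, oldest first (one `cvs update -r` per revision)."""
    return sum(backpatches_needed(head_rev, k) for k in range(1, head_rev + 1))


print(backpatches_needed(50, 1))  # 49
print(replay_cost(50))            # 1225, i.e. 50 * 49 / 2
```

The model also shows why each cvs step gets cheaper as the conversion moves toward the head.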

Maybe once we get to tens of thousands of revisions, bzr will start
becoming more of a bottleneck. cvs will get faster because there are
fewer back patches to apply, and I especially expect writing to
'inventory.knit' to become an issue. A full-text inventory grows to
around 30MB, so every 50 revisions we are going to store another 30MB
full text, and with 160K revisions that is a lot of full texts.

I realize not all 160K revisions will have an inventory that large, but
even at an average of 15MB per full text, 160K/50 * 15MB = 48GB, *just*
for inventory.knit.
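The estimate counts only the full texts and ignores the deltas stored between them. It uses the mail's assumed 15MB average and decimal megabytes. The sketch below reproduces it. It also shows what storing full texts every 100 or 1000 revisions would do, which are the caps floated later in this mail. inventory_fulltext_bytes is a hypothetical helper.

```python
MB = 10**6  # decimal megabytes, matching the 48GB figure in the text


def inventory_fulltext_bytes(revisions: int, fulltext_every: int,
                             avg_fulltext_bytes: int) -> int:
    """Bytes spent on inventory full texts if one is stored every N revisions."""
    return (revisions // fulltext_every) * avg_fulltext_bytes


for interval in (50, 100, 1000):
    total = inventory_fulltext_bytes(160_000, interval, 15 * MB)
    print(f"every {interval:>4} revisions: {total / 10**9:.1f} GB")
```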

This tells me that we would probably be better off storing full texts
less often. I know Mercurial decides based on size (when the size of the
accumulated deltas reaches the size of the full text, it creates a new
full text).

I think we could be a little more flexible and do something like the
above, with some sort of upper limit on the number of deltas between
full texts (100? 1000? something reasonable).
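A minimal sketch of that policy stores a new full text in two cases. The first is when the deltas accumulated since the last full text, plus the new delta, would be at least as large as the full text itself. The second is when the delta chain hits a cap. The sketch follows the size rule as described above, not Mercurial's actual revlog code. FulltextPolicy, plan, and the default cap of 1000 are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FulltextPolicy:
    """Hypothetical size-based full-text policy with a chain-length cap."""
    max_chain: int = 1000   # assumed cap on deltas between full texts
    has_base: bool = False  # has any full text been stored yet?
    chain_length: int = 0   # deltas stored since the last full text
    chain_bytes: int = 0    # total size of those deltas

    def choose(self, fulltext_size: int, delta_size: int) -> str:
        """Return "fulltext" or "delta" for the next version and update state."""
        if (not self.has_base
                or self.chain_bytes + delta_size >= fulltext_size
                or self.chain_length >= self.max_chain):
            self.has_base = True
            self.chain_length = 0
            self.chain_bytes = 0
            return "fulltext"
        self.chain_length += 1
        self.chain_bytes += delta_size
        return "delta"


def plan(versions: Sequence[Tuple[int, int]], max_chain: int = 1000) -> List[str]:
    """Decide storage for a series of (fulltext_size, delta_size) versions."""
    policy = FulltextPolicy(max_chain=max_chain)
    return [policy.choose(full, delta) for full, delta in versions]


print(plan([(100, 10)] * 6, max_chain=3))  # cap reached after 3 deltas
print(plan([(100, 60)] * 4))               # size rule triggers a new full text
```

With small deltas the cap decides. With large deltas the size rule kicks in long before the cap, so a large, frequently changing inventory would still get regular full texts.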

4) This also brings up the earlier idea of breaking things up into a
per-directory inventory (or, alternatively, breaking things up into
per-directory branches in a nested-tree setup).

Just looking at the changes as they come in, 90+% of them modify only
one package, though I have seen some that touch Makefiles across
several packages (such as a dependency change like "Change jpeg library
major to 6").
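To check the 90+% figure, or to see where a per-directory split would fall, you could group each changeset's paths by package. A minimal sketch follows. It assumes paths are relative to the ports tree root, optionally prefixed with "ports/", and that a package is the category/port pair. package_of, single_package_fraction, and the example changesets are hypothetical.

```python
from typing import Iterable


def package_of(path: str) -> str:
    """Map a path in the ports tree to its "category/port" package."""
    parts = path.strip("/").split("/")
    if parts[0] == "ports":  # tolerate CVS module-relative paths
        parts = parts[1:]
    if len(parts) > 2:
        return "/".join(parts[:2])   # e.g. graphics/jpeg/Makefile -> graphics/jpeg
    if len(parts) == 2:
        return parts[0]              # e.g. Mk/bsd.port.mk -> Mk
    return "<top>"                   # e.g. UPDATING


def single_package_fraction(changesets: Iterable[Iterable[str]]) -> float:
    """Fraction of changesets whose changed paths all fall in one package."""
    total = single = 0
    for paths in changesets:
        total += 1
        if len({package_of(p) for p in paths}) == 1:
            single += 1
    return single / total if total else 0.0


# Made-up changesets for illustration only.
example = [
    ["graphics/jpeg/Makefile", "graphics/jpeg/distinfo"],
    ["www/example-browser/Makefile"],
    ["graphics/example-viewer/Makefile", "print/example-filter/Makefile"],
]
print(single_package_fraction(example))
```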

John
=:->
