Slow inventory extraction from weavefile
Martin Pool
martinpool at gmail.com
Tue Oct 4 03:09:37 BST 2005
On 04/10/05, John A Meinel <john at arbash-meinel.com> wrote:
> I don't know what the target speed is, but with the latest bzr.newformat
> tree, on my old 450MHz Celeron, it takes >1.5 seconds per inventory that
> I want to extract. (For 20 revisions it took 36 seconds; over the course
> of all revisions, it took between 1.8-2.0s per revision.)
I don't have a specific target in mind, other than that it should be
fast enough for large projects on typical machines. 450MHz is
perhaps a bit slow.
> Now, I'm not loading the weave directly, and then just asking it
> repeatedly for just a section, so there is some time which is spent
> loading from the disk, but after 20 requests, I'm pretty sure everything
> is cached by the OS.
There's also a cost in parsing the weave file repeatedly, which is
probably higher than the cost of extracting a version once we have the
weave object in memory. We should try to keep it cached inside the
Branch object. There is some stubbed-out code to do this. The only
complication here is that we need to make sure to release it from
memory if the weave file has been written by someone else. We can tie
this to read locks on the branch, but read locks are not satisfactory
in the long term because we can't do them over http.
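
To illustrate the caching idea (purely a sketch, not the stubbed-out
code or bzrlib's actual interfaces; all names here are made up), the
Branch could keep the parsed weave keyed on a cheap fingerprint of the
file, so a rewrite by another process invalidates the cache without
needing a lock:

    import os

    class WeaveCache(object):
        """Cache parsed weaves, re-reading only when the file changes."""

        def __init__(self, parse_weave):
            self._parse_weave = parse_weave   # callable: path -> weave object
            self._cache = {}                  # path -> (fingerprint, weave)

        def _fingerprint(self, path):
            # mtime+size is only an approximation of "has it changed";
            # a lock or an explicit marker in the file would be safer.
            st = os.stat(path)
            return (st.st_mtime, st.st_size)

        def get_weave(self, path):
            fp = self._fingerprint(path)
            cached = self._cache.get(path)
            if cached is not None and cached[0] == fp:
                return cached[1]
            weave = self._parse_weave(path)
            self._cache[path] = (fp, weave)
            return weave

That avoids re-parsing the weave file for every inventory we extract,
which is the repeated cost John is seeing.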
> On my Mac 1.5GHz, it is 13s for 20 revisions, so closer to 0.6s per
> inventory (about 0.71-0.74s per rev for all 1728 revisions). Also on the
> Mac, it takes 3m5s just to do "weave.py stats .bzr/inventory.weave",
> which is 1782 versions, or 0.1s per version (not bad, but 3m for stats
> seems a little long).
The best way to improve stats and check is probably to write the
parallel extract mentioned in the docs; that is to say, to return each
line once along with a list of which revisions include it.
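
Very roughly, and using a deliberately simplified weave model (each
body line carries the version that inserted it and at most one version
that deleted it; inclusions maps a version to its full ancestry
including itself; none of these names are bzrlib's), a parallel
extraction would walk the body once:

    def parallel_extract(body, inclusions):
        """Yield (line, included_versions) for each line in the weave body.

        body       : list of (inserted_by, deleted_by_or_None, text)
        inclusions : dict mapping version -> set of versions it includes
        """
        versions = list(inclusions.keys())
        for inserted_by, deleted_by, text in body:
            # A line is present in v if its inserting version is included
            # in v and no deletion covering it is included in v.
            included = [v for v in versions
                        if inserted_by in inclusions[v]
                        and (deleted_by is None
                             or deleted_by not in inclusions[v])]
            yield text, included

stats and check could then be derived from that single pass rather
than extracting all 1782 versions separately.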
--
Martin