usertest results for index2

Mon Jul 7 02:38:46 BST 2008

On Sun, 2008-07-06 at 12:22 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Ian Clatworthy wrote:
> | csv files attached comparing common operations across
> | 3 deep repos - python, mysql 5.1 and OOo:
> |
> | * bzr = rev 3524
> | * bzr-btree = rev 3524 + index2 plugin
> |
> | In the latter case, bzr init --btree-plain; bzr pull
> | was used instead of bzr branch. See usertest rev 118
> | for the benchmarking code.
> |
> | In summary, log one file is much quicker while most
> | other things are much the same, if slightly slower.
> | These are fully packed repos though so that's probably
> | making the current index code look better than it
> | really is. Even so, we might want to dig a little deeper
> | on some of these results before making a format with
> | btree indexes the default.
> |
> | Ian C.
> |
> 
> Thanks for running this. If you have a chance, I do have a few ideas...
> 
> 1) I have a feeling this is mostly because the btree code isn't caching
> as aggressively as the current graph index code.
> 
> 2) Run with 'bzr -Dmemory'. With the latest from bzr.dev this will dump
> the status information just before the process exits. So you can find
> the total memory consumption, etc.
> 
> 3) There are a couple interesting lines:
> 
> By default we don't cache any of the individual lines, but you can
> enable it in 'btree_index.py'
> 
> # Default max size is 100,000 leaf values
> self._leaf_value_cache = None # lru_cache.LRUCache(100*1000)
> 
> You can set it to an LRU, or make it a plain python dict if you want to
> simulate what the current graph indexes are doing.
> 
> I would be curious to see the difference on both memory consumption, and
> performance timing.
> 
> 4) This format is really tuned for more "real world" cases where you
> don't have a single optimal pack, but you have several. For example, you
> could try doing "for i in `seq 8`; do bzr commit --unchanged; done" just to
> shove a few extra packs into the repository. I would like to *think*
> that btree will scale better than graph under those conditions. (It is
> also something I'm trying to look closely at with pybloom code.)
> 
> 
> I realize you are fairly out of commission to do this testing in the
> next few days, Robert is at Guadec, and I'm working on merge.... :(
> We'll get there, I guess.
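
To make the LRU-versus-dict choice in point 3 concrete, here is a minimal
standalone sketch in plain Python. It is not bzrlib code: TinyLRU and fill
are made-up names, and TinyLRU only stands in for whatever
lru_cache.LRUCache actually does. The point is just that a bounded LRU
evicts old entries while a plain dict keeps everything, which is the
memory-versus-hit-rate trade-off worth measuring.

```python
from collections import OrderedDict


class TinyLRU:
    """Hypothetical stand-in for an LRU cache; not bzrlib's lru_cache."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._data)


def fill(cache_put, n):
    """Store n made-up leaf values through the given setter."""
    for i in range(n):
        cache_put(("key-%d" % i,), "value-%d" % i)


lru = TinyLRU(max_size=3)
fill(lru.put, 10)

plain = {}  # unbounded, like caching every value ever read
fill(plain.__setitem__, 10)

print(len(lru), len(plain))
```

The bounded cache ends up holding only the three most recent keys, while
the dict holds all ten.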

If someone wanted to make a real-world-ish repository, they could
call Repository._pack_collection.pack_distribution(revision_count)

That will give you a frequency list for packs, in powers of 10,
e.g. [2,3] means two packs of 10 revisions and three packs of 1 revision.
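
A rough standalone sketch of that powers-of-10 idea, in plain Python. This
is not the bzrlib method, and I have not checked the exact shape of what
pack_distribution returns; decimal_pack_counts and pack_sizes are made-up
helpers that just read the decimal digits of the revision count.

```python
def decimal_pack_counts(revision_count):
    """Return how many packs of each power-of-ten size, largest first.

    Hypothetical helper: it only reads off the decimal digits.
    """
    return [int(digit) for digit in str(revision_count)]


def pack_sizes(revision_count):
    """Expand the counts into one entry per pack, largest first."""
    counts = decimal_pack_counts(revision_count)
    sizes = []
    for position, count in enumerate(counts):
        exponent = len(counts) - 1 - position
        sizes.extend([10 ** exponent] * count)
    return sizes


print(decimal_pack_counts(23))  # [2, 3]
print(pack_sizes(23))           # [10, 10, 1, 1, 1]
```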

Then you can get a reasonable approximation by doing successive pulls to
set up the repo:
bzr init --btree-plain
bzr pull -r 10
bzr pull -r 20
bzr pull -r 21
bzr pull -r 22
bzr pull -r 23

It's not quite right, because the former is talking about
revisions-in-ancestry and the latter about revisions-on-mainline,

but you should end up with something not fully packed, and not have to
spend many many fetches making it.
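
If you want to script it, something like this would print the commands
for any list of pack sizes. It is only a sketch: pull_commands is a
made-up helper, it prints the commands rather than running them, and the
sizes list is typed in by hand here.

```python
from itertools import accumulate


def pull_commands(sizes, repo_format="--btree-plain"):
    """Build the init plus successive pull commands for the given pack sizes.

    Each pull stops at the running total of revisions, so each fetch
    should add roughly one pack of the requested size.
    """
    commands = ["bzr init %s" % repo_format]
    for revno in accumulate(sizes):
        commands.append("bzr pull -r %d" % revno)
    return commands


# Two packs of 10 and three packs of 1, as in the example above.
for line in pull_commands([10, 10, 1, 1, 1]):
    print(line)
```

That prints the same init and five pulls listed above, ending at -r 23.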

-Rob
