Releasing a non-dev version of btree repo

John Arbash Meinel john at arbash-meinel.com
Sat Oct 4 00:14:35 BST 2008




...

> 
> I've been playing around with changing the logic to "expand" requests
> (similar to how we do for GraphIndex calls.) So far, at least for 'bzr
> log --short -r X..Y', it is actually better to *not* expand.

I've worked on the logic a bit more. There are a few issues:

1) 'bzr log --short' certainly shouldn't be the only command we tune
for. For bzr's mainline, all of the keys are clustered around 'pqm@'.
That clustering suits 'bzr log --short', but it is probably not
*typical* of all access patterns.

2) For this workflow, reading the root page *by itself* is best. With
this repository, none of the packs have more than 2 layers (the root
node, and then one layer of leaf nodes.)

Because of this, reading 64kB at the beginning is wasted effort. It
would read 16 pages, but those would just be the first 16 leaves.
'pqm@' actually occurs around page 70 for both of the big packs, which
makes reading the extra pages a pure waste.
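To illustrate why the root alone gets you there (a hypothetical helper,
not the bzrlib code; the separator layout and leaf counts here are
assumptions): the root page carries one separator key per leaf, so a
single bisect over the already-downloaded root picks the leaf directly.

import bisect

def leaf_for_key(root_separators, key):
    # Hypothetical sketch: root_separators[i] is the largest key stored
    # in leaf i, kept in sorted order in the root page.  One bisect over
    # the root (already in memory) names the single leaf page to fetch.
    idx = bisect.bisect_left(root_separators, key)
    return min(idx, len(root_separators) - 1)

# With the 'pqm@' keys landing around leaf 70, prefetching leaves 0-15
# (the first 64kB) never touches anything this lookup needs.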

3) My ping to people.ubuntu.com is 100-200ms, with 160kB/s bandwidth.
So my 'sweet spot' (the bandwidth-delay product) is 160kB/s * 0.1-0.2s
= 16kB to 32kB.
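Spelling that out (just restating the numbers above, with a throwaway
helper name of my own):

def sweet_spot_kb(bandwidth_kb_per_s, latency_s):
    # Bandwidth-delay product: how much data fits in one round trip.
    return bandwidth_kb_per_s * latency_s

sweet_spot_kb(160, 0.1)   # 16.0 kB at 100ms
sweet_spot_kb(160, 0.2)   # 32.0 kB at 200ms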

4) I am using an algorithm that does:

  a) Read the root page in a single request
  b) When reading other pages, expand the request forward and backward
     until you hit pages that are already cached. This has an
     interesting property: for 'bzr log' we generally read page 70,
     then 69, then 68, and so on. With this algorithm the request for
     page 70 expands to 68-72, and the next uncached request, 67,
     expands to 63-67. That actually fits 'bzr log' quite well (rough
     sketch below).
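
Roughly, the expansion in (b) looks like the following sketch. This is
just the idea, not the real code: the function name, the page-count
budget, and the cached-page set are all made up for illustration.

def expand_request(page_index, total_pages, cached_pages, budget_pages=5):
    # Grow the request forward and backward from the page we actually
    # need, one page at a time.  An edge stops growing when it runs
    # into a page that is already cached (or the end of the index),
    # and the whole request stops at budget_pages pages.
    pages = [page_index]
    lo = hi = page_index
    while len(pages) < budget_pages:
        grew = False
        if lo - 1 >= 0 and (lo - 1) not in cached_pages:
            lo -= 1
            pages.insert(0, lo)
            grew = True
            if len(pages) >= budget_pages:
                break
        if hi + 1 < total_pages and (hi + 1) not in cached_pages:
            hi += 1
            pages.append(hi)
            grew = True
        if not grew:
            break
    return pages

# First uncached read: 70 grows both ways to 68-72.
expand_request(70, 100, cached_pages=set())
# => [68, 69, 70, 71, 72]
# The next miss is 67; the forward side hits the cache immediately,
# so all the growth goes backward: 63-67.
expand_request(67, 100, cached_pages=set(range(68, 73)))
# => [63, 64, 65, 66, 67]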


5) With a 16kB page, I *do* see round-trip savings.

For 'bzr log --short -r -1000..-1':

Without expansion, I see 23 pages read in 23 round trips. (--short
doesn't batch its searches, so we are always looking up 1 revision at
a time.)

With expansion, we get 13 round trips reading 26 pages. So 3 pages are
wasted, but we save 10 round trips.

10 round trips == ~1-2s saved; the 3 wasted pages cost 0.075s of extra
download.
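
For concreteness (the 4kB page size below is inferred from the 0.075s
figure, i.e. 3 * 4kB / 160kB/s; the variable names are mine):

latency_s = 0.1          # 0.1-0.2s, from (3)
bandwidth_kb_s = 160.0
page_kb = 4.0            # implied by 3 pages costing 0.075s

time_saved = (23 - 13) * latency_s                     # 1.0s (2.0s at 200ms)
extra_download = (26 - 23) * page_kb / bandwidth_kb_s  # 0.075s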


6) With less-packed (default) indexes, I see 32 pages read with 13 round
trips (versus the minimum of 24 pages / 24 round trips.)

So packing harder does save some pages, but no round trips. (It saves
bandwidth, but not latency.)


7) Doing this over the internet is a bit messy, as congestion, etc.,
plays a noticeable role in the minute-to-minute variation.

Anyway, there is a lot of tweaking that could be done, and this seems
like a case where we want to set up a bunch of different scenarios and
tune them all at the same time. That will help us avoid tuning for "X"
in a way that costs us "Y".


8) For 'bzr log --long -r -100..-1' things get more interesting, but I
also seem to be triggering a buggy proxy:

bzr: ERROR: Invalid http response for
http://people.ubuntu.com/%7Ejameinel/test_repos/bzr_dev_pbtree
/.bzr/repository/indices/bf8c313b551d33e58f99ae1d130c3aa0.rix: Expected
a boundary (4586176e374927db1) line, got '--4586176e374927db1--'

John
=:->


