1.6 fetch regression

Wed Aug 27 23:17:15 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So I'm trying to put together some discussion about ways forward to improve
branch/pull/etc (fetching in general). I did some testing over the loopback
with 100ms delay (200ms ping).

$ time bzr1.5 branch
$ time bzr.dev branch

Looking at the log for bzr 1.5 it spends:
  3s  3s open an ssh connection and issue the "hello" request
  8s  5s opening all the format objects, etc. Now ready to "fetch"
 26s 18s 12 calls to 'get_parent_map()', presumably to get the whole ancestry
         graph
 37s 11s Repository.stream_revisions_chunked, this is the time the remote bzr
         spends thinking about what to transmit, before it actually returns
         'ok' and starts streaming data.
 62s 25s Streaming 24.3MB file texts (guess)
 92s 30s Streaming 38.8MB of inventory texts (one chunk)
103s 11s Streaming  3.5MB of signatures
109s  6s Streaming 10.4MB of revision texts
123s 14s Re-open the branch at least 2 times, I don't really know why.
	 one seems to be for 'branch.conf' and 'tags', the other doesn't read
         'branch.conf' but does read 'tags'

Obviously something weird is going on that we are opening the remote branch
and repository at least 3 times.

The key part is that it takes the remote bzr 11s to iter the inventories to
find what to fetch, and then for the most part can just stream the data.

For bzr.dev it is a lot harder to analyze, because things are spread out more.
  3s  3s open an ssh connection and issue the "hello" request
  8s  5s opening all the format objects, etc. Now ready to "fetch"
 26s 18s Make 70 requests for .rix files. Most of these are 'readv()' [1]
 39s 13s Make 13 .pack requests (I have 18 pack files here).
	 At least we don't seem to request anything 2x from the same file
 48s  9s Make 18 readv && 18 get requests for .iix [2]
 80s 32s Make 13 readv requests on .pack files for the inventories
 94s 14s 18 readv && 18 get requests for .tix [same as 2]
134s 40s 13 readv requests on .pack files for texts
142s  8s 18 readv && 18 get requests for .six files
149s  7s 12 readv requests on .pack files for signatures
161s 12s Open the branch a couple of times just to make sure that we really
         have the format we thought we had when we started 3 minutes ago

[1] One concern is that there are 32 requests for ...6aac8e.rix, with the
final one being a '.get()' request. Almost all of them return 64KB, (one
returns 128KB). The get returns 1.5MB.
In the readv() requests alone, we've read 2.0MB of data. Or enough to have
read the file 1.4 times.

[2] This seems to be a readv to read the header and a page of data. Once it
has that it realizes that it really wants the whole thing, and issues a "get"
request. For most of the get requests it *already has* the whole thing in memory.

So it seems to be primarily an issue of number of pack files, in that each
pack file adds another round trip.

Packing the repository completely I end up with bzr-1.5:
  3s  3s connected
  8s  5s opened the branch and repository
 23s 14s get_parent_map calls
 57s 34s stream file texts
 82s 25s stream inventories
 90s  8s stream signatures
 95s  5s stream revisions
110s 15s song-and-dance to finish

With bzr.dev:
  3s  3s connected
  8s  5s opened the branch and repository
 20s 12s 40 readv calls, finishing on a get call. Note 64K*40=2.6MB versus the
         1.8MB get request
 30s 10s 5s for a single pack request, 5s to process it
 34s  4s 1 readv, 1 get for .iix
 52s 18s 1 readv of .pack for inventory texts
 70s 18s process inventory
 74s  4s 1 readv, 1 get for .tix
 79s  5s process tix info
 93s 14s 1 readv of .pack for file texts
117s 24s process the file texts
120s  3s 1 readv, 1 get for .six
122s  2s 1 readv of .pack for signatures
126s  4s processing
137s 11s song-and-dance to finish

So a few interesting bits here. The CPU time for this is 58s versus 26s.
Almost double. That could certainly be just splitting the load between the
local and remote server. So even when we make round-trips fairly minimal, it
still amounts to extra overhead.

Also interesting is that "bzr branch" on the local filesystem is only 32s
total, (31.1s user time). So we seem to be spending 27s in processing the RPC
calls.

Now, for 'bzr branch' just on the local filesystem, the new bzr.dev is *much*
better. "time bzr branch" is 32s versus 1m15s (75s user).
Which is a bit of a mystery why it is so much slower over a remote connection.

Note also that:

$ time bzr.dev branch bzr+ssh:// # 0 latency
  53.89s user 28.91s system 96% cpu 1:25.97 total

 0.5s      to connect
 0.6s 0.1s ready to fetch
 1.7s 1.0s read all of .rix
 3.5s 1.2s read revision texts
 8.0s 4.5s processed revision texts
 8.1s 0.1s read all of .iix
 9.6s 1.5s processed .iix
 21s  11s  read inventory from .pack
 38s  17s  processed inventory texts
 39s   1s  read all of .tix
 44s   5s  processed .tix
 54s  10s  read texts from .pack
 76s  23s  processed texts
 77s   1s  read .six, .pack
 81s   4s  finished .pack, started reading branch/format
 83s   2s  creating working tree
 86s   3s  working tree built, opening bzr+ssh://...format, tags

So for whatever reason, adding 'bzr+ssh' into the stream adds 23 User seconds,
and 54 total seconds. Though if parsing the stream is 23s, it might be 23s to
*generate* the stream. And 23+23 is 46/53s (plus a few seconds for the ssh
handshake, etc.)

Over the local network I also see:
$ time bzr.dev branch http://
  35.41s user 3.40s system 69% cpu 55.926 total
$ time bzr1.5 branch http://
  38.61s user 3.30s system 69% cpu 1:00.54 total

(note that this repo is slightly different, but still a packed repo) I'm
actually quite surprised to see that bzr-1.5 branching over http:// is
*faster* than branching locally. (1m versus 1m15s).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFItdJrJdeBCYSNAAMRAoUCAKCClfBn0RXX8Ry583hLnW89+q41RQCfRRE8
TUSxj7d+3IeYb6+WXyb59w0=
=ccoL
-----END PGP SIGNATURE-----