Problems with new streaming API

John Arbash Meinel john at arbash-meinel.com
Wed Jan 16 22:57:52 GMT 2008


John Arbash Meinel wrote:
> I just got a new laptop, so I'm setting it up with all new bzr
> repositories, etc. I did a "bzr checkout bzr+ssh:///" over the local
> network to get the latest bzr.dev snapshot.
> 
> I thought it would be fast, but I found out a few things.
> 
> 1) With a 1.0.0 client, it uses the old stream API, which causes the
> server to spend a lot of time thinking and buffering up 240MB in RAM. I
> don't really understand why it is so much, considering my repository is
> only 85MB total.
> 
> But anyway, because of that buffering time, the network sits idle, and
> at the end it shoves a bunch of data across and finishes:
>   bzr checkout   21.43s user 1.23s system 11% cpu 3:23.30 total
> 
> (Notice it takes over 3 minutes to get the data, but we only spend 21s
> of CPU processing it on the local end.)
> 
> 2) Since I saw Andrew's patches had finally landed, I thought it would
> be good to test them out. So far, memory consumption has stayed low
> (~30MB), which is good. However, I have almost zero network traffic, and
> it has been 15 minutes.
> 
> I ran strace on the server side, and I basically see it doing this:
> mmap2(NULL, 737280, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0xb7601000
> munmap(0xb76b5000, 737280)              = 0
> futex(0x81f0890, FUTEX_WAKE, 1)         = 0
> 
> It is basically spinning very fast, doing that over and over and over
> again. (The only thing that seems to change is the address; the "737280"
> stays fixed.)
> 

The pattern did change a bit at points, to stretches where it would be
just futex() calls for a long time.

It did end up completing, but only after about 25 minutes. I'm wondering
if all of that time was spent *before* it started actually streaming out
any data.

A couple of other points:

1) It "chunks" up by the requested revisions per "knit". Which is
certainly sub-optimal for packs.

2) It also means that we return all of the inventory texts in a single
chunk, and all of the revision texts and signatures in a single chunk.
This is what I saw:

...
              2213 byte chunk read
              4045 byte chunk read
              14589 byte chunk read
              5056 byte chunk read
              31416322 byte chunk read
              2832058 byte chunk read
              8596112 byte chunk read


So there are a lot of chunks < 10KB, and then a ~30MB chunk (I assume the
inventories), a 2.8MB chunk (signatures), and an 8.6MB chunk (revisions).

I think there is a bug open about memory bloat, and the inventory
portion could certainly be where it is happening.
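
To illustrate the memory pattern, here is a minimal sketch (with
hypothetical read_chunk/process callables, not the real smart-server
code) of why per-knit chunking drives peak memory: each chunk has to be
materialized in full before it can be processed, so peak usage tracks
the largest single chunk rather than the total transferred:

    def consume_stream(read_chunk, process):
        # read_chunk() blocks until an entire chunk has arrived and
        # returns it as one string; None signals end of stream.
        while True:
            chunk = read_chunk()
            if chunk is None:
                break
            # The whole chunk is resident here, so a ~30MB inventory
            # chunk means at least ~30MB held at once on this side.
            process(chunk)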


Probably some of this is just a matter of optimizing for packs
differently than for knits. I'm also wondering whether, instead of:

    yield name, _get_stream_as_bytes(knit, versions)

we should do something like:

    for pos in xrange(0, len(versions), 1000):
        yield name, _get_stream_as_bytes(knit, versions[pos:pos+1000])

Obviously it would be best to split up by size rather than by number of
revisions, but the number of revisions is at least an approximation of
size. (Something on the order of 100-1000 revisions per batch seems
reasonable here.) A rough sketch of a size-based split follows below.
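
Here is a rough sketch of what a size-based split could look like. Note
that get_data_size() is a hypothetical helper, not an existing knit API;
the real index may or may not expose stored sizes cheaply:

    def batch_by_size(knit, versions, target_bytes=1024*1024):
        # Accumulate versions until the batch holds roughly
        # target_bytes of stored data, then flush it.
        batch = []
        batch_bytes = 0
        for version in versions:
            batch.append(version)
            batch_bytes += knit.get_data_size(version)  # hypothetical
            if batch_bytes >= target_bytes:
                yield batch
                batch = []
                batch_bytes = 0
        if batch:
            yield batch

    for batch in batch_by_size(knit, versions):
        yield name, _get_stream_as_bytes(knit, batch)

That caps each chunk at roughly the target size, so the reader never has
to buffer a 30MB inventory blob in one go.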

John
=:->