1.6 fetch regression

Thu Aug 28 21:58:00 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
...

> My best guess is that we are getting some really long buffered bytes, and the
> cost of reallocating that is costing us a lot.
> 
> I've been playing with refactoring it into a
> "self._in_buffer_list.append(bytes)" and then collapsing at need. But it turns
> out that
> 
> ''.join(self._in_buffer_list) then shows up as the hot spot.
> 
> Right now I'm trying to decide if this is just the way it has to be, or if the
> calling code is only consuming a portion of this buffer. Or if we are adding 2
> bytes to a 10MB buffer, etc.

So it seems that we are just a bit too good at collapsing the readv request.
Specifically I see results that look like:

extract:  1 65540 7624177
extract:  1 131076 7624177
extract:  1 196612 7624177

This means we have a readv() call that expects something like 7.6MB back,
but gets it in 64kB chunks. On each pass we add the new 64kB to the existing
buffer and then try to extract again, even though the request isn't going to
fit *this* time either.
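
To make the cost concrete, here is a rough sketch (not the actual bzrlib
code; ChunkBuffer and bytes_copied_rejoining are names made up for
illustration). It compares rebuilding the whole buffer on every 64kB chunk
against keeping a running length and only joining once the request can
actually be satisfied. The sizes are taken from the output above.

```python
CHUNK = 64 * 1024      # size of each piece arriving from the stream
EXPECTED = 7624177     # size of the single large request seen above


def bytes_copied_rejoining(expected, chunk):
    """Count bytes copied if the buffer is re-joined on every chunk."""
    buffered = 0
    copied = 0
    while buffered < expected:
        buffered += min(chunk, expected - buffered)
        copied += buffered  # a join copies everything buffered so far
    return copied


class ChunkBuffer(object):
    """Hypothetical buffer: append chunks cheaply, join only when needed."""

    def __init__(self):
        self._chunks = []
        self._length = 0        # running total, so no join is needed to check
        self.bytes_copied = 0   # instrumentation for this sketch only

    def append(self, data):
        self._chunks.append(data)
        self._length += len(data)

    def extract(self, count):
        """Return exactly count bytes, or None if too few are buffered."""
        if self._length < count:
            return None         # cheap check; nothing is copied
        joined = b"".join(self._chunks)
        self.bytes_copied += len(joined)
        result, rest = joined[:count], joined[count:]
        self._chunks = [rest] if rest else []
        self._length = len(rest)
        return result


def read_request(expected, chunk):
    """Feed zero-filled chunks until one extract of `expected` succeeds."""
    buf = ChunkBuffer()
    sent = 0
    while True:
        size = min(chunk, expected - sent)
        buf.append(b"\0" * size)
        sent += size
        result = buf.extract(expected)
        if result is not None:
            return result, buf.bytes_copied
```

With these numbers, bytes_copied_rejoining(EXPECTED, CHUNK) comes out at
several hundred megabytes of copying for a 7.6MB request, while the
length-tracking version copies the data once.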

I'm going to try changing my buffering strategy slightly. But I also wonder if
it wouldn't be better to penalize the "coalesce" strategy a bit. That would
also let us yield bytes back to the caller as they stream in, rather than
waiting for the stream to finish and then processing them all in one go.
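
One way to penalize coalescing is to cap how large a merged range may grow.
The following is only a sketch under that assumption: coalesce_offsets and
its max_size parameter are hypothetical names here, not a description of the
existing transport code.

```python
def coalesce_offsets(offsets, max_size):
    """Merge adjacent (start, length) ranges, capping each merged range.

    Ranges are merged only when they touch exactly and the combined length
    stays within max_size. A single input range that is already larger than
    max_size is passed through unchanged.
    """
    merged = []
    for start, length in sorted(offsets):
        if merged:
            last_start, last_length = merged[-1]
            adjacent = (last_start + last_length == start)
            if adjacent and last_length + length <= max_size:
                # Extend the previous range instead of starting a new one.
                merged[-1] = (last_start, last_length + length)
                continue
        merged.append((start, length))
    return merged
```

With a cap in place, each merged range can be handed back to the caller as
soon as its bytes have arrived, instead of holding everything until one
multi-megabyte range is complete.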

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFItxFYJdeBCYSNAAMRAocoAJ49n7YQiUrTWJuRwJffbYTkR42+3ACdHpqq
OHW5qmGTPjJzPA11Hb4j2t4=
=i4mo
-----END PGP SIGNATURE-----