prefetch still broken with readv and paramiko 1.6.1
John Arbash Meinel
john at arbash-meinel.com
Wed Jul 26 13:14:24 BST 2006
Robey Pointer wrote:
>
> On 25 Jul 2006, at 9:28, John Arbash Meinel wrote:
>
>> Well, I've found a few more bugs in the readv/prefetching logic.
>
> Would you try the current bzr trunk? Located here:
>
> http://www.lag.net/paramiko/bzr/paramiko
>
> I took a 90-degree turn, decided to stop explaining how readv is meant
> to be used, and tried to adapt it to how you're using it. So I'd
> appreciate feedback on whether that makes it work right for you.
>
I do believe I understand how readv is supposed to be used, and am using
it correctly.
>
>> Further, I found a bug in 'readv()' if you are requesting large ranges.
>> It seems that 'sftp.read()' is only able to return a maximum range of
>> 64KiB (65536 bytes)
>
> On some servers it's as low as 32K. The bug is that paramiko
> currently doesn't chop large readv requests into correctly sized
> chunks. I think it should be fixed in the trunk.
>
I'll look into it.
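In the meantime, chopping the ranges up on the client side works around
it. A minimal sketch, assuming a 32KiB per-request server cap (the
helper name is made up):

MAX_CHUNK = 32768   # assumed per-request server limit

def split_ranges(ranges, max_chunk=MAX_CHUNK):
    # Split (offset, length) pairs into pieces no larger than max_chunk.
    for offset, length in ranges:
        while length > 0:
            step = min(length, max_chunk)
            yield (offset, step)
            offset += step
            length -= step

# A single 200,000-byte request becomes seven server-safe requests:
chunks = list(split_ranges([(0, 200000)]))
# [(0, 32768), (32768, 32768), ..., (196608, 3392)]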
> I still think that in general you won't want to use prefetch when you
> know you're just going to readv() a few sections of the file. But that
> should at least work now and not be as huge a penalty as it was before.
>
I'm not using readv() with prefetch(). I do understand they overlap.
What I was *testing* is that if you are downloading *most* of the file
using seek+read, it is actually faster to do a prefetch(). But if you
aren't downloading most of the file, it is much better to just read the
sections you want.
And the best option is to use readv(), which issues async requests for
just the sections you want.
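To make the three access patterns concrete, here is a rough sketch
against the sftp API (host, credentials, and file name are made up):

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('example.com', username='user')   # placeholder host/user
sftp = client.open_sftp()

wanted = [(0, 4096), (65536, 4096), (262144, 4096)]  # (offset, length)

# 1. seek + read: one full round trip per section.
f = sftp.open('some/file', 'rb')
for offset, length in wanted:
    f.seek(offset)
    section = f.read(length)

# 2. readv(): all requests are issued asynchronously up front, and
#    the results come back as a generator, in the order requested.
for section in f.readv(wanted):
    pass  # process each section

# 3. prefetch(): downloads the *whole* file in the background, so
#    later seek+read calls are served locally. Only a win when you
#    want most of the file anyway.
f2 = sftp.open('some/file', 'rb')
f2.prefetch()
f2.seek(0)
whole = f2.read()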
> [ What I mean by the above: Prefetch basically pretends like you did a
> read() of the entire file, and downloads the whole thing. If you know
> you're only going to want to read *part* of the file, some of that
> download is wasted. You'd be better off skipping the prefetch and just
> doing a readv(). I know I haven't been explaining this well. ]
>
> robey
>
>
You've been doing fine for me. Maybe I haven't been explaining what I'm
doing well. The round-trip overhead of doing lots of 'seek + read' calls
is high enough that if you are fetching some large fraction (say > 90%)
of a file, it is faster to just request the whole file and break it up
on the local side.
Now, my test setup has biased this as well, since I was adding an
artificial delay on the loopback interface (using 'tc' and 'netem').
That means that though a single ping takes 50ms, data can still stream
at a very high rate, which is why I'm re-doing the test against a real
server.
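To put rough numbers on that (the 50ms is the netem delay; the link
rate and workload are assumptions):

# 1000 sections of 4KiB wanted from a ~4MiB file.
rtt = 0.050                # seconds per round trip (netem delay)
sections, size = 1000, 4096
bandwidth = 10e6           # bytes/sec, assumed link rate

# seek + read: one round trip per section dominates completely.
seq = sections * rtt + (sections * size) / bandwidth   # ~50.4 s

# whole-file fetch: roughly one round trip, then streaming.
whole = rtt + (4 * 1024 * 1024) / bandwidth            # ~0.47 s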
I'll look into your changes in the paramiko trunk. The only thing I
want you to change about paramiko's readv() is for it to handle the
32K/64K limit, because a single request can go above that.
John
=:->