prefetch still broken with readv and paramiko 1.6.1

John Arbash Meinel john at arbash-meinel.com
Mon Jul 31 19:57:06 BST 2006


Robey Pointer wrote:

...

> Yeah, I agree, handling shorter responses would be "right", but add
> enough latency to ruin the prefetching.  Thanks for going to the effort
> of trying it out, though.
> 

Yeah. It would mean a little performance increase for me, since my
implementation uses 64K requests rather than 32K. But there probably
isn't a huge performance benefit.
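
For context, the "request 32k" variant in the numbers below boils down to
something like this (a hypothetical sketch, not the actual bzrlib or
paramiko code): cap each requested range at 32KiB before issuing it, so
the server never has to come back with a short response.

# Hypothetical sketch: split (start, length) requests into chunks of at
# most 32KiB before issuing them, so no response comes back truncated.
MAX_REQUEST = 32768

def split_offsets(offsets, max_request=MAX_REQUEST):
    """Yield (start, length) pairs no larger than max_request bytes."""
    for start, length in offsets:
        while length > max_request:
            yield start, max_request
            start += max_request
            length -= max_request
        if length > 0:
            yield start, length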

> 
>> I think the reason that paramiko-trunk is so slow is that we have a
>> fairly large queue at that point, and iterating through the queue every
>> time is slow. At least that is my guess. Something weird is going on,
>> considering it affects seek+read performance even without prefetch or
>> readv enabled.
>>
>> I'm not positive, though.
>>
>> These are my results:
>>
>> paramiko-trunk  readv        98.89
>> paramiko-trunk  seek+read    96.81
>> request 32k     readv        79.75
>> request 32k     seek+read    87.64
>>
>> Anyway, it might be possible to rewrite your trunk such that it is still
>> fast, and allows readv and prefetch to operate simultaneously. I'm not
>> 100% positive as to how to do that, though.
> 
> This is bad.  Can you point me to the code you used to get those stats? 
> I'd like to dig in further and see if I can figure out what's going on
> in paramiko-land.  The queue management when using readv without
> prefetch ought to be basically the same as before...
> 
> robey

http://bzr.arbash-meinel.com/branches/real-world-benchmarks/jameinel/

That is the branch where I was doing all of the testing. It really isn't
a general-purpose branch; it is just a bunch of scripts that I use to run
'bzr something' multiple times against different branches and collect the
timing results.

If you look at that project, you can find the specific results in:
sftp-pull/benchmark-48ms-remote-pull-1800-paramiko-trunk.csv

And compare that to:
sftp-pull/benchmark-48ms-remote-pull-1800-read-32k.csv

My 'readv()' work has made it into mainline, so seek+read and readv are
both present. The code path is chosen based on whether the returned file
object has a 'readv()' member. But you can also do this:

=== modified file 'bzrlib/transport/sftp.py'
--- bzrlib/transport/sftp.py    2006-07-28 16:52:19 +0000
+++ bzrlib/transport/sftp.py    2006-07-31 18:49:18 +0000
@@ -454,7 +454,7 @@
             path = self._remote_path(relpath)
             fp = self._sftp.file(path, mode='rb')
             readv = getattr(fp, 'readv', None)
-            if readv:
+            if readv and int(os.environ.get('sftp_readv', '1')):
                 return self._sftp_readv(fp, offsets)
             mutter('seek and read %s offsets', len(offsets))
             return self._seek_and_read(fp, offsets)


Then you can run:
sftp_readv=1 bzr branch sftp://localhost/~/foobar

versus
sftp_readv=0 bzr branch sftp://localhost/~/foobar

And run it with different versions of paramiko.
Also, the script 'sftp-pull/test-delay-pull.py' can be used to set up
some automated testing.
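
In case it is easier than digging through the branch, the comparison
boils down to something like this (a rough hypothetical sketch, not the
actual test-delay-pull.py):

# Rough sketch: time 'bzr branch' over sftp with the readv code path
# toggled on and off via the environment variable from the patch above.
import os
import shutil
import subprocess
import time

def time_branch(readv_enabled, url='sftp://localhost/~/foobar',
                dest='foobar-copy'):  # dest is just a scratch directory
    env = os.environ.copy()
    env['sftp_readv'] = '1' if readv_enabled else '0'
    if os.path.exists(dest):
        shutil.rmtree(dest)
    start = time.time()
    subprocess.call(['bzr', 'branch', url, dest], env=env)
    return time.time() - start

for enabled in (True, False):
    print('sftp_readv=%d: %.2fs' % (enabled, time_branch(enabled)))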

You can also contact me directly, or on IRC, and I can work through some
tests with you.

This was all tested with paramiko 373. I haven't tried with >= 374, but
I believe there is a general performance degradation between 371 and 373,
as I showed by sending you a patch against 371 that preserves performance
while still restricting reads to 32K.

John
=:->
