[MERGE] Updated sftp_readv

John Arbash Meinel john at arbash-meinel.com
Tue Dec 18 15:04:27 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This changes the sftp readv code a little more. It reduces the amount of
buffering required, so it no longer needs the complete collapsed section
before it can start returning data.
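
As a rough illustration of the idea (a minimal sketch, not the attached patch):
the function below yields each requested range as soon as enough bytes have
arrived, instead of joining the whole section first. The name stream_ranges and
its arguments are hypothetical, and it assumes the ranges are sorted,
non-overlapping, and given relative to the start of the section.

```python
def stream_ranges(chunks, ranges):
    """Yield (start, data) for each (start, length) in ranges.

    chunks: iterable of byte strings arriving in order from the server.
    ranges: sorted, non-overlapping (start, length) pairs (an assumption).
    """
    pending = b""        # bytes received but not yet handed out
    pending_start = 0    # offset within the section of pending[0]
    chunk_iter = iter(chunks)
    for start, length in ranges:
        end = start + length
        # Read only as much as this range needs, not the whole section.
        while pending_start + len(pending) < end:
            try:
                pending += next(chunk_iter)
            except StopIteration:
                raise ValueError(
                    "data ended before range (%d, %d)" % (start, length))
        lo = start - pending_start
        yield start, pending[lo:lo + length]
        # Drop everything up to the end of the range just returned.
        pending = pending[lo + length:]
        pending_start = end


if __name__ == "__main__":
    incoming = [b"abcd", b"efgh", b"ijkl"]
    for offset, data in stream_ranges(incoming, [(1, 2), (3, 4), (10, 2)]):
        print(offset, data)
```

The buffer never holds more than the unconsumed tail plus the chunks needed
for the current range, so memory scales with the largest range rather than
with the whole section.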

I created a semi-trivial repository with semi-large files in it. It was about
63MB of .bzr/repository data.

This patch drops the peak memory consumption from:
VmPeak: 359288 kB
to
VmPeak: 49192 kB

I don't see nearly as much improvement for a bzr.dev tree, but it may be that
my repository is mixed enough that the readv stuff has to be cached anyway.
(Here's to getting 'bzr pack' to organize everything in proper topological
order soon.:)

I'm also surprised to see that large a difference. If I'm counting correctly,
the old peak is about 5x the total repository size. And I certainly don't see 5
copies in the code.
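
The multiplier can be checked from the figures above. This is only a
back-of-the-envelope sketch: it assumes VmPeak is reported in units of 1024
bytes and treats the 63MB of repository data as the whole working set, even
though VmPeak also includes the interpreter itself. The function name is made
up for illustration.

```python
REPO_MB = 63.0  # approximate size of .bzr/repository from the message


def multiplier(vm_peak_kb, repo_mb=REPO_MB):
    """Return peak memory as a multiple of the repository size."""
    return (vm_peak_kb / 1024.0) / repo_mb


before = multiplier(359288)  # VmPeak without the patch
after = multiplier(49192)    # VmPeak with the patch

if __name__ == "__main__":
    print("before: %.1fx  after: %.1fx" % (before, after))
```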

2x seems unavoidable (all = ''.join(data) has to have 2 copies), and I suppose
if there were multiple coalesced ranges and 'all' was still lying around that
would be 3 copies. And then splitting that data into its ranges would be a 4th
copy. I still don't see a 5th copy. I wonder if that is in higher level code.
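
To see how the copies stack up, here is a small measurement sketch. It uses
Python 3 bytes and tracemalloc, so it is not the Python 2 bzrlib code path
itself; the function name and sizes are made up for illustration. It keeps the
received chunks, the joined string, and the per-range slices alive at the same
time, which is the 3-copy case described above.

```python
import tracemalloc


def peak_copies(num_chunks=64, chunk_size=32 * 1024):
    """Return traced peak memory as a multiple of the payload size."""
    payload = num_chunks * chunk_size
    tracemalloc.start()
    # Copy 1: the individual chunks as they were received.
    data = [bytes(chunk_size) for _ in range(num_chunks)]
    # Copy 2: the joined string, allocated while 'data' is still alive.
    all_data = b"".join(data)
    # Copy 3: slicing the joined string back into ranges copies again.
    parts = [all_data[i:i + chunk_size]
             for i in range(0, payload, chunk_size)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / float(payload)


if __name__ == "__main__":
    print("peak is about %.2fx the payload" % peak_copies())
```

Releasing 'data' before slicing, or yielding slices one at a time, would lower
the peak, which is the direction the patch takes.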

Anyway, 5x would certainly explain why a 512MB repository failed to copy
(512MB * 5 ~ 2.6GB).

The largest single file in the repo is ~6MB, so we may still have a multiplier,
but it should be on smaller chunks. (So checking in your CD ISO or DVD will
still cause problems, but at least checking in 100 5MB files should work.)


Interestingly enough, there seems to be a small bug in our sftp handling. When
using "sftp://localhost" I occasionally get:
  File "/var/lib/python-support/python2.5/paramiko/sftp_file.py", line 131, in _read_prefetch
    self.sftp._read_response()
  File "/var/lib/python-support/python2.5/paramiko/sftp_client.py", line 604, in _read_response
    raise SSHException('Server connection dropped: %s' % (str(e),))

It is happening with both the new code and with bzr.dev. My best guess is just
that we have actually streamed all the data already, and the ssh subsystem has
quit on us because it isn't needed anymore, but I'm not really sure (this is
with paramiko 1.6.4).

Anyway, I would really like to get back to the 10 other patches I need to clean
up and get submitted. I didn't write any new tests, and the code could use a
bit more testing since it passed the test suite before my last round of
corrections. As it stands it is a lot better, but if someone wants to take over
and write some unit tests, I'm happy to assist.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD4DBQFHZ+F7JdeBCYSNAAMRAqqbAJj5hN7xTUk6gzYebWkESJpshvGfAKCvpfSU
FRPI2yqsu2rcsFfWap+hCA==
=kRiG
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sftp_chunked.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20071218/ed774547/attachment-0001.diff 

