pack vs knit based push syd-london packs 1/4 the time of knits... and SFTP write latency glitch?

John Arbash Meinel john at arbash-meinel.com
Thu Aug 16 17:54:43 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Executive summary:
>    bzr pushing a new branch of bzr.dev revno 2000 to London from Sydney
> is 60 minutes with knits, 15 minutes with packs, and 5 minutes with
> rsync. 'Woot'.
> 
> But, something is suspect at the network layer: we can do the same push
> operation in 7 minutes within sydney, so something is causing 8 minutes
> of extra delay when the only change is the target machine,... and the
> latency is skyrocketing. As we're only creating 4 files during the
> operation we shouldn't be paying the create-stat-start-writing
> multiplier....

But couldn't you be bandwidth limited?

> 
> So Robey, I'm wondering if something is causing glitches with SFTP.

Well, the SFTP code does break up writes into async chunks. So it might
still be possible that we are running into latency here. I would expect
it not to be. But we do transmit 32kB chunks.
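
To make the latency question concrete, here is a back-of-envelope model. It is my own simplification, not a description of how paramiko actually schedules requests: it assumes an unpipelined writer waits one round trip per chunk, while a pipelined writer pays the round trip only once. The function name and all parameters are hypothetical.

```python
import math


def estimate_seconds(total_bytes, chunk_bytes, rtt_s, bytes_per_s, pipelined):
    """Crude transfer-time estimate (assumed model, not measured behaviour).

    wire time  = total_bytes / bytes_per_s
    unpipelined: one round trip is paid for every chunk
    pipelined:   a single round trip is paid for the whole stream
    """
    chunks = math.ceil(total_bytes / chunk_bytes)
    wire_time = total_bytes / bytes_per_s
    round_trips = 1 if pipelined else chunks
    return wire_time + round_trips * rtt_s
```

If pipelining works as intended, the chunk size should hardly matter in this model; if it does not, the per-chunk round trips grow with latency, which is the pattern to look for in the tcpdump.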

> 
> I was expecting pack creation, which does many small writes to an open
> file object, to perform reasonably as we have opened the SFTP file
> object it's writing to with 'pipeline=True'. However when I test from
> Martin's place to mine, which are both in Sydney, performance was
> brutally slow.
> 
> I then added a buffer and wrote ~64K at a time to the SFTP file, and
> performance leapt upwards, and passed knits. This was sufficiently good
> that push is twice as fast as pushing knits within the Sydney area, and
> only 43 seconds (out of 22 minutes) slower than rsync at pulling back
> from my place. The test branch is created by 'bzr branch -r 2000 bzr.dev
> test-branch'.

Did you try changing the buffering size for the file object to something
greater than 8kB? (f = self._sftp.open(..., bufsize=64*1024)?)
I'm curious if we could get paramiko to do all the buffering for us. Or
alternatively, set the buffer size to 0, and do all the buffering ourselves.
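
For the "do all the buffering ourselves" option, a minimal sketch of what I mean follows. BufferedWriter and CountingFile are hypothetical names, not anything in bzrlib or paramiko; `raw` is assumed to be any object with write() and close(), such as the file object returned by the SFTP open call.

```python
class BufferedWriter:
    """Hypothetical helper: collect small writes and pass them on in big chunks."""

    def __init__(self, raw, chunk_size=64 * 1024):
        self._raw = raw
        self._chunk_size = chunk_size
        self._pending = []
        self._pending_len = 0

    def write(self, data):
        # Queue the bytes; only touch the underlying file once enough piled up.
        self._pending.append(data)
        self._pending_len += len(data)
        if self._pending_len >= self._chunk_size:
            self.flush()

    def flush(self):
        if self._pending:
            self._raw.write(b"".join(self._pending))
            self._pending = []
            self._pending_len = 0

    def close(self):
        # Push out whatever is left before closing the underlying file.
        self.flush()
        self._raw.close()


class CountingFile:
    """Stand-in for a remote file: records the size of each write it receives."""

    def __init__(self):
        self.writes = []

    def write(self, data):
        self.writes.append(len(data))

    def close(self):
        pass


def demo():
    target = CountingFile()
    out = BufferedWriter(target)
    for _ in range(1000):
        out.write(b"x" * 200)  # many small writes, as pack creation does
    out.close()
    return target.writes
```

In demo() the thousand 200-byte writes reach the underlying file as a handful of writes of roughly 64kB each, plus one short tail on close.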

> 
> Encouraged by this result I tried pushing to London, where rsync got a
> 5 minute result, and bzr 15 minutes. tcpdump showed what *looked* to be
> regular pauses in the upload, and I haven't had time to test with a
> larger buffer, but I'm wondering - is there some chance that the
> pipeline parameter is not doing what it should? Do you have any
> suggestions about how to tell what's happening within the paramiko
> core...
> 
> If you'd like to play with the pack repository, bzr pull
> http://people.ubuntu.com/~robertc/baz2.0/repository, and make a branch
> with bzr init --experimental, then pull any content you want to into
> that branch. After that push and pull with it will preserve the
> repository format (except when you push into/from a shared repository).
> 
> -Rob

Just to clarify, all these are the "fully optimized" case where we have
a single pack file, right? So it doesn't take into account a branch that
has existed for a while, had a few commits done, and maybe a pull/merge
or two, etc.
I know you added the "bzr pack" command, which may shove everything into
a single pack file (or maybe bzr pack --harder/--full?). And there were
other discussions about automatically doing an order-of-magnitude
packing. (Keep a pack with 1, 10, 100, 1000, etc entries.) Or doing it
in binary, or whatever base you want.
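
One possible reading of the order-of-magnitude idea, sketched below, is to let the decimal digits of the total revision count decide how many packs of each size to keep. This is my interpretation for illustration, not a description of what the pack code does; pack_distribution is a hypothetical name.

```python
def pack_distribution(total_revisions, base=10):
    """Return target pack sizes, largest first.

    Each digit of total_revisions (in the given base) says how many packs
    of that power of the base to keep. Assumed scheme, for illustration only.
    """
    sizes = []
    power = 1
    remaining = total_revisions
    while remaining > 0:
        digit = remaining % base
        # Prepend so that the larger packs end up at the front of the list.
        sizes = [power] * digit + sizes
        remaining //= base
        power *= base
    return sizes
```

Under this scheme a repository with 2345 revisions would settle on 14 packs (two of 1000, three of 100, four of 10, five of 1), and passing base=2 gives the binary variant.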

> 
> Performance results:
> 
> Martins to my place:
> -------------------
> push knits sydney
> real    13m51.436s
> user    0m22.309s
> sys     0m1.640s
> 
> push packs sydney
> real    7m7.261s
> user    0m18.453s
> sys     0m0.896s
> 
> pull knits sydney
> real    27m17.761s
> user    0m28.586s
> sys     0m1.672s
> 
> pull packs sydney
> real    22m43.602s
> user    0m27.474s
> sys     0m1.240s
> 
> pull-rsync-packs sydney
> real    22m0.688s
> user    0m0.304s
> sys     0m0.216s

^- This one is very surprising. Pull using rsync is 22 minutes? If it is
a genuine comparison, it means that standard pull has almost no overhead
(bzr pull takes 22m43s, and rsync 22m0s).
However, I feel like something is incorrect: since you can push in 7m7s,
you should be able to rsync a lot faster than 22m.

> 
> Martins to London
> -----------------
> 
> push knits London
> real    60m52.241s
> user    0m20.925s
> sys     0m1.300s
> 
> push packs London
> real    15m37.995s
> user    0m18.449s
> sys     0m0.728s
> 
> rsync push packs London
> real    5m41.282s
> user    0m0.244s
> sys     0m0.120s
> 

^- It certainly is fun to see these results. It will be interesting once
Andrew's changes land to see how that impacts your packs.
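
For anyone who wants to check the ratios, a small script over the "real" times quoted above. The helper names are mine; the numbers are copied from Robert's results.

```python
import re

# "real" wall-clock times copied from the results quoted above.
TIMINGS = {
    "push knits sydney": "13m51.436s",
    "push packs sydney": "7m7.261s",
    "push knits london": "60m52.241s",
    "push packs london": "15m37.995s",
    "rsync push packs london": "5m41.282s",
}


def to_seconds(stamp):
    """Convert a time(1)-style value such as '15m37.995s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time: %r" % (stamp,))
    return int(match.group(1)) * 60 + float(match.group(2))


def ratio(slower, faster):
    """How many times longer `slower` took than `faster`."""
    return to_seconds(TIMINGS[slower]) / to_seconds(TIMINGS[faster])


def extra_delay(remote, local):
    """Seconds added when only the target machine changes."""
    return to_seconds(TIMINGS[remote]) - to_seconds(TIMINGS[local])
```

ratio("push knits london", "push packs london") comes out a little under 4, which matches the "1/4 the time" in the subject, and extra_delay("push packs london", "push packs sydney") is a bit over 8 minutes.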

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGxIFTJdeBCYSNAAMRAlAyAKDRbJwaOUwZK18paowQRadgSdpjHQCeIu5F
NqE5I/3wUpPJJlJmYyYfbGU=
=6Ugp
-----END PGP SIGNATURE-----


