pack vs knit based push syd-london packs 1/4 the time of knits... and SFTP write latency glitch?
Robert Collins
robertc at robertcollins.net
Thu Aug 16 23:13:50 BST 2007
On Thu, 2007-08-16 at 11:54 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
> > Executive summary:
> > bzr pushing a new branch of bzr.dev revno 2000 to London from Sydney
> > is 60 minutes with knits, 15 minutes with packs, and 5 minutes with
> > rsync. 'Woot'.
> >
> > But, something is suspect at the network layer: we can do the same push
> > operation in 7 minutes within sydney, so something is causing 8 minutes
> > of extra delay when the only change is the target machine,... and the
> > latency is skyrocketing. As we're only creating 4 files during the
> > operation we shouldn't be paying the create-stat-start-writing
> > multiplier....
>
> But couldn't you be bandwidth limited?
No, that's what the local-to-Sydney push and the rsync push figures
establish. Pack push can do the push in 7 minutes with 40ms latency, but
takes 15 minutes with 310ms latency.
> >
> > So Robey, I'm wondering if something is causing glitches with SFTP.
>
> Well, the SFTP code does break up writes into async chunks. So it might
> still be possible that we are running into latency here. I would expect
> it not to be. But we do transmit 32kB chunks.
That's not enough to counteract the bandwidth-latency multiplier if we
are waiting for each chunk to be confirmed before sending the next.
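A back-of-envelope sketch of that ceiling (not bzr code; the RTT figures
are the ones from earlier in this thread): if each write must be acked
before the next is sent, throughput is capped at chunk_size / RTT no
matter how fat the pipe is.

```python
def synchronous_throughput(chunk_bytes, rtt_seconds):
    """Upper bound on bytes/sec for stop-and-wait writes: one chunk
    in flight per round trip."""
    return chunk_bytes / rtt_seconds

# Sydney -> London, ~310 ms RTT: ~103 KiB/s ceiling for 32 KiB chunks.
london = synchronous_throughput(32 * 1024, 0.310)

# Within Sydney, ~40 ms RTT: ~800 KiB/s ceiling - which is why the
# same push looks fine locally and terrible over the long link.
sydney = synchronous_throughput(32 * 1024, 0.040)
```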
> >
> > I was expecting pack creation, which does many small writes to an open
> > file object, to perform reasonably as we have opened the SFTP file
> > object it's writing to with 'pipeline=True'. However, when I tested
> > from Martin's place to mine, which are both in Sydney, performance
> > was brutally slow.
> >
> > I then added a buffer and wrote ~64K at a time to the SFTP file, and
> > performance leapt upwards, and passed knits. This was sufficiently good
> > that push is twice as fast as pushing knits within the Sydney area, and
> > only 43 seconds (out of 22 minutes) slower than rsync at pulling back
> > from my place. The test branch is created by 'bzr branch -r 2000 bzr.dev
> > test-branch'.
>
> Did you try changing the buffering size for the file object to something
> greater than 8kB? (f = self._sftp.open(..., buffer=64*1024)?).
> I'm curious if we could get paramiko to do all the buffering for us. Or
> alternatively, set the buffer size to 0, and do all the buffering ourselves.
Ah, I didn't know that knob existed; I'll try it shortly. I'm on a
different link now, so the figures won't be directly comparable, but
I'll do a before-and-after test.
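For reference, the buffering I added is roughly this shape - a
simplified sketch of the idea, not the actual bzr patch; the class name
is mine, and `fileobj` is assumed to be any file-like object (e.g. a
paramiko SFTPFile opened with pipelining):

```python
class BufferingWriter:
    """Coalesce many small writes into ~64K writes on the wrapped file,
    so each network round trip carries a full-sized chunk."""

    def __init__(self, fileobj, buffer_size=64 * 1024):
        self._file = fileobj
        self._buffer_size = buffer_size
        self._pending = []        # accumulated byte strings
        self._pending_bytes = 0   # total buffered length

    def write(self, data):
        self._pending.append(data)
        self._pending_bytes += len(data)
        if self._pending_bytes >= self._buffer_size:
            self.flush()

    def flush(self):
        # Hand the underlying file one large write instead of many
        # small ones.
        if self._pending:
            self._file.write(b''.join(self._pending))
            self._pending = []
            self._pending_bytes = 0

    def close(self):
        self.flush()
        self._file.close()
```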
> > If you'd like to play with the pack repository, bzr pull
> > http://people.ubuntu.com/~robertc/baz2.0/repository, and make a branch
> > with bzr init --experimental, then pull any content you want to into
> > that branch. After that push and pull with it will preserve the
> > repository format (except when you push into/from a shared repository).
> >
> > -Rob
>
> Just to clarify, all these are the "fully optimized" case where we have
> a single pack file, right? So it doesn't take into account a branch that
> has existed for a while, had a few commits done, and maybe a pull/merge
> or two, etc.
> I know you added the "bzr pack" command, which may shove everything into
> a single pack file (or maybe bzr pack --harder/--full?). And there were
> other discussions about automatically doing a order-of-magnitude
> packing. (Keep a pack with 1, 10, 100, 1000, etc entries.) Or doing it
> binary, or whatever order you want.
First push writes a single pack, always, regardless of the source's
structure. A single pull will do 4 readv's per source pack, plus 4 index
lookups per source pack (and with the non-paging index that's 4 simple
reads).
Autopacking is already implemented, with exponentially bigger packs to
give log10 number-of-packs growth. The actual algorithm is very simple.
We generate a plan for the number of packs of each revision count we
want, based on the total revisions in the repo. To do this we simply
multiply each digit in the base10 representation of the revision count
by 10^position. E.g. 234 revisions gives a plan of [2x100, 3x10 and 4x1]
packs. Then we rearrange the existing packs, starting with the largest,
to meet this plan - and we keep any packs with more revisions than we
are aiming for, to avoid splitting already-packed packs.
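In code, the digit-to-plan step looks something like this (an
illustrative sketch - the function name is mine, not the actual bzr
autopack code):

```python
def plan_packs(total_revisions):
    """Turn a revision count into a pack plan from its base-10 digits.

    Each digit d at 10**p contributes d packs of 10**p revisions each,
    so 234 revisions -> [(2, 100), (3, 10), (4, 1)].
    """
    plan = []
    for position, digit in enumerate(reversed(str(total_revisions))):
        if digit != '0':
            plan.append((int(digit), 10 ** position))
    plan.reverse()  # largest packs first, matching the repack order
    return plan
```

This is what bounds the pack count at roughly 9 per decimal digit, i.e.
log10 growth in the number of packs.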
The pack command generates a single mega-pack, and is where doing
complex stuff like regenerating diffs to be forward deltas will happen.
I should have the format capable of supporting that next week, allowing
anyone interested to add that in without having to refactor knits etc.
> > pull packs sydney
> > real 22m43.602s
> > user 0m27.474s
> > sys 0m1.240s
> >
> > pull-rsync-packs sydney
> > real 22m0.688s
> > user 0m0.304s
> > sys 0m0.216s
>
> ^- This one is very surprising. Pull using rsync is 22 minutes? If it is
> a genuine comparison, it means that standard pull is completely
> efficient (since you can manually pull them in 22m43s, and rsync in 22m).
> However, I feel like something is incorrect, since you can push in 7m7s,
> you should be able to rsync a lot faster than 22m.
Asymmetric links: Martin's upload rate is ~4 times mine. So the pull is
pulling up from my ADSL, and the push is pushing up from Martin's ADSL.
He's on ADSL2; I'm on regular ADSL.
> > push knits London
> > real 60m52.241s
> > push packs London
> > real 15m37.995s
> > rsync push packs London
> > real 5m41.282s
> ^- It certainly is fun to see these results. It will be interesting once
> Andrew's changes land to see how that impacts your packs.
I think they will make it possible for knit pulls to be competitive with
packs over sftp, and packs over bzr+ssh/bzr+http to not be insanely slow
(we don't do any async operations for the smart server at the moment -
open_write_stream is implemented as 'append').
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.