Performance requirements for bzr checkout --lightweight

John Arbash Meinel john at arbash-meinel.com
Mon Sep 1 19:58:29 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
...

...

>> I realise that emphasising one operation doesn't necessarily
>> help with performance optimisation, but I would like to
>> ask for some attention to be paid to this operation. Ideally
>> a lightweight checkout of a packaging branch would take less
>> than 150% or 200% of the time for an "apt-get source" of the
>> same package.
> 
> So does this mean we are at 100x, 10x, 3x? There is a lot of variance here. If
> we were at 210%, *I* probably wouldn't focus on it (whether other people feel
> it is critical). If we are at 100x then something needs to be done.
> 
> Also, is this with bzr-1.5 or 1.6? I know 1.6(.1) got a lot better at fetching
> large (lots of files, not necessarily lots of history) repositories.

So here are my numbers for doing "bzr co --lightweight lp:bzr", versus "apt-get
source bzr" (which gives bzr 1.3.1).
I'm also a bit surprised that "apt-get source bzr" tells you to go use the
debian packaging directory rather than either the bzr trunk itself, or some
ubuntu packages.

Anyway, this is with bzr.dev (which has the 1.6.1 fixes).

$ time bzr co --lightweight lp:bzr
2m04s
$ time apt-get source bzr
24s

Or about 5x slower.

$ time bzr co lp:bzr
~11m

So 'bzr co --lightweight' is about 5x slower than "apt-get source" but 5x
faster than downloading the whole ancestry.
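
As a rough check on those two "about 5x" figures, here is a small sketch that
just redoes the arithmetic on the timings quoted above. The variable and
function names are only for illustration, and the ~11m figure is approximate.

```python
# Wall-clock timings quoted above, converted to seconds.
lightweight_s = 2 * 60 + 4    # bzr co --lightweight lp:bzr: 2m04s
apt_get_s = 24                # apt-get source bzr: 24s
full_checkout_s = 11 * 60     # full bzr co lp:bzr: roughly 11m


def ratio(slower, faster):
    """Return how many times longer `slower` takes than `faster`."""
    return float(slower) / faster


print("lightweight vs apt-get source: %.1fx" % ratio(lightweight_s, apt_get_s))
print("full checkout vs lightweight:  %.1fx" % ratio(full_checkout_s, lightweight_s))
```

Both ratios come out a little above 5, which matches the rounding used above.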

Trying to get down further into the details using kcachegrind...
90% of the "co --lightweight" time is spent in "get_record_stream()".

11% is getting the inventory, and 79% is getting the file content.

Of the time in get_record_stream it seems that:

57% of total time is in _get_components_positions and
33% of total time is in _get_content_maps

That would hint that we are spending 57% of the time dealing with reading the
index, and only 33% of the time actually downloading content.
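
To make the split easier to read, this sketch re-expresses the two profile
figures as shares of the get_record_stream() time rather than of total time.
The percentages are the ones read from kcachegrind above; the dictionary and
helper name are just for illustration.

```python
# Percentages of total "co --lightweight" time, as quoted above.
profile = {
    "_get_components_positions": 57,  # mostly reading the index
    "_get_content_maps": 33,          # reading the actual content
}

# The two callees together account for the time in get_record_stream().
get_record_stream_total = sum(profile.values())


def share_within_parent(name):
    """Fraction of get_record_stream() time spent in the named callee."""
    return float(profile[name]) / get_record_stream_total


for name in sorted(profile):
    print("%-28s %3.0f%% of get_record_stream" % (name, 100 * share_within_parent(name)))
```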

While we aren't strictly bandwidth limited during all of this, we *are* pretty
close. (See the attached screenshot: my download bandwidth is about 160kB/s, and
we hit that most of the time.) Note that there is some overlap between the
various segments, and the scale changed in the middle.

So part of the issue is that we have to download the history for files, not
just the tip value. We have to download at least back to a fulltext snapshot
(which may be up to 200 deltas away, and may consume 2x the compressed size.)
So we *should* be capped at downloading <2x the compressed size of all file
texts, and then whatever index overhead.
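
A toy sketch of why the tip text alone is not enough: to rebuild a text you
walk back along its delta chain until you hit a fulltext, then replay the
deltas forward. This is NOT bzr's knit format or API; the store layout, the
hunk format and all function names here are assumptions made up for the
example.

```python
# Toy store: key -> (kind, parent_key, payload).
# For a fulltext the payload is a list of lines; for a delta it is a list of
# (start, end, new_lines) hunks whose offsets refer to the parent text.
store = {
    "rev1": ("fulltext", None, ["a\n", "b\n", "c\n"]),
    "rev2": ("delta", "rev1", [(1, 2, ["B\n"])]),   # replace the second line
    "rev3": ("delta", "rev2", [(3, 3, ["d\n"])]),   # append a line
}


def chain_to_fulltext(key):
    """Return every record needed to build `key`, fulltext first."""
    chain = []
    while True:
        kind, parent, _ = store[key]
        chain.append(key)
        if kind == "fulltext":
            break
        key = parent
    chain.reverse()
    return chain


def apply_delta(lines, hunks):
    """Apply hunks from the end backwards so earlier offsets stay valid."""
    result = list(lines)
    for start, end, new_lines in sorted(hunks, key=lambda h: h[0], reverse=True):
        result[start:end] = new_lines
    return result


def extract(key):
    """Rebuild the full text for `key` from its fulltext plus deltas."""
    chain = chain_to_fulltext(key)
    lines = store[chain[0]][2]
    for delta_key in chain[1:]:
        lines = apply_delta(lines, store[delta_key][2])
    return lines


print(chain_to_fulltext("rev3"))
print("".join(extract("rev3")))
```

Everything returned by chain_to_fulltext() has to be downloaded, even though
only the last text is wanted.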

btree indexes should help with the index download overhead, though in their
current form, they might add a lot to latency. (I don't think we respect the
transport read-page-size hint, so we always download as 4k blocks, rather than
the suggested 64k blocks.)
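
To give a feel for the latency concern, this sketch counts how many read
requests it takes to cover the same amount of index data at each block size.
The 1MB figure is purely hypothetical, and a real btree lookup only touches
the pages on its search path, so treat this as an upper-bound style
illustration rather than a model of the actual index code.

```python
def reads_needed(total_bytes, block_size):
    """Number of fixed-size reads needed to cover total_bytes."""
    return (total_bytes + block_size - 1) // block_size


index_bytes = 1024 * 1024  # hypothetical amount of index data touched

small_blocks = reads_needed(index_bytes, 4 * 1024)    # current 4k reads
large_blocks = reads_needed(index_bytes, 64 * 1024)   # suggested 64k reads

print("4k blocks:  %d reads" % small_blocks)
print("64k blocks: %d reads" % large_blocks)
```

Each read can cost a round trip unless requests are combined, so the 16x
difference in request count is where the extra latency would come from.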

I don't have specific numbers handy, but one thing we *could* do, if we just
wanted to trade bandwidth for CPU time, is to use something like
GroupCompress, and implement an RPC that says "just give me all these texts".
Then on the server side, it could extract all the fulltexts, put them into a
custom GC stream, and send them across. (This is 90% the same as building a
tar.bz2 on the fly.)

I'm not very sure that the bandwidth to CPU tradeoff is worth it for
Launchpad. I don't know what kind of hardware currently (or in the future)
will be allocated to serving bzr streams. But generating tarballs on the fly
is not known as a cost-effective measure.

I suppose in theory you could cache the last one based on revision_id, and as
long as people are requesting the same revision, you just serve it out.
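
A minimal sketch of that caching idea, assuming a single-entry cache keyed on
revision_id. The class name and the build_tarball callable are hypothetical;
nothing like this exists in bzr or Launchpad as far as this message is
concerned.

```python
class TipTarballCache(object):
    """Remember the most recently built tarball, keyed by revision_id."""

    def __init__(self, build_tarball):
        # build_tarball is a hypothetical callable: revision_id -> bytes
        self._build_tarball = build_tarball
        self._revision_id = None
        self._data = None
        self.builds = 0  # how many times we had to do the expensive build

    def get(self, revision_id):
        # Only rebuild when someone asks for a different revision.
        if self._data is None or revision_id != self._revision_id:
            self._data = self._build_tarball(revision_id)
            self._revision_id = revision_id
            self.builds += 1
        return self._data


def fake_build(revision_id):
    """Stand-in for the expensive server-side tarball generation."""
    return ("tarball-for-" + revision_id).encode("ascii")


cache = TipTarballCache(fake_build)
cache.get("rev-a")
cache.get("rev-a")   # served from the cache
cache.get("rev-b")   # tip moved, so this one is rebuilt
print("expensive builds: %d" % cache.builds)
```

This only pays off while most requests are for the current tip; every new
commit to the branch invalidates the single cached entry.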

I will also comment that when we do finish polishing GroupCompress, it will
probably make getting a full checkout of a bzr branch much cheaper. It dropped
repo size from about 100MB => maybe 20 MB for my bzr repo (this is being a
little bit generous.) So if that translated into a 5x faster fetch, you could
get the whole bzr history in about 2min, which might make up for the fact that
it is still longer than the 30s to grab just a tarball of the tip.
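
A back-of-the-envelope check on that estimate, assuming the fetch is purely
bandwidth bound at the roughly 160kB/s mentioned earlier, that 1MB is 1000kB,
and ignoring index and latency overhead. The function name is just for
illustration.

```python
BANDWIDTH_KB_PER_S = 160.0  # approximate download bandwidth quoted earlier


def download_seconds(repo_mb, bandwidth_kb_per_s=BANDWIDTH_KB_PER_S):
    """Seconds to transfer repo_mb megabytes at the given bandwidth."""
    return repo_mb * 1000.0 / bandwidth_kb_per_s


for label, size_mb in [("current repo (~100MB)", 100), ("GroupCompress (~20MB)", 20)]:
    seconds = download_seconds(size_mb)
    print("%-22s %4.0fs (%.1f min)" % (label, seconds, seconds / 60.0))
```

Under those assumptions the 100MB repo takes a bit over 10 minutes, in line
with the ~11m measured for the full checkout, and the 20MB one a bit over 2
minutes.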

Anyway, I don't have great answers, because we really aren't built as a "give
me just the tip" replacement.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIvDtVJdeBCYSNAAMRAovyAJ99U9Hqd+vtHHsbQR3tC+XrJfEe6QCgv0IQ
dQa+qFRCRaNiuqIZQDkzNiY=
=koS7
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bzr-co-lightweight.png
Type: image/png
Size: 48422 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080901/f7501628/attachment-0001.png 

