Performance analysis

Robert Collins robertc at robertcollins.net
Mon May 7 09:27:11 BST 2007


How to tackle overall performance analysis, and where to start?
---------------------------------------------------------------

Even without being fully exploited yet, dirstate, which was designed
from the ground up to meet our common use cases within tree-level
operations, has given us some massive wins (as well as a bunch of bugs
:(). I think we need to perform a similar analysis for all our
common-case operations, and ensure that we have a low-friction
underlying design which we can work towards in each incremental release.

When we add features, their design changes the requirements for the
parts of the system they alter, so we need to take both current features
and new features that are 'coming soon' into consideration as we do this
streamlining, to avoid significant rework when such features are
completed.

We have at least one major missing feature which is likely to impact
such an analysis: file copies or file aliases, however they are
implemented.

So the first thing we should do is select which features from our future
plans we want in the next round of 'make things fast'. I propose that we
go with our current feature set alone; new features seem to take 2-3
minor releases to be completely delivered, and delaying performance work
across much of our stack when we can improve the current situation
significantly doesn't make sense to me, even though it may mean rework
in 6-12 months.

What does the analysis of a use case need to take into consideration?
---------------------------------------------------------------------

The analysis needs to be able to define the characteristics of the
involved disk storage and APIs. That means we need to examine the data
required for the operation, in what order it is required, on both the
read and write sides, and how that needs to be presented to be
consistent with our layering.

As a quick example: 'annotation of a file requires the file id looked up
from the tree, the basis revision id from the tree, and then the text of
that fileid-revisionid pair along with the creating revision id
allocated to each line, and the dotted revision number of each of those
revision ids.' All three of our key domain objects are involved here,
but we haven't defined any characteristics of the API or disk facilities
yet. We could then do that by saying something like 'the file-id lookup
should degrade gracefully as trees become huge. The tree basis id should
be constant time. Retrieval of the annotated text should be roughly
constant for any text of the same size regardless of the number of
revisions contributing to its content. Mapping of the revision ids to
dotted revnos could be done as the text is retrieved, but it's completely
fine to post-process the annotated text to obtain dotted-revnos.'
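
To make the ordering concrete, here is a rough sketch of that data
flow. The object and method names (path2id, basis_revision_id,
get_annotated_text, dotted_revnos) are illustrative placeholders, not
bzrlib's actual API:

    # Illustrative sketch of the annotate data flow described above; the
    # method names are hypothetical placeholders, not bzrlib's real API.
    def annotate(tree, branch, repository, path):
        # Tree: look up the file id; should degrade gracefully on huge trees.
        file_id = tree.path2id(path)
        # Tree: the basis revision id; should be constant time.
        basis_revision_id = tree.basis_revision_id()
        # Repository: annotated text for this (file_id, revision_id) pair;
        # roughly constant for a given text size regardless of how many
        # revisions contributed to its content.
        annotated = repository.get_annotated_text(file_id, basis_revision_id)
        # Branch: map the attributed revision ids to dotted revnos, either
        # while the text streams in or as a post-processing step.
        revnos = branch.dotted_revnos(set(rev for rev, _ in annotated))
        return [(revnos[rev], line) for rev, line in annotated]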

What factors should we consider in coming up with our desired
characteristics, and what tradeoffs are acceptable?
--------------------------------------------------------------

Obviously, those that will make for an extremely fast system :). There
are many possible factors, but the ones I think are most interesting to
design with are:
- baseline overhead:
   - The time to get bzr ready to begin the use case.
- scaling: how does performance change when any of the following aspects
   of the system are ratcheted massively up or down (a rough sketch of
   probing one such dimension follows this list):
   - number of files/dirs/symlinks/subtrees in a tree (both working and 
     revision trees)
   - size of any particular file
   - number of elements within a single directory
   - length of symlinks
   - number of changes to any file over time
     (subordinately also the number of merges of the file)
   - number of commits in the ancestry of a branch
     (subordinately also the number of merges)
   - number of revisions in a repository
   - number of fileids in a repository
   - number of ghosts in a given graph (revision or per-file)
   - number of branches in a repository
   - number of concurrent readers for a tree/branch/repository
   - number of concurrent writers for objects that support that.
   - latency to perform file operations (e.g. slow disks, network file
     systems, our VFS layer and FTP/SFTP/etc)
   - bandwidth to the disk storage
   - latency to perform semantic operations (hpss specific)
   - bandwidth when performing semantic operations.
- locality of reference: If an operation requires data that is located
   within a small region at any point, we often get better performance
   than with an implementation of the same operation that requires the
   same amount of data but with a lower locality of reference. It's
   fairly tricky to add locality of reference after the fact, so I think
   it's worth considering up front.
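
As a concrete illustration of how one of these scaling dimensions could
be probed in isolation, here is a minimal sketch that times a single
operation while the number of files in a tree is ratcheted up by orders
of magnitude; make_tree_with_files and operation_under_test are
hypothetical stand-ins for whatever the real benchmark exercises:

    import time

    def scaling_profile(make_tree_with_files, operation_under_test,
                        sizes=(10, 100, 1000, 10000)):
        # Time one operation while a single dimension (tree size) varies
        # and everything else stays fixed.
        results = {}
        for n in sizes:
            tree = make_tree_with_files(n)
            start = time.time()
            operation_under_test(tree)
            results[n] = time.time() - start
        return results

A near-flat profile suggests the operation does not scale with that
dimension; superlinear growth flags it for design attention.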

Using these factors, I would add to the annotate example that it's
reasonable to do two 'semantic' round trips to the local tree, one to
the branch object, and two to the repository. In file-operation level
measurements, in an ideal world there would be no more than one round
trip for each semantic operation. What there must not be is one round
trip per revision involved in the revisionid->dotted number mapping, nor
one per revision id attributed to a line in the text.
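
The difference is structural rather than a tuning detail; sketched with
a hypothetical Branch API (dotted_revno and dotted_revnos are
illustrative names), the ruled-out shape and the acceptable one look
like this:

    def map_revnos_per_revision(branch, revision_ids):
        # One semantic round trip per revision id: exactly the pattern
        # the analysis rules out.
        return dict((rev, branch.dotted_revno(rev)) for rev in revision_ids)

    def map_revnos_batched(branch, revision_ids):
        # A single round trip carrying the whole set; the hypothetical
        # dotted_revnos() accepts and returns the full mapping at once.
        return branch.dotted_revnos(set(revision_ids))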

Not all the items I mention above are created equal. For instance, we
have a smart server now; we could decide to stop trying to optimise
performance over dumb protocols and instead consider local disk
bandwidth to be the expected case for branches and repositories,
focusing on the smart server for addressing performance over the
network. Personally I would be sad to see performance over the network
degrade for non-smart-server situations, but I also want to see
performance with the smart server be rockingly fast. We can usefully
define the commonly expected parameters for each item: scenarios that
stay within them can be our common case.

That said, most of our problems are with 'large' trees. Generally
speaking, to improve performance on small trees with little history we
need to improve our baseline overhead, and that's not all that easy.
I've started a 'bzr-piss' benchmark suite to get concrete answers on the
impact of our code structure on our baseline overhead. Note that
importing on demand actually increases overhead for commands that need
the lazy-imported code: it may be that we can do better than lazy
loading, and that's one of the things I plan to answer there. That said,
I think the baseline overhead we have today is mostly OK: as long as we
don't regress, it should be fine.
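
For a feel of how such baseline measurements might look, here is a
rough sketch that times interpreter startup and then the extra cost of
importing a module in a fresh process; 'json' stands in for whichever
lazily-imported module a command ends up needing, and this is not the
benchmark suite itself:

    import subprocess
    import sys
    import timeit

    def fresh_process(code):
        # Run a snippet in a new interpreter so import caches don't hide
        # the cost a command would actually pay.
        subprocess.call([sys.executable, "-c", code])

    baseline = timeit.timeit(lambda: fresh_process("pass"), number=20) / 20
    with_import = timeit.timeit(lambda: fresh_process("import json"),
                                number=20) / 20
    print("startup: %.4fs, import on top: %.4fs"
          % (baseline, with_import - baseline))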

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.