RFC: tortoise strategy document, part 2

Mark Hammond mhammond at skippinet.com.au
Thu Apr 3 09:45:34 BST 2008

Thanks to everyone who offered feedback on my TortoiseBZR document.  The new
version of the document follows on from the previous version; I haven't
changed anything in the text I previously sent.  I believe all unresolved
issues from that feedback are covered below, so please let me know if I've
missed anything.

I intended to include the complete new version of the document as a merge
request.  That is not to suggest I consider it final, but changes are
probably best tracked formally.  For your convenience, I've included a copy
of the new text below; it follows on directly from the text I previously
sent.  I'll send the merge request for the complete document once someone
hits me with the clue-stick :)

As always, feedback is welcome - particularly if you previously raised a
point that is not adequately covered in this version.



-- start of new text --

Performance considerations:

The following discussions assume the "Hybrid Python and C++ implementation"
strategy, discussed above, is chosen for implementation.  To recap, TBZR
becomes a C++ implemented DLL using RPC to talk to a Python implemented
"server" which does all BZR work.

As discussed above, the model used by Tortoise is that most "interesting"
things are done by external applications.  TSVN does show read-only columns
in the "detail" view, and shows a few read-only properties in the
"Properties" dialog, but most of these properties are "state" related (eg,
revision number), or editing of others is done by launching an external
application.  This means that the shell extension itself really has two
basic requirements with respect to RPC: 1) get the local state of a file,
and 2) get some named, state-related "properties" for a file.  Everything
else can be built on these two operations.
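
To make the two primitives concrete, here is a minimal Python sketch of the
contract the shell extension would rely on.  The names StatusService,
get_state, get_properties and the integer state codes are hypothetical
illustrations, not an agreed interface; FakeStatusService stands in for the
RPC server so shell-side code can be exercised without bzrlib.

```python
from typing import Dict, Iterable, Protocol


class StatusService(Protocol):
    """The two RPC primitives the shell extension needs (hypothetical names)."""

    def get_state(self, path: str) -> int:
        """Return the local state of `path`, encoded as a small integer."""
        ...

    def get_properties(self, path: str, names: Iterable[str]) -> Dict[str, str]:
        """Return the requested state-related properties of `path`."""
        ...


class FakeStatusService:
    """In-memory stand-in for the Python RPC server, handy for testing."""

    def __init__(self, states: Dict[str, int],
                 props: Dict[str, Dict[str, str]]) -> None:
        self._states = states
        self._props = props

    def get_state(self, path: str) -> int:
        return self._states.get(path, 0)  # 0: hypothetical "unversioned" code

    def get_properties(self, path: str, names: Iterable[str]) -> Dict[str, str]:
        known = self._props.get(path, {})
        # Only return the properties that were asked for and actually exist.
        return {name: known[name] for name in names if name in known}


def describe(service: StatusService, path: str) -> str:
    """A toy feature built purely on the two primitives."""
    state = service.get_state(path)
    revno = service.get_properties(path, ["revno"]).get("revno", "?")
    return f"{path}: state={state} revno={revno}"
```

Anything else the shell shows (overlay choice, column text, property page
values) would be composed from these two calls in the same way.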

There are two aspects of the shell integration which are performance
critical: the "icon overlays" and the "column providers".

The short story with icon overlays is that we need to register 12 global
"overlay providers", one for each state we show.  Each provider is called
for every icon ever shown in Windows Explorer or in any application's
FileOpen dialog.  While most versions of Windows update icons in the
background, we still need to perform well.  On the positive side, this just
needs the simple "local state" of a file, information that can probably be
carried in a single byte.  On the negative side, it is the shell which makes
a synchronous call to us with a single filename as an argument, which makes
it difficult to "batch" multiple status requests into a single RPC call.
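
A rough Python sketch of what each overlay provider does, assuming the local
state fits in one byte.  The shell's per-overlay callback is the real
IShellIconOverlayIdentifier::IsMemberOf method; everything else here
(FileState and its members, OverlayProvider, and the lookup callable that
stands in for the RPC round trip) is a hypothetical illustration, and only a
handful of the 12 states are shown.

```python
from enum import IntEnum
from typing import Callable


class FileState(IntEnum):
    """Hypothetical local states; the real design has 12, one per overlay."""
    UNVERSIONED = 0
    UNCHANGED = 1
    MODIFIED = 2
    ADDED = 3
    CONFLICTED = 4


def encode_state(state: FileState) -> bytes:
    """Pack a state into the single byte returned over the RPC channel."""
    return bytes([state])


def decode_state(data: bytes) -> FileState:
    if len(data) != 1:
        raise ValueError("expected exactly one byte")
    return FileState(data[0])


class OverlayProvider:
    """One provider per state, mirroring the shell's IsMemberOf callback."""

    def __init__(self, state: FileState, lookup: Callable[[str], bytes]) -> None:
        self.state = state
        self._lookup = lookup  # synchronous: one filename in, one byte out

    def is_member_of(self, path: str) -> bool:
        return decode_state(self._lookup(path)) == self.state
```

Because every provider asks the same question about the same file, a naive
shim issues one RPC per provider per file, which is what motivates the cache
discussed below.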

The story with columns is messier: these have changed significantly for
Vista, and the new system may not work with the VCS model (see below).
However, if we implement columns, it will be fairly critical to have a
high-performance way of fetching the named properties (name/value pairs)
described above.

Note that the nature of the shell implementation means we will have a large
number of "unrelated" handlers, each called somewhat independently by the
shell, often for information about the same file (eg, imagine each of our
overlay providers being called in turn with the same filename, followed by
each of our column providers being called in turn with the same filename;
that isn't exactly what happens, however!).  This means we will need some
kind of cache geared towards reducing the number of status or property
requests we make to the RPC server.
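
One possible shape for that cache, sketched in Python under the assumption
that a short time-to-live is an acceptable freshness policy; a real design
would probably also invalidate entries on file-change notifications.
StatusCache and its fetch callable are hypothetical names, with fetch
representing a single RPC round trip.

```python
import os
import time
from typing import Callable, Dict, Tuple


class StatusCache:
    """Per-file state cache in front of the RPC server (hypothetical sketch).

    Entries expire after `ttl` seconds; that freshness policy is an assumption.
    """

    def __init__(self, fetch: Callable[[str], int], ttl: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # performs one RPC round trip for one path
        self._ttl = ttl
        self._clock = clock    # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, int]] = {}
        self.rpc_calls = 0     # requests that actually reached the server

    def get_state(self, path: str) -> int:
        # On Windows, normcase folds case and slashes so that "C:/Foo.py"
        # and "c:\\foo.py" share one entry; elsewhere it is a no-op.
        key = os.path.normcase(path)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        self.rpc_calls += 1
        state = self._fetch(path)
        self._entries[key] = (now, state)
        return state

    def invalidate(self, path: str) -> None:
        """Drop a cached entry, e.g. after a change notification."""
        self._entries.pop(os.path.normcase(path), None)
```

With this in place, 12 overlay providers querying the same file within the
TTL cost one request to the server rather than 12.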

We will also allow all of the above to be disabled via user preferences.
Thus icon overlays could be disabled if they caused problems for some
people, for example.

RPC options:

Due to the high number of calls for icon overlays, the RPC overhead must be
kept as low as possible.  Since the client side is implemented in C++,
reducing complexity is also a goal.  Our requirements are quite simple, and
there are no existing RPC options we can leverage.  It does not seem prudent
to build an XMLRPC solution for tbzr; that is not to preclude the use of
such a server in the future, but tbzr need not become the "pilot" project
for an XMLRPC server given these constraints.

I propose a custom RPC mechanism, built initially on Windows-specific named
pipes and using a binary format designed with an eye towards implementation
speed and C++ simplicity.  If we succeed here, we can build on that
infrastructure, and even replace it should other more general frameworks
materialize.
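
Purely as an illustration of the kind of binary format meant here, the
following Python sketch frames a request as a one-byte opcode plus a 32-bit
little-endian payload length, with each string (path, property names,
property values) carried as a length-prefixed UTF-8 blob.  The opcodes and
field layout are assumptions, not a settled design; the fixed-width header
is intended to be trivial to parse from C++.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical opcodes; the real binary format is still to be designed.
OP_GET_STATE = 1
OP_GET_PROPERTIES = 2

_HEADER = struct.Struct("<BI")  # opcode (uint8), payload length (uint32 LE)
_LEN = struct.Struct("<I")      # length or count prefix (uint32 LE)


def _pack_str(s: str) -> bytes:
    data = s.encode("utf-8")
    return _LEN.pack(len(data)) + data


def _unpack_str(buf: bytes, offset: int) -> Tuple[str, int]:
    """Read one length-prefixed string; return it and the next offset."""
    (n,) = _LEN.unpack_from(buf, offset)
    start = offset + _LEN.size
    end = start + n
    if end > len(buf):
        raise ValueError("truncated string")
    return buf[start:end].decode("utf-8"), end


def encode_request(opcode: int, path: str, *names: str) -> bytes:
    """Header, then the path, then any property names being requested."""
    payload = _pack_str(path) + b"".join(_pack_str(n) for n in names)
    return _HEADER.pack(opcode, len(payload)) + payload


def decode_request(frame: bytes) -> Tuple[int, str, Tuple[str, ...]]:
    opcode, length = _HEADER.unpack_from(frame, 0)
    payload = frame[_HEADER.size:_HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    path, offset = _unpack_str(payload, 0)
    names: List[str] = []
    while offset < len(payload):
        name, offset = _unpack_str(payload, offset)
        names.append(name)
    return opcode, path, tuple(names)


def encode_properties(props: Dict[str, str]) -> bytes:
    """Reply to OP_GET_PROPERTIES: a count, then name/value string pairs."""
    body = b"".join(_pack_str(k) + _pack_str(v) for k, v in props.items())
    return _LEN.pack(len(props)) + body


def decode_properties(buf: bytes) -> Dict[str, str]:
    (count,) = _LEN.unpack_from(buf, 0)
    offset, props = _LEN.size, {}
    for _ in range(count):
        name, offset = _unpack_str(buf, offset)
        value, offset = _unpack_str(buf, offset)
        props[name] = value
    return props
```

A GET_STATE reply in this scheme would simply be the single state byte, so
the hot path for icon overlays stays as small as possible.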

FWIW, with a Python process at each end, my 2.4GHz Pentium 4 machine can
achieve around 25000 "calls" per second across an open named pipe.  C++ at
one end should increase this a little, but obviously any real work done by
the Python side of the process will be the bottleneck.  However, this
throughput would appear sufficient to implement a prototype.
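
A micro-benchmark in the same spirit can be sketched with the standard
library.  On Windows, multiprocessing.Pipe() is backed by a named pipe
(elsewhere it uses a socket pair).  Unlike the measurement above, this
sketch runs the server end in a thread of the same process for simplicity,
so its numbers are not directly comparable; the function name and message
contents are illustrative only.

```python
import threading
import time
from multiprocessing import Pipe


def measure_round_trips(n: int = 10000) -> float:
    """Return request/reply round trips per second over a duplex Pipe."""
    client, server = Pipe()  # duplex by default

    def serve() -> None:
        while True:
            request = server.recv_bytes()
            if not request:               # empty message: shut down
                break
            server.send_bytes(b"\x01")    # one-byte "state" reply

    worker = threading.Thread(target=serve)
    worker.start()
    try:
        start = time.perf_counter()
        for _ in range(n):
            client.send_bytes(b"C:\\src\\foo.py")
            client.recv_bytes()
        elapsed = time.perf_counter() - start
    finally:
        client.send_bytes(b"")
        worker.join()
        client.close()
        server.close()
    return n / elapsed
```

The function returns the rate rather than printing it; expect the result to
vary widely between machines and Python versions.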

Vista versus XP:

Let's try to avoid an OS advocacy debate :)  But it is probably true that
TBZR will, over its life, be used on more Vista computers than XP ones.  In
short, Vista has changed a number of shell-related interfaces, and while
TSVN is slowly catching up (http://tortoisesvn.net/vistaproblems) they are a

XP has IColumnProvider (as implemented by Tortoise), but Vista changes this
model.  The new model is based around "file types" (eg, .jpg files) and it
appears each file type can only have 1 provider!  TSVN also seems to think
the Vista model isn't going to work (see previous link).  It's not clear how
much effort we should expend on a column system that has already been
abandoned by MS.  I would argue that we spend our effort on other parts of
the system (ie, the external GUI apps themselves, etc) and see whether a
path forward emerges for Vista.  We can re-evaluate this based on user
feedback and more information about features of the Vista property system.

Implementation plan:

* Design the RPC mechanism used for icon overlays (ie, the binary format
used for communication).

* Create a Python prototype of the C++ "shim": modify the existing TBZR
Python code so that all references to "bzrlib" are removed.  Implement the
client side of the RPC mechanism and implement icon overlays using this RPC
mechanism.

* Create initial implementation of RPC server in Python.  This will use
bzrlib, but will also maintain a local cache to achieve the required
performance.  The initial implementation may even be single-threaded, just
to keep synchronization issues to a minimum.

* Analyze the performance of the prototype.  Verify that the technique is
feasible and will offer reasonable performance and user experience.

* Implement the C++ shim: replace the Python prototype with a light-weight
C++ version.  We would work from the current TSVN sources, including its new
support for sharing icon overlays.  Advice on whether we should "fork" TSVN
or try to manage our own svn-based branch in Bazaar is invited.

* Implement property pages and context menus in C++.  Expand the RPC server
as required.

* Create a binary for alpha releases, then go round and round until it's
baked.
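
For the single-threaded server in the third step, the core loop could look
roughly like this Python sketch: requests are handled strictly one at a
time, so handlers and any cache they share need no locking.  The one-byte
opcode prefix, the ERROR_UNKNOWN_OP reply and the handler table are
hypothetical; conn is assumed to be a multiprocessing connection (anything
with recv_bytes/send_bytes that raises EOFError on disconnect).

```python
from typing import Callable, Dict

Handler = Callable[[bytes], bytes]

ERROR_UNKNOWN_OP = b"\xff"  # hypothetical error reply


def dispatch(handlers: Dict[int, Handler], message: bytes) -> bytes:
    """Route one request to its handler using the leading opcode byte."""
    if not message or message[0] not in handlers:
        return ERROR_UNKNOWN_OP
    return handlers[message[0]](message[1:])


def serve(conn, handlers: Dict[int, Handler]) -> int:
    """Handle requests strictly one at a time until the client disconnects.

    `conn` is assumed to be a multiprocessing connection: recv_bytes() and
    send_bytes() exchange whole messages, and EOFError signals disconnect.
    Returns the number of requests served.
    """
    served = 0
    while True:
        try:
            message = conn.recv_bytes()
        except EOFError:
            return served
        conn.send_bytes(dispatch(handlers, message))
        served += 1
```

In this arrangement a state handler would consult the server's local cache
before falling back to bzrlib; moving to multiple threads later would only
require making that cache thread-safe.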
