Rev 3446: Update TBZR implementation plan in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Wed May 21 16:45:37 BST 2008
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 3446
revision-id:pqm at pqm.ubuntu.com-20080521154523-4hv5qe8a9drke2h1
parent: pqm at pqm.ubuntu.com-20080521104134-beoquporep2cpghs
parent: ian.clatworthy at canonical.com-20080521063542-10rttegjbb1xy6lr
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Wed 2008-05-21 16:45:23 +0100
message:
Update TBZR implementation plan
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
doc/developers/tortoise-strategy.txt tortoisestrategy.txt-20080403024510-2ahdqrvnwqrb5p5t-1
------------------------------------------------------------
revno: 3441.1.1
revision-id:ian.clatworthy at canonical.com-20080521063542-10rttegjbb1xy6lr
parent: pqm at pqm.ubuntu.com-20080520210027-wetfxldz1ggc5u2a
parent: ian.clatworthy at canonical.com-20080521063335-ynzwbifdt1xo0fz4
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: ianc-integration
timestamp: Wed 2008-05-21 16:35:42 +1000
message:
Update TBZR implementation plan
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
doc/developers/tortoise-strategy.txt tortoisestrategy.txt-20080403024510-2ahdqrvnwqrb5p5t-1
------------------------------------------------------------
revno: 3441.2.1
revision-id:ian.clatworthy at canonical.com-20080521063335-ynzwbifdt1xo0fz4
parent: pqm at pqm.ubuntu.com-20080520210027-wetfxldz1ggc5u2a
parent: mhammond at skippinet.com.au-20080502041330-pga0qz2hkbop229d
committer: Ian Clatworthy <ian.clatworthy at canonical.com>
branch nick: bzr.tortoise
timestamp: Wed 2008-05-21 16:33:35 +1000
message:
added ianc tweaks to tbzr strategy doc
modified:
doc/developers/tortoise-strategy.txt tortoisestrategy.txt-20080403024510-2ahdqrvnwqrb5p5t-1
------------------------------------------------------------
revno: 3394.4.2
revision-id:mhammond at skippinet.com.au-20080502041330-pga0qz2hkbop229d
parent: mhammond at skippinet.com.au-20080502040255-33gphyioiis7a6a2
committer: Mark Hammond <mhammond at skippinet.com.au>
branch nick: bzr.work2
timestamp: Fri 2008-05-02 14:13:30 +1000
message:
Fix bad edit.
modified:
doc/developers/tortoise-strategy.txt tortoisestrategy.txt-20080403024510-2ahdqrvnwqrb5p5t-1
------------------------------------------------------------
revno: 3394.4.1
revision-id:mhammond at skippinet.com.au-20080502040255-33gphyioiis7a6a2
parent: pqm at pqm.ubuntu.com-20080430182230-iz5tit0t2ut5clww
committer: Mark Hammond <mhammond at skippinet.com.au>
branch nick: bzr.work2
timestamp: Fri 2008-05-02 14:02:55 +1000
message:
* Update TSVN notes based on comments from Stefan Kung
* Update implementation plan including details notes about
potentially reusing TSVNCache
modified:
doc/developers/tortoise-strategy.txt tortoisestrategy.txt-20080403024510-2ahdqrvnwqrb5p5t-1
=== modified file 'NEWS'
--- a/NEWS 2008-05-21 10:41:34 +0000
+++ b/NEWS 2008-05-21 15:45:23 +0000
@@ -71,6 +71,8 @@
on the plugin and integration chapters of the User Guide.
(Ian Clatworthy)
+ * Updated Tortise strategy document. (Mark Hammond)
+
TESTING:
* New helper function for splitting test suites
=== modified file 'doc/developers/tortoise-strategy.txt'
--- a/doc/developers/tortoise-strategy.txt 2008-04-09 08:19:38 +0000
+++ b/doc/developers/tortoise-strategy.txt 2008-05-21 06:33:35 +0000
@@ -1,19 +1,19 @@
Bazaar Windows Shell Extension Options
-========================================
+======================================
.. contents:: :local:
Introduction
------------
-This document details the imlpementation strategy chosen for the
+This document details the implementation strategy chosen for the
Bazaar Windows Shell Extensions, otherwise known as TortoiseBzr, or TBZR.
As justification for the strategy, it also describes the general architecture
of Windows Shell Extensions, then looks at the C++ implemented TortoiseSvn
and the Python implemented TortoiseBzr, and discusses alternative
implementation strategies, and the reasons they were not chosen.
-The following points summarize the strategy.
+The following points summarize the strategy:
* Main shell extension code will be implemented in C++, and be as thin as
possible. It will not directly do any VCS work, but instead will perform
@@ -41,7 +41,7 @@
loaded by the Windows shell. There is no facility for shell extensions to
exist in a separate process - DLLs are the only option, and they are loaded
into other processes which take advantage of the Windows shell (although
-obviously this DLL is free to do whatever it likes)
+obviously this DLL is free to do whatever it likes).
For the sake of this discussion, there are 2 categories of shell extensions:
@@ -66,8 +66,11 @@
example, when notepad.exe first starts with an empty file it is using around
3.5MB of RAM. As soon as the FileOpen dialog is loaded, TortoiseSvn loads
well over 20 additional DLLs, including the MSVC8 runtime, into the Notepad
-process causing its memory usage to more than double - all without doing
-anything tortoise specific at all.
+process causing its memory usage (as reported by task manager) to more than
+double - all without doing anything tortoise specific at all. (In fairness,
+this illustration is contrived - the code from these DLLs are already in
+memory and there is no reason to suggest TSVN adds any other unreasonable
+burden - but the general point remains valid.)
This has wide-ranging implications. It means that such shell extensions
should be developed using a tool which can never cause conflict with
@@ -78,26 +81,26 @@
conflict badly with other Python implemented applications (and will certainly
kill them in some situations). A similar issue exists with GUI toolkits used
- using (say) PyGTK directly in the shell extension would need to be avoided
-(which it currently is best I can tell). It should also be obvious the shell
-extension will be in many processes simultaneously, meaning use of a simple
-log-file etc is problematic.
+(which it currently is best I can tell). It should also be obvious that the
+shell extension will be in many processes simultaneously, meaning use of a
+simple log-file (for example) is problematic.
In practice, there is only 1 truly safe option - a low-level language (such
as C/C++) which makes use of only the win32 API, and a static version of the
-C runtime library if necessary. Obviously, this sucks from our POV :)
+C runtime library if necessary. Obviously, this sucks from our POV. :)
[1]: http://blogs.msdn.com/oldnewthing/archive/2006/12/18/1317290.aspx
Analysis of TortoiseSVN code
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TortoiseSVN is implemented in C++. It relies on an external process to
-perform most UI (such as diff, log, commit etc commands), but it appears to
+perform most UI (such as diff, log, commit etc.) commands, but it appears to
directly embed the SVN C libraries for the purposes of obtaining status for
icon overlays, context menu, drag&drop, etc.
The use of an external process to perform commands is fairly simplistic in
-terms of parent and modal windows - for example, when selecting "Commit", a
+terms of parent and modal windows. For example, when selecting "Commit", a
new process starts and *usually* ends up as the foreground window, but it may
occasionally be lost underneath the window which created it, and the user may
accidently start many processes when they only need 1. Best I can tell, this
@@ -108,25 +111,26 @@
directly needed by the shell are part of the "shell extension" and the rest
of TortoiseSvn is "just" a fairly large GUI application implementing many
commands. The command-line to the app has even been documented for people who
-wish to automate tasks using that GUI. This GUI appears to also be
-implemented in C++ using Windows resource files.
+wish to automate tasks using that GUI. This GUI is also implemented in C++
+using Windows resource files.
-TortoiseSvn appears to cache using a separate process, aptly named
-TSVNCache.exe. It uses a named pipe to accept connections from other
-processes for various operations. At this stage, it's still unclear exactly
-what is fetched from the cache and exactly what the shell extension fetches
-directly via the subversion C libraries.
+TortoiseSvn has an option (enabled by default) which enabled a cache using a
+separate process, aptly named TSVNCache.exe. It uses a named pipe to accept
+connections from other processes for various operations. When enabled, TSVN
+fetches most (all?) status information from this process, but it also has the
+option to talk directly to the VCS, along with options to disable functionality
+in various cases.
There doesn't seem to be a good story for logging or debugging - which is
-what you expect from C++ based apps :( Most of the heavy lifting is done by
+what you expect from C++ based apps. :( Most of the heavy lifting is done by
the external application, which might offer better facilities.
-
+
Analysis of existing TortoiseBzr code
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The existing code is actually quite cool given its history (SoC student,
etc), so this should not be taken as criticism of the implementer nor of the
-implementation - indeed, many criticisms are also true of the TortoiseSvn
+implementation. Indeed, many criticisms are also true of the TortoiseSvn
implementation - see above. However, I have attempted to list the bad things
rather than the good things so a clear future strategy can be agreed, with
all limitations understood.
@@ -158,21 +162,21 @@
and also error prone (it's possible the editor will check the file in,
meaning Windows explorer will be showing stale data). This may be possible to
address via file-system notifications, but a shared cache would be preferred
-(although clearly more difficult to implement)
+(although clearly more difficult to implement).
One tortoise port recently announced a technique for all tortoise ports to
share the same icon overlays to help work around a limitation in Windows on
-the total number of overlays (its limited to 15, due to the number of bits
+the total number of overlays (it's limited to 15, due to the number of bits
reserved in a 32bit int for overlays). TBZR needs to take advantage of that
(but to be fair, this overlay sharing technique was probably done after the
-TBZR implementation)
+TBZR implementation).
The current code appears to recursively walk a tree to check if *any* file in
the tree has changed, so it can reflect this in the parent directory status.
This is almost certainly an evil thing to do (Shell Extensions are optimized
so that a folder doesn't even need to look in its direct children for another
folder, let alone recurse for any reason at all. It may be a network mounted
-drive that doesn't perform at all)
+drive that doesn't perform at all.)
Although somewhat dependent on bzr itself, we need a strategy for binary
releases (ie, it assumes python.exe, etc) and integration into an existing
@@ -182,7 +186,7 @@
inexperienced with the language.
Detailed Implementation Strategy
----------------------------------
+--------------------------------
We will create a hybrid Python and C++ implementation. In this model, we
would still use something like TSVNCache.exe (this external
@@ -194,7 +198,7 @@
A pragmatic implementation strategy will be used to work towards the above
infrastructure - we will keep the shell extension implemented in Python - but
-without using bzrlib. This would allow us to focus on this
+without using bzrlib. This allows us to focus on this
shared-cache/remote-process infrastructure without immediately
re-implementing a shell extension in C++. Longer term, once the
infrastructure is in place and as optimized as possible, we can move to C++
@@ -202,7 +206,7 @@
share as much code as possible from TortoiseSvn, including overlay handlers.
External Command Processor
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~
The external command application (ie, the app invoked by the shell extension
to perform commands) can remain as-is, and remain a "shell" for other
@@ -215,7 +219,7 @@
extension.
Performance considerations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~
As discussed above, the model used by Tortoise is that most "interesting"
things are done by external applications. Most Tortoise implementations
@@ -228,7 +232,7 @@
be built on that.
There are 2 aspects of the shell integration which are performance critical -
-the "icon overlays" and "column providers"
+the "icon overlays" and "column providers".
The short-story with Icon Overlays is that we need to register 12 global
"overlay providers" - one for each state we show. Each provider is called for
@@ -259,7 +263,7 @@
people, for example.
RPC options
-~~~~~~~~~~~~
+~~~~~~~~~~~
Due to the high number of calls for icon overlays, the RPC overhead must be
kept as low as possible. Due to the client side being implemented in C++,
@@ -282,7 +286,7 @@
appear sufficient to implement a prototype.
Vista versus XP
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~
Let's try and avoid an OS advocacy debate :) But it is probably true that
TBZR will, over its life, be used by more Vista computers than XP ones. In
@@ -298,14 +302,116 @@
external GUI apps themselves, etc) and see if a path forward does emerge for
Vista. We can re-evaluate this based on user feedback and more information
about features of the Vista property system.
-
-Implementation plan:
---------------------
+
+Reuse of TSVNCache?
+~~~~~~~~~~~~~~~~~~~
+
+The RPC mechanism and the tasks performed by the RPC server (rpc, file system
+crawling and watching, device notifications, caching) are very similar to
+those already implemented for TSVN and analysis of that code shows that
+it is not particularly tied to any VCS model. As a result, consideration
+should be given to making the best use of this existing debugged and
+optimized technology.
+
+Discussions with the TSVN developers have indicated that they would prefer us
+to fork their code rather than introduce complexity and instability into
+their code by attempting to share it. See the follow-ups to
+http://thread.gmane.org/gmane.comp.version-control.subversion.tortoisesvn.devel/32635/focus=32651
+for details.
+
+For background, the TSVNCache process is fairly sophisticated - but mainly in
+areas not related to source control. It has had various performance tweaks
+and is smart in terms of minimizing its use of resources when possible. The
+'cloc' utility counts ~5000 lines of C++ code and weighs in just under 200KB
+on disk (not including headers), so this is not a trivial application.
+However, the code that is of most interest (the crawlers, watchers and cache)
+are roughly ~2500 lines of C++. Most of the source files only depend lightly
+on SVN specifics, so it would not be a huge job to make the existing code
+talk to Bazaar. The code is thread-safe, but not particularly thread-friendly
+(ie, fairly coarse-grained locks are taken in most cases).
+
+In practice, this give us 2 options - "fork" or "port":
+
+* Fork the existing C++ code, replacing the existing source-control code with
+ code that talks to Bazaar. This would involve introducing a Python layer,
+ but only at the layers where we need to talk to bzrlib. The bulk of the
+ code would remain in C++.
+
+ This would have the following benefits:
+
+ - May offer significant performance advantages in some cases (eg, a
+ cache-hit would never enter Python at all.)
+
+ - Quickest time to a prototype working - the existing working code can be
+ used quicker.
+
+ And the following drawbacks:
+
+ - More complex to develop. People wishing to hack on it must be on Windows,
+ know C++ and own the most recent MSVC8.
+
+ - More complex to build and package: people making binaries must be on
+ Windows and have the most recent MSVC8.
+
+ - Is tied to Windows - it would be impractical for this to be
+ cross-platform, even just for test purposes (although parts of it
+ obviously could).
+
+* Port the existing C++ code to Python. We would do this almost
+ "line-for-line", and attempt to keep many optimizations in place (or at
+ least document what the optimizations were for ones we consider dubious).
+ For the windows versions, pywin32 and ctypes would be leaned on - there
+ would be no C++ at all.
+
+ This would have the following benefits:
+
+ - Only need Python and Python skills to hack on it.
+
+ - No C++ compiler needed means easier to cut releases
+
+ - Python makes it easier to understand and maintain - it should appear much
+ less complex than the C++ version.
+
+ And the following drawbacks:
+
+ - Will be slower in some cases - eg, a cache-hit will involve executing
+ Python code.
+
+ - Will take longer to get a minimal system working. In practice this
+ probably means the initial versions will not be as sophisticated.
+
+Given the above, there are two issues which prevent Python being the clear
+winner: (1) will it perform OK? (2) How much longer to a prototype?
+
+My gut feeling on (1) is that it will perform fine, given a suitable Python
+implementation. For example, Python code that simply looked up a dictionary
+would be fast enough - so it all depends on how fast we can make our cache.
+Re (2), it should be possible to have a "stub" process (did almost nothing in
+terms of caching or crawling, but could be connected to by the shell) in a 8
+hours, and some crawling and caching in 40. Note that this is separate from
+the work included for the shell extension itself (the implementation of which
+is largely independent of the TBZRCache implementation). So given the lack of
+a deadline for any particular feature and the better long-term fit of using
+Python, the conclusion is that we should "port" TSVN for bazaar.
+
+Reuse of this code by Mercurial or other Python based VCS systems?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Incidentally, the hope is that this work can be picked up by the Mercurial
+project (or anyone else who thinks it is of use). However, we will limit
+ourselves to attempting to find a clean abstraction for the parts that talk
+to the VCS (as good design would dictate regardless) and then try and assist
+other projects in providing patches which work for both of us. In other
+words, supporting multiple VCS systems is not an explicit goal at this stage,
+but we would hope it is possible in the future.
+
+Implementation plan
+-------------------
The following is a high-level set of milestones for the implementation:
* Design the RPC mechanism used for icon overlays (ie, binary format used for
- communication)
+ communication).
* Create Python prototype of the C++ "shim": modify the existing TBZR Python
code so that all references to "bzrlib" are removed. Implement the client
@@ -314,16 +420,19 @@
* Create initial implementation of RPC server in Python. This will use
bzrlib, but will also maintain a local cache to achieve the required
- performance. The initial implementation may even be single-threaded, just
- to keep synchronization issues to a minimum.
+ performance. File crawling and watching will not be implemented at this
+ stage, but caching will (although cache persistence might be skipped).
* Analyze performance of prototype. Verify that technique is feasible and
will offer reasonable performance and user experience.
+* Implement file watching, crawling etc by "porting" TSVNCache code to
+ Python, as described above.
+
* Implement C++ shim: replace the Python prototype with a light-weight C++
- version. We would work from the current TSVN sources, including its new
- support for sharing icon overlays. Advice on if we should "fork" TSVN, or
- try and manage our own svn based branch in bazaar are invited.
+ version. We will fork the current TSVN sources, including its new
+ support for sharing icon overlays (although advice on how to setup this
+ fork is needed!)
* Implement property pages and context menus in C++. Expand RPC server as
necessary.
@@ -346,7 +455,7 @@
into various processes.
Although implementation simplicity is a key benefit to this option, it was
-not chosen for various reasons; The use of Python means that there is a
+not chosen for various reasons, e.g. the use of Python means that there is a
larger chance of conflicting with existing applications, or even existing
Python implemented shell extensions. It will also increase the memory usage
of all applications which use the shell. While this may create problems for a
More information about the bazaar-commits
mailing list