Rev 2440: NEWS entry, greatly improved docstring in bzrlib.smart. in http://bazaar.launchpad.net/~bzr/bzr/hpss-protocol2

Wed Apr 25 06:43:45 BST 2007

At http://bazaar.launchpad.net/~bzr/bzr/hpss-protocol2

------------------------------------------------------------
revno: 2440
revision-id: andrew.bennetts at canonical.com-20070425054002-08sj7lxphtpb6ewm
parent: andrew.bennetts at canonical.com-20070425045619-gfsty2ebwx3c5rka
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: hpss-protocol2
timestamp: Wed 2007-04-25 15:40:02 +1000
message:
  NEWS entry, greatly improved docstring in bzrlib.smart.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/smart/__init__.py       __init__.py-20061101100249-8jwwl0d3jr080zim-1
  bzrlib/smart/medium.py         medium.py-20061103051856-rgu2huy59fkz902q-1
=== modified file 'NEWS'

--- a/NEWS	2007-04-23 07:50:15 +0000
+++ b/NEWS	2007-04-25 05:40:02 +0000
@@ -55,6 +55,10 @@
       serve out to the local LAN (and anyone in the world that can reach the
       machine running ``bzr serve``. (Robert Collins, #98918)
 
+    * A new smart server protocol version has been added.  It prefixes requests
+      and responses with an explicit version identifier so that future protocol
+      revisions can be dealt with gracefully.  (Andrew Bennetts, Robert Collins)
+
   INTERNALS:
 
     * bzrlib API compatability with 0.8 has been dropped, cleaning up some

=== modified file 'bzrlib/smart/__init__.py'
--- a/bzrlib/smart/__init__.py	2007-04-10 15:54:15 +0000
+++ b/bzrlib/smart/__init__.py	2007-04-25 05:40:02 +0000
@@ -23,50 +23,78 @@
 Overview
 ========
 
-Requests are sent as a command and list of arguments, followed by optional
-bulk body data.  Responses are similarly a response and list of arguments,
-followed by bulk body data. ::
-
-  SEP := '\001'
-    Fields are separated by Ctrl-A.
-  BULK_DATA := CHUNK TRAILER
-    Chunks can be repeated as many times as necessary.
-  CHUNK := CHUNK_LEN CHUNK_BODY
-  CHUNK_LEN := DIGIT+ NEWLINE
-    Gives the number of bytes in the following chunk.
-  CHUNK_BODY := BYTE[chunk_len]
-  TRAILER := SUCCESS_TRAILER | ERROR_TRAILER
-  SUCCESS_TRAILER := 'done' NEWLINE
-  ERROR_TRAILER := 
-
-Paths are passed across the network.  The client needs to see a namespace that
-includes any repository that might need to be referenced, and the client needs
-to know about a root directory beyond which it cannot ascend.
-
-Servers run over ssh will typically want to be able to access any path the user 
-can access.  Public servers on the other hand (which might be over http, ssh
-or tcp) will typically want to restrict access to only a particular directory 
-and its children, so will want to do a software virtual root at that level.
-In other words they'll want to rewrite incoming paths to be under that level
-(and prevent escaping using ../ tricks.)
-
-URLs that include ~ should probably be passed across to the server verbatim
-and the server can expand them.  This will proably not be meaningful when 
-limited to a directory?
-
-At the bottom level socket, pipes, HTTP server.  For sockets, we have the idea
-that you have multiple requests and get a read error because the other side did
-shutdown.  For pipes we have read pipe which will have a zero read which marks
-end-of-file.  For HTTP server environment there is not end-of-stream because
-each request coming into the server is independent.
+The smart protocol provides a way to send requests and receive corresponding
+responses in order to communicate with a remote bzr process.
+
+Layering
+========
+
+Medium
+------
+
+At the bottom level there is either a socket, pipes, or an HTTP
+request/response.  We call this layer the *medium*.  It is responsible for
+carrying bytes between a client and server.  For sockets, multiple requests
+can be sent over one connection, and a read error occurs when the other side
+shuts down.  For pipes, a zero-length read on the read pipe marks
+end-of-file.  For an HTTP server environment there is no end-of-stream
+because each request coming into the server is independent.
 
 So we need a wrapper around pipes and sockets to seperate out requests from
-substrate and this will give us a single model which is consist for HTTP,
+substrate and this will give us a single model which is consistent for HTTP,
 sockets and pipes.
 
+Protocol
+--------
+
+On top of the medium is the *protocol*.  This is the layer that serialises
+structured requests and responses to bytes, and deserialises them again.
+
+Version one of the protocol (for requests and responses) is described by::
+
+  REQUEST := MESSAGE_V1
+  RESPONSE := MESSAGE_V1
+  MESSAGE_V1 := ARGS BODY
+
+  ARGS := ARG [MORE_ARGS] NEWLINE
+  MORE_ARGS := SEP ARG [MORE_ARGS]
+  SEP := 0x01
+
+  BODY := LENGTH NEWLINE BODY_BYTES TRAILER
+  LENGTH := decimal integer
+  TRAILER := "done" NEWLINE
+
+That is, a tuple of arguments separated by Ctrl-A and terminated with a newline,
+followed by a length-prefixed body with a constant trailer.  Note that although
+arguments are not 8-bit safe (they cannot include 0x01 or 0x0a bytes without
+breaking the protocol encoding), the body is.
+
+Version two of the request protocol is::
+
+  REQUEST_V2 := "bzr request 2" NEWLINE MESSAGE_V1
+
+Version two of the response protocol is::
+
+  RESPONSE_V2 := "bzr response 2" NEWLINE MESSAGE_V1
+
+Future versions should follow this structure, like version two does::
+
+  FUTURE_MESSAGE := VERSION_STRING NEWLINE REST_OF_MESSAGE
+
+This is so that clients and servers can read bytes up to the first newline byte
+to determine what version a message is.
+
+Request/Response processing
+---------------------------
+
+On top of the protocol is the logic for processing requests (on the server) or
+responses (on the client).
+
 Server-side
 -----------
 
+Sketch::
+
  MEDIUM  (factory for protocol, reads bytes & pushes to protocol,
           uses protocol to detect end-of-request, sends written
           bytes to client) e.g. socket, pipe, HTTP request handler.
@@ -74,7 +102,7 @@
   | bytes.
   v
 
-PROTOCOL  (serialization, deserialization)  accepts bytes for one
+ PROTOCOL (serialization, deserialization)  accepts bytes for one
           request, decodes according to internal state, pushes
           structured data to handler.  accepts structured data from
           handler and encodes and writes to the medium.  factory for
@@ -83,23 +111,27 @@
   | structured data
   v
 
-HANDLER   (domain logic) accepts structured data, operates state
+ HANDLER  (domain logic) accepts structured data, operates state
           machine until the request can be satisfied,
           sends structured data to the protocol.
 
+Request handlers are registered in `bzrlib.smart.request`.
+
 
 Client-side
 -----------
 
- CLIENT             domain logic, accepts domain requests, generated structured
-                    data, reads structured data from responses and turns into
-                    domain data.  Sends structured data to the protocol.
-                    Operates state machines until the request can be delivered
-                    (e.g. reading from a bundle generated in bzrlib to deliver a
-                    complete request).
-
-                    Possibly this should just be RemoteBzrDir, RemoteTransport,
-                    ...
+Sketch::
+
+ CLIENT   domain logic, accepts domain requests, generates structured
+          data, reads structured data from responses and turns into
+          domain data.  Sends structured data to the protocol.
+          Operates state machines until the request can be delivered
+          (e.g. reading from a bundle generated in bzrlib to deliver a
+          complete request).
+
+          Possibly this should just be RemoteBzrDir, RemoteTransport,
+          ...
   ^
   | structured data
   v
@@ -113,6 +145,30 @@
 
  MEDIUM  (accepts bytes from the protocol & delivers to the remote server.
           Allows the potocol to read bytes e.g. socket, pipe, HTTP request.
+
+The domain logic is in `bzrlib.remote`: `RemoteBzrDir`, `RemoteBranch`, and so
+on.
+
+There is also a plain file-level transport that calls remote methods to
+manipulate files on the server in `bzrlib.transport.remote`.
+
+Paths
+=====
+
+Paths are passed across the network.  The client needs to see a namespace that
+includes any repository that might need to be referenced, and the client needs
+to know about a root directory beyond which it cannot ascend.
+
+Servers run over ssh will typically want to be able to access any path the user
+can access.  Public servers on the other hand (which might be over http, ssh
+or tcp) will typically want to restrict access to only a particular directory
+and its children, so will want to do a software virtual root at that level.
+In other words they'll want to rewrite incoming paths to be under that level
+(and prevent escaping using ../ tricks.)
+
+URLs that include ~ should probably be passed across to the server verbatim
+and the server can expand them.  This will probably not be meaningful when
+limited to a directory?
 """
 
 # TODO: _translate_error should be on the client, not the transport because
@@ -127,8 +183,6 @@
 # consider how we'll handle error reporting, e.g. if we get halfway through a
 # bulk transfer and then something goes wrong.
 
-# TODO: Standard marker at start of request/response lines?
-
 # TODO: Make each request and response self-validatable, e.g. with checksums.
 #
 # TODO: get/put objects could be changed to gradually read back the data as it
@@ -155,8 +209,6 @@
 # connection?  Perhaps all Transports should factor out a common connection
 # from the thing that has the directory context?
 #
-# TODO: Pull more things common to sftp and ssh to a higher level.
-#
 # TODO: The server that manages a connection should be quite small and retain
 # minimum state because each of the requests are supposed to be stateless.
 # Then we can write another implementation that maps to http.
@@ -179,26 +231,10 @@
 # urlescape them instead.  Indeed possibly this should just literally be
 # http-over-ssh.
 #
-# FIXME: This transport, with several others, has imperfect handling of paths
-# within urls.  It'd probably be better for ".." from a root to raise an error
-# rather than return the same directory as we do at present.
-#
-# TODO: Rather than working at the Transport layer we want a Branch,
-# Repository or BzrDir objects that talk to a server.
-#
 # TODO: Probably want some way for server commands to gradually produce body
 # data rather than passing it as a string; they could perhaps pass an
 # iterator-like callback that will gradually yield data; it probably needs a
 # close() method that will always be closed to do any necessary cleanup.
-#
-# TODO: Split the actual smart server from the ssh encoding of it.
-#
-# TODO: Perhaps support file-level readwrite operations over the transport
-# too.
-#
-# TODO: SmartBzrDir class, proxying all Branch etc methods across to another
-# branch doing file-level operations.
-#
 
 
 # Promote some attributes from submodules into this namespace
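
Reviewer's note: the version-one and version-two request grammar documented in the bzrlib/smart/__init__.py docstring above can be illustrated with a small sketch. This is not code from bzrlib; the names REQUEST_V2_PREFIX, encode_message_v1, encode_request_v2, decode_message_v1 and decode_request are hypothetical, and the sketch assumes a whole message is already in memory and that a body is always present, as the grammar is written.

```python
# Illustrative sketch only; these names are hypothetical, not bzrlib APIs.
REQUEST_V2_PREFIX = b"bzr request 2\n"  # version marker taken from the grammar
SEP = b"\x01"                           # Ctrl-A separates arguments
TRAILER = b"done\n"                     # constant trailer after the body


def encode_message_v1(args, body):
    """Serialise MESSAGE_V1 := ARGS BODY."""
    for arg in args:
        # Arguments are not 8-bit safe: 0x01 and 0x0a would break the framing.
        if SEP in arg or b"\n" in arg:
            raise ValueError("argument contains 0x01 or 0x0a: %r" % (arg,))
    args_line = SEP.join(args) + b"\n"
    # The body is length-prefixed, so it may contain any byte values.
    length_line = str(len(body)).encode("ascii") + b"\n"
    return args_line + length_line + body + TRAILER


def encode_request_v2(args, body):
    """Serialise REQUEST_V2 := "bzr request 2" NEWLINE MESSAGE_V1."""
    return REQUEST_V2_PREFIX + encode_message_v1(args, body)


def decode_message_v1(data):
    """Parse one complete MESSAGE_V1 and return (args, body)."""
    args_line, rest = data.split(b"\n", 1)
    args = tuple(args_line.split(SEP))
    length_line, rest = rest.split(b"\n", 1)
    length = int(length_line)
    body, trailer = rest[:length], rest[length:]
    if trailer != TRAILER:
        raise ValueError("missing or malformed trailer")
    return args, body


def decode_request(data):
    """Read up to the first newline to pick a version; return (version, args, body)."""
    if data.startswith(REQUEST_V2_PREFIX):
        args, body = decode_message_v1(data[len(REQUEST_V2_PREFIX):])
        return 2, args, body
    # No recognised version line: treat the bytes as a version-one message.
    args, body = decode_message_v1(data)
    return 1, args, body
```

For example, encode_request_v2((b"get", b"foo/bar"), b"a\x01b\nc") yields a message whose body contains both 0x01 and 0x0a, and decode_request recovers the arguments and body unchanged because only the argument line relies on those bytes as delimiters.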

=== modified file 'bzrlib/smart/medium.py'
--- a/bzrlib/smart/medium.py	2007-04-25 04:56:19 +0000
+++ b/bzrlib/smart/medium.py	2007-04-25 05:40:02 +0000
@@ -78,6 +78,14 @@
             raise
 
     def _build_protocol(self):
+        """Identifies the version of the incoming request, and returns a
+        protocol object that can interpret it.
+
+        If more bytes than the version prefix of the request are read, they will
+        be fed into the protocol before it is returned.
+
+        :returns: a SmartServerRequestProtocol.
+        """
         # Identify the protocol version.
         bytes = self._get_line()
         if bytes.startswith(REQUEST_VERSION_TWO):