[MERGE] WSGI backend for HTTP smart server, and deployment documentation

John Arbash Meinel john at arbash-meinel.com
Sun Oct 8 08:18:20 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> The branch at http://people.ubuntu.com/~andrew/bzr/wsgi-smart-server/ implements
> a WSGI backend for the HTTP smart server, and adds some documentation
> demonstrating how to deploy it with Apache and mod_fastcgi.
> 
> The branch includes the changes in my HTTP smart server branch
> http://people.ubuntu.com/~andrew/bzr/http-smart-server/.  I've attached a diff
> relative to that branch.
> 
> It currently has a serious security issue: it will allow access to files outside
> of the directory it serves.  I've put a warning in the documentation about this,
> but I intend to fix this fairly soon.
> 
> Please review and give feedback -- I'm particularly interested in ways to
> further simplify deployment, and to make sure the documentation is clear.  I'm
> no Apache guru, so please let me know if you think there's a better way to
> intercept requests for .bzr/smart URLs and hand them to bzr via FastCGI or
> whatever.  Also let me know if you have thoughts on the glue script.
> 
> -Andrew.
> 


> +
> +from cStringIO import StringIO
> +
> +from bzrlib import tests
> +from bzrlib.transport.http import wsgi
> +from bzrlib.transport import memory
> +

- - Extra whitespace here

> +class TestWSGI(tests.TestCase):
> +
> +    def setUp(self):
> +        tests.TestCase.setUp(self)
> +        self.status = None
> +        self.headers = None
> +
> +    def build_environ(self, **kw):
> +        """Builds an environ dict with all fields required by PEP 333.
> +        
> +        The resulting environ dict will be updated with an **kw that are passed.
> +        """
> +        environ = {
> +            # Required CGI variables
> +            'REQUEST_METHOD': 'GET',
> +            'SCRIPT_NAME': '/script/name/',
> +            'PATH_INFO': 'path/info',
> +            'SERVER_NAME': 'test',
> +            'SERVER_PORT': '9999',
> +            'SERVER_PROTOCOL': 'HTTP/1.0',
> +
> +            # Required WSGI variables
> +            'wsgi.version': (1,0),
> +            'wsgi.url_scheme': 'http',
> +            'wsgi.input': StringIO(''),
> +            'wsgi.errors': StringIO(),
> +            'wsgi.multithread': False,
> +            'wsgi.multiprocess': False,
> +            'wsgi.run_once': True,
> +        }
> +        environ.update(kw)
> +        return environ
> +        
> +    def read_response(self, iterable):
> +        response = ''
> +        for string in iterable:
> +            response += string
> +        return response
> +
> +    def start_response(self, status, headers):


...

> +    def test_smart_wsgi_app_uses_given_relpath(self):
> +        # The SmartWSGIApp should use the "bzrlib.relpath" field from the
> +        # WSGI environ to construct the transport for this request, by cloning
> +        # its base transport with the given relpath.
> +        transport = FakeTransport()
> +        wsgi_app = wsgi.SmartWSGIApp(transport)
> +        def make_request(transport, write_func):
> +            request = FakeRequest(transport, write_func)
> +            self.request = request
> +            return request
> +        wsgi_app.make_request = make_request

v- Wouldn't this be better by passing in the parameters to build_environ?

> +        fake_input = StringIO('fake request')
> +        environ = self.build_environ()
> +        environ.update({
> +            'REQUEST_METHOD': 'POST',
> +            'CONTENT_LENGTH': len(fake_input.getvalue()),
> +            'wsgi.input': fake_input,
> +            'bzrlib.relpath': 'foo/bar',
> +        })
> +        iterable = wsgi_app(environ, self.start_response)
> +        response = self.read_response(iterable)
> +        self.assertEqual([('clone', 'foo/bar')] , transport.calls)
> +
> +    def test_smart_wsgi_app_request_and_response(self):
> +        # SmartWSGIApp reads the smart request from the 'wsgi.input' file-like
> +        # object in the environ dict, and returns the response via the iterable
> +        # returned to the WSGI handler.
> +        transport = memory.MemoryTransport()
> +        transport.put_bytes('foo', 'some bytes')
> +        wsgi_app = wsgi.SmartWSGIApp(transport)
> +        def make_request(transport, write_func):
> +            request = FakeRequest(transport, write_func)
> +            self.request = request
> +            return request

v- Same here:

> +        wsgi_app.make_request = make_request
> +        fake_input = StringIO('fake request')
> +        environ = self.build_environ()
> +        environ.update({
> +            'REQUEST_METHOD': 'POST',
> +            'CONTENT_LENGTH': len(fake_input.getvalue()),
> +            'wsgi.input': fake_input,
> +            'bzrlib.relpath': 'foo',
> +        })
> +        iterable = wsgi_app(environ, self.start_response)
> +        response = self.read_response(iterable)
> +        self.assertEqual('200 OK', self.status)
> +        self.assertEqual('got bytes: fake request', response)
> +

...

> +    def test_relpath_setter_bad_path(self):
> +        # wsgi.RelpathSetter will reject paths with that don't match the prefix
> +        # or suffix with a 404.  This is probably a sign of misconfiguration; a
> +        # server shouldn't ever be invoking our WSGI application with bad paths.
> +        def fake_app(environ, start_response):
> +            self.fail('The app should never be called when the path is wrong')
> +        wrapped_app = wsgi.RelpathSetter(
> +            fake_app, prefix='/abc/', path_var='FOO')
> +        iterable = wrapped_app(
> +            {'FOO': 'AAA/abc/xyz/.bzr/smart'}, self.start_response)
> +        self.read_response(iterable)
> +        self.assertTrue(self.status.startswith('404'))

^- You are testing the prefix, but your comment says that the suffix
should also be checked.

Also, have you tested any of this on Windows?

Wouldn't it be nicer to allow a plain get to return some basic 'This is
meant to be for bzr' sort of page? Rather than strictly requiring POST only?

> +    def __call__(self, environ, start_response):
> +        """WSGI application callable."""
> +        if environ['REQUEST_METHOD'] != 'POST':
> +            start_response('405 Method not allowed', [('Allow', 'POST')])
> +            return []
> +
> +        relpath = environ['bzrlib.relpath']
> +        transport = self.backing_transport.clone(relpath)
> +        #assert transport.base.startswith(self.backing_transport.base)
> +        out_buffer = StringIO()
> +        smart_protocol_request = self.make_request(transport, out_buffer.write)
> +        request_data_length = int(environ['CONTENT_LENGTH'])
> +        request_data_bytes = environ['wsgi.input'].read(request_data_length)
> +        smart_protocol_request.accept_bytes(request_data_bytes)

v- This seems like a bad request, which should have a custom handler and
exception, not a 'assert' failure.
'assert' is good for testing that a developer is using your api
properly. But is very bad for sanitizing input/output to a user.
(For example, python -O means you don't process asserts, which is *very*
bad when sanitizing input and output).

> +        assert smart_protocol_request.next_read_size() == 0, (
> +            "not finished reading, but all data sent to protocol.")
> +        response_data = out_buffer.getvalue()
> +        headers = [('Content-type', 'application/octet-stream')]
> +        headers.append(("Content-Length", str(len(response_data))))
> +        start_response('200 OK', headers)
> +        return [response_data]
> +
> +    def make_request(self, transport, write_func):
> +        return smart.SmartServerRequestProtocolOne(transport, write_func)
> 
...

> +We need to change it to handle all requests for URLs ending in `.bzr/smart`.  It
> +will look like::
> +
> +    Alias /code /srv/example.com/www/code
> +    <Directory /srv/example.com/www/code>
> +        Options Indexes, FollowSymLinks
> +        RewriteEngine On
> +        RewriteBase /code
> +        RewriteRule ^(.*)/\.bzr/smart$ /srv/example.com/scripts/bzr-smart.fcgi
> +    </Directory>
> +

v- Could you comment why you are aliasing something to itself?

> +    Alias /srv/example.com/scripts/bzr-smart.fcgi /srv/example.com/scripts/bzr-smart.fcgi
> +    <Directory /srv/example.com/scripts>
> +        Options ExecCGI
> +        <Files bzr-smart.fcgi>
> +            SetHandler fastcgi-script
> +        </Files>
> +    </Directory>
> +    
> +This instructs Apache to hand requests for any URL ending with `/.bzr/smart`
> +inside `/code` to a Bazaar smart server via FastCGI.
> +
> +Refer to the mod_rewrite_ and mod_fastcgi_ documentation for further
> +information.
> +
> +.. _mod_rewrite: http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
> +.. _mod_fastcgi: http://www.fastcgi.com/mod_fastcgi/docs/mod_fastcgi.html
> +

...

v- I assume you are just asking for comments, but BRANCH.TODO should be
empty before merging.


> === modified file 'BRANCH.TODO'
> --- BRANCH.TODO	2006-10-04 02:24:48 +0000
> +++ BRANCH.TODO	2006-10-06 07:30:21 +0000
> @@ -1,3 +1,10 @@
>  # This file is for listing TODOs for branches that are being worked on.
>  # It should ALWAYS be empty in the mainline or in integration branches.
>  # 
> +
> +Security: it should be impossible, by default, to access files above the base of
> +the backing transport of the SmartServerRequestHandler.  Currently '..' and the
> +like are not vetted, however.
> +
> +Similarly, the SmartWSGIApp should also be careful to disallow '..' and the
> +like.
> 

Overall, it seems pretty simple. I didn't realize WSGI was this simple.
It seems like a good idea to get this into core, since we certainly
don't want to be importing a new copy of bzrlib with each call to a webpage.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFKKY8JdeBCYSNAAMRAkgjAKDGwT3JXMnIKZa40HDhfS6UPXUPyQCgxBOn
As9Kl6MSmPgsu8aypN/G46E=
=stDk
-----END PGP SIGNATURE-----





More information about the bazaar mailing list