[MERGE] WSGI backend for HTTP smart server, and deployment documentation

Wed Oct 11 07:50:20 BST 2006

John Arbash Meinel wrote:
> Andrew Bennetts wrote:

> > +from bzrlib import tests
> > +from bzrlib.transport.http import wsgi
> > +from bzrlib.transport import memory
> > +
> 
> - - Extra whitespace here

Added, thanks.

[...]
> 
> v- Wouldn't this be better by passing in the parameters to build_environ?
> 
> > +        fake_input = StringIO('fake request')
> > +        environ = self.build_environ()
> > +        environ.update({
> > +            'REQUEST_METHOD': 'POST',
> > +            'CONTENT_LENGTH': len(fake_input.getvalue()),
> > +            'wsgi.input': fake_input,
> > +            'bzrlib.relpath': 'foo/bar',
> > +        })

I kept changing my mind about how I wanted build_environ to work.  It was
accepting **kw, but that doesn't make it convenient to set keys like
"wsgi.input".  I've changed it to just take a dict, so this code is now:

        fake_input = StringIO('fake request')
        environ = self.build_environ({
            'REQUEST_METHOD': 'POST',
            'CONTENT_LENGTH': len(fake_input.getvalue()),
            'wsgi.input': fake_input,
            'bzrlib.relpath': 'foo/bar',
        })

> v- Same here:
> 
> > +        wsgi_app.make_request = make_request
> > +        fake_input = StringIO('fake request')
> > +        environ = self.build_environ()
> > +        environ.update({

Done.

> > +    def test_relpath_setter_bad_path(self):
> > +        # wsgi.RelpathSetter will reject paths with that don't match the prefix
> > +        # or suffix with a 404.  This is probably a sign of misconfiguration; a
> > +        # server shouldn't ever be invoking our WSGI application with bad paths.
> > +        def fake_app(environ, start_response):
> > +            self.fail('The app should never be called when the path is wrong')
> > +        wrapped_app = wsgi.RelpathSetter(
> > +            fake_app, prefix='/abc/', path_var='FOO')
> > +        iterable = wrapped_app(
> > +            {'FOO': 'AAA/abc/xyz/.bzr/smart'}, self.start_response)
> > +        self.read_response(iterable)
> > +        self.assertTrue(self.status.startswith('404'))
> 
> ^- You are testing the prefix, but your comment says that the suffix
> should also be checked.

Good point.  I've renamed this test test_relpath_setter_bad_path_prefix, and
added a test_relpath_setter_bad_path_suffix (and adjusted the comments
appropriately).

> Also, have you tested any of this on Windows?

No, I haven't...

> Wouldn't it be nicer to allow a plain get to return some basic 'This is
> meant to be for bzr' sort of page? Rather than strictly requiring POST only?

Perhaps.  That page should still still have a status of "405 Method not
allowed", I'd think.

I'm not sure providing a helpful text/html page really matters though.  What's a
situation where this is likely to help?

> v- This seems like a bad request, which should have a custom handler and
> exception, not a 'assert' failure.
> 'assert' is good for testing that a developer is using your api
> properly. But is very bad for sanitizing input/output to a user.
> (For example, python -O means you don't process asserts, which is *very*
> bad when sanitizing input and output).
> 
> > +        assert smart_protocol_request.next_read_size() == 0, (
> > +            "not finished reading, but all data sent to protocol.")

Good point.  It now returns an error (and I've added a test for this situation).

> v- Could you comment why you are aliasing something to itself?
> 
> > +    Alias /srv/example.com/scripts/bzr-smart.fcgi /srv/example.com/scripts/bzr-smart.fcgi
> > +    <Directory /srv/example.com/scripts>

Because it doesn't work otherwise ;)

I've added this comment:

    # bzr-smart.fcgi isn't under the DocumentRoot, so Alias it into the URL
    # namespace so it can be executed.

I'm no Apache expert, so I'm not sure that this explanation is correct, and I'm
not sure there isn't a better way.

Still, it works, and we can improve it later if someone knows better.

> v- I assume you are just asking for comments, but BRANCH.TODO should be
> empty before merging.

Right.  In fact, I'd be hesistant to merge this before these issues are solved,
because otherwise this feature is a serious security hazard.

[...]
> Overall, it seems pretty simple. I didn't realize WSGI was this simple.
> It seems like a good idea to get this into core, since we certainly
> don't want to be importing a new copy of bzrlib with each call to a webpage.

Yeah, the hardest part was working out what the configuration should look like.
Writing the code to support that was relatively easy.

-Andrew.