[MERGE] HTTP smart server

Wed Oct 11 05:18:12 BST 2006

John Arbash Meinel wrote:
> Andrew Bennetts wrote:
> 
> ...
> 
> >> v- It seems that ReadingCompleted, WritingCompleted, and
> >> WritingNotCompleted might be reasonable to encapsulate in a heirarchy,
[...]
> > 
> > I think they're programming bugs, not errors that users should ever see.
> > 
> > I guess there might be a small readability advantage to putting these in a
> > hierarchy, to make it clearer that they're related.  It feels a bit odd, because
> > I can't think of any reason why you'd want to write "except
> > MediumRequestMisused:" (or whatever it would be called).  So I'm leaning towards
> > leaving it as is, on the basis that anything further would be a YAGNI.
> 
> Fair enough. As long as you have is_user_error = False, so that they
> create a traceback rather than suppressing it.

Ok.  I've added is_user_error = False to them.

> >> v- Your default '_expect_body_tail' is None, but then later on you use a
> >> plain:
> >>
> >>   while not self.received_bytes.endswith(self._expect_body_tail):
> >>
> >> So if _expect_body_tail is None, this will fail with a weird error.
> >>
> >> This seems to mean that _expect_body_tail should either be strictly
> >> required, or default to '' or some other string object.
> >>
> >> It might even be reasonable to have 'assert isinstance(expect_body_tail,
> >> str)', since you explicitly expect that later...
> > 
> > Ideally, I'd make expect_body_tail a required keyword arg, but there's no syntax
> > for that in Python, so I settled for just making it a keyword argument with an
> > invalid default (None).  (I prefer a keyword arg here because it's a test helper
> > specific to this file, so casual readers are highly unlikely to be familiar with
> > it, so having the "expect_body_tail" param repeated in every test that uses this
> > helper makes the tests more readable).
> > 
> > I'm not sure that the assertion would really buy us much.  "TypeError: expected
> > a character buffer object" isn't great, but it's still clear enough.  "assert
> > isinstance" makes me uncomfortable in Python; it's generally better not to fight
> > the duck typing.
> 
> Well, at least we need a comment indicating that we expect something to
> be passed. Otherwise default values tend to indicate that you can supply
> nothing.

I've added this docstring:

        """Constructor.

        :type expect_body_tail: str
        :param expect_body_tail: a reply won't be sent until this string is
            received.
        """

> >> 'accept_bytes()' really sounds to me like you are pulling new bytes to
> >> yourself, not pushing bytes to other.
> > 
> > In Twisted this would be dataReceived.  accept_bytes is a "push" interface, it
> > is called when there are bytes to process (i.e. it doesn't "pull", by calling
> > read or whatever itself).  So e.g. _serve_one_request_unguarded will call
> > request.accept_bytes as it pulls them off the connection, and the request will
> > react according.  What would you call this?
> 
> I'm a little hesitant, mostly because both the client and the server
> call the same function to mean different things. (On the client it means
> push these bytes to the server, and on the server it means handle these
> bytes from the client.)
> 
> I understand your point, 'process_bytes()' was the best I could think of.

Well, on the client medium it means "here's some bytes to deal with" and on the
server medium it means "here's some bytes to deal with".  What "deal with" means
varies, of course (deliver to the server via a socket/http/whatever, give to a
request object to react to appropriate, etc).

I guess process_bytes is a slight improvement.  Should I just globally rename
accept_bytes to process_bytes everywhere?

> > [...]
> >>> +    def test_bulk_data_pycurl(self):
> >>> +        self._test_bulk_data('http+pycurl')
> >>> +    
> >>> +    def test_bulk_data_urllib(self):
> >>> +        self._test_bulk_data('http+urllib')
> >> ^- what if pycurl is not installed?
> > 
> > Then it probably goes "boom!" ;)
> > 
> > Fixed to raise TestSkipped.
> > 
> >> Why does Post need its own curl object? Vincent Ladeuil had worked out
> >> how to get _base and _range to share a curl object (just manually set
> >> the Ranges header).
> > 
> > Those changes hadn't (haven't?) landed at the time I wrote this code, so I
> > followed the existing idiom.  I'm not sure what's involved in sharing curl
> > objects.
> 
> ^- we only used separate request objects because there is no way to
> unset the pycurl.RANGE header. (It is a bug in pycurl, but one we need
> to work around anyway).
> 
> I would rather see us reuse the POST object until we find that it isn't
> possible.

It is reusing the POST object, exactly the same was way as the _base_curl and
_range_curl objects are reused.  Did you mean something else?

> ...
> 
> I think you have addressed most of my concerns. I'm not strict on not
> using a new curl object, especially since I'd like to see us switch to
> urllib. :)

Sure :)

Let me know about the curl object and process_bytes issues.

-Andrew.