[MERGE] urllib keeps connections alive

John Arbash Meinel john at arbash-meinel.com
Sun Oct 8 01:47:41 BST 2006


Vincent Ladeuil wrote:
> Hi all,
> 
> This is a new implementation of bzrlib/transport/http/_urllib.py
> but most of the work is done in the new file
> bzrlib/transport/http/_urllib2_wrappers.py.
> 
> On top of keeping connections alive, it also attempts one retry of
> the request when errors occur, which should provide more robust
> behavior in case of transient errors at the socket level.
> 
> The implementation is on par with the current _urllib
> implementation in terms of functionality; I'll continue to add
> tests before addressing new needs (better redirection and better
> authentication to begin with, but those, indeed, also need
> tests).
> 
> I include a patch because the bundle is too large for the list
> (~700KB), but a branch is available on Launchpad:
> 
>   https://launchpad.net/people/bzr/+branch/bzr/bzr.urllib.keepalive
> 
> and the direct link to the branch:
> 
>   http://bazaar.launchpad.net/~bzr/bzr/bzr.urllib.keepalive
> 
> Look for revision 2025.
> 
> Any comments appreciated,
> 
>         Vincent
> 
> 
> 

I know we talked about it, but you should add a comment about why you
use DEBUG and self.debuglevel. (Basically because that is how urllib
does it; using the same mechanism means all debug messages are
written to the same location and kept in the correct order.)
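Something like this sketch of the comment I have in mind:

    # We use the same DEBUG/debuglevel mechanism as urllib2 itself
    # (handlers print when their debuglevel is positive) rather than
    # bzrlib's mutter(), so that our debug output goes to the same
    # place as urllib2's and httplib's and stays correctly
    # interleaved with it.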

...

> +from bzrlib import (
> +    __version__ as bzrlib_version,
> +    )

v- For future changes, we should try to use 'from bzrlib import errors'
and then reference the errors as 'errors.BzrError'. I wouldn't require
that you do it here, but in my other tests I was able to shave off a
significant amount of time, because the exception classes don't
actually need to be imported for most operations.

> +from bzrlib.errors import (
> +    BzrError,
> +    ConnectionError,
> +    InvalidHttpResponse,
> +    NoSuchFile,
> +    )
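^- That is, the deferred style would look like (a sketch):

    from bzrlib import errors
    ...
    raise errors.InvalidHttpResponse(url, msg)

so the class is only looked up when an exception is actually raised
(and, with the lazy_import machinery, the module itself need not be
loaded until then).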
> +
> +
> +# We define our own Response class to keep our httplib pipe clean
> +class Response(httplib.HTTPResponse):
> +    """Custom HTTPResponse, to avoid the need to decorate.
> +
> +    httplib prefers to decorate the returned objects, rather
> +    than using a custom object.
> +    """

^- Custom HTTPResponse to avoid decorating live objects.

> +
> +    # Some responses have bodies in which we have no interest
> +    _body_ignored_responses = [301,302, 303, 307, 401,]

^- minor spacing issue              ^^^^

> +
> +    def __init__(self, *args, **kwargs):
> +        httplib.HTTPResponse.__init__(self, *args, **kwargs)
> +
> +    def begin(self):
> +        """Begin to read the response from the server.
> +
> +        httplib assumes that some responses get no content and do
> +        not even attempt to read the body in that case, leaving
> +        the body in the socket, blocking the next request. Let's
> +        try to workaround that.
> +        """

^- httplib assumes that some responses do not have content, and does not
attempt to read the body. However, some of its assumptions are
incorrect, leaving the body in the socket and blocking the next request.
Here we try to work around that.

> +        httplib.HTTPResponse.begin(self)
> +        if self.status in self._body_ignored_responses:
> +            if self.debuglevel > 0:
> +                print "For status: [%s]," % self.status,
> +                print "will ready body, length: ",
> +                if  self.length is not None:
> +                    print "[%d]" % self.length
> +                else:
> +                    print "None"
> +            if not (self.length is None or self.will_close):
> +                # In some cases, we just can't read the body not
> +                # even try or we may encounter a 104, 'Connection
> +                # reset by peer' error if there is indeed no body
> +                # and the server closed the connection just after
> +                # having issued the response headers (even if the
> +                # heaers indicate a Content-Type...)

# In some cases, if we try to read the body we may encounter a 104,
# 'Connection reset by peer' error, such as when there is no body and
# the server closes the connection immediately after having issued the
# response headers (even if the headers indicate a Content-Type).

^- My understanding was that a HEAD request returns a Content-Type but
no body (otherwise you wouldn't use HEAD :). Or are you just explaining
why you are checking length is None / will_close?

> +                body = self.fp.read(self.length)
> +                if self.debuglevel > 0:
> +                    print "Consumed body: [%s]" % body
> +            self.close()
> +
> +

...

v- Are you sure the port will never be None? That may be the case...
> +    def get_key(self, connection):
> +        """Returns the key for the connection in the cache"""
> +        return '%s:%d' % (connection.host, connection.port)
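^- If it can be None, something along these lines would be safer (just
a sketch; default_port is the class attribute httplib connections
carry):

    def get_key(self, connection):
        """Returns the key for the connection in the cache"""
        # Fall back to the class default (80 for HTTP, 443 for HTTPS)
        # when no explicit port was given.
        port = connection.port or connection.default_port
        return '%s:%d' % (connection.host, port)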
> +
> +    def create_connection(self, request, http_connection_class):
> +        host = request.get_host()
> +        if not host:
> +            # Just a bit of paranoia here, this should have been
> +            # handled in the higher levels
> +            raise InvalidURL(request.get_full_url(), 'no host given.')
> +
> +        # We create a connection (but it will not connect yet)
> +        connection = http_connection_class(host)
> +
> +        return connection
> +
> +    def capture_connection(self, request, http_connection_class):
> +        """Capture or inject the request connection.
> +
> +        Two cases:
> +        - the request have no connection: create a new one,

^-                       ^^^ has no connection: create a new one
> +
> +        - the request have a connection: this one have been used

^-                       ^^^ has a connection: it has been used
> +          already, let's capture it, so that we can give it to
> +          another transport to be reused. We don't do that
> +          ourselves: the Transport object get the connection from
> +          a first request and then propagate it, from request to
> +          request or to cloned transports.
> +        """
> +        connection = request.connection
> +        if connection is None:
> +            # Create a new one
> +            connection = self.create_connection(request, http_connection_class)
> +            request.connection = connection
> +
> +        # All connections will pass here, propagate debug level
> +        connection.set_debuglevel(DEBUG)
> +        return request
> +
> +    def http_request(self, request):
> +        return self.capture_connection(request, HTTPConnection)
> +
> +    def https_request(self, request):
> +        return self.capture_connection(request, HTTPSConnection)
> +
> +
> +class AbstractHTTPHandler(urllib2.AbstractHTTPHandler):
> +    """A custom handler for HTTP(S) requests.
> +
> +    We overrive urllib2.AbstractHTTPHandler to get a better
> +    control of the connection, the ability to implement new
> +    request types and return a response able to cope with
> +    persistent connections.
> +    """
> +
> +    # We change our order to be before urllib2 HTTP[S]Handlers
> +    # and be chosen instead of them (the first http_open called
> +    # wins).
> +    handler_order = 400
> +
> +    _default_headers = {'Pragma': 'no-cache',
> +                        'Cache-control': 'max-age=0',
> +                        'Connection': 'Keep-Alive',
> +                        # FIXME: Spell it User-*A*gent once we
> +                        # know how to properly avoid bogus
> +                        # urllib2 using capitalize() for headers
> +                        # instead of title(sp?).
> +                        'User-agent': 'bzr/%s (urllib)' % bzrlib_version,
> +                        # FIXME: pycurl also set the following, understand why
> +                        'Accept': '*/*',
> +                        }

^- Accept: usually lists the formats you will accept, like 'text/plain'
or 'text/html', etc. I could see us only accepting text/plain or
application/bytes, etc. I don't know the exact MIME types we would
want, but for starters */* is okay, since at least it wouldn't reject a
request because the format was not available.
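If we narrow it later, my guess is something like this (only a sketch;
application/octet-stream is the standard MIME type for raw bytes):

    'Accept': 'application/octet-stream, text/plain;q=0.9, */*;q=0.1',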

> +
> +    def __init__(self):
> +        urllib2.AbstractHTTPHandler.__init__(self, debuglevel=DEBUG)
> +
> +    def http_request(self, request):
> +        """Common headers setting"""
> +
> +        request.headers.update(self._default_headers.copy())
> +        # FIXME: We may have to add the Content-Length header if
> +        # we have data to send.
> +        return request

^- According to Robert, Content-Length isn't really used for HTTP/1.1,
because it has alternatives like chunked transfer encoding.
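For reference, a chunked response carries its own framing, each chunk
prefixed by its size in hex and the whole body terminated by a
zero-sized chunk, so no Content-Length is needed (every line below ends
in CRLF):

    HTTP/1.1 200 OK
    Transfer-Encoding: chunked

    1a
    abcdefghijklmnopqrstuvwxyz
    0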

> +
> +
> +    def do_open(self, http_class, request, first_try=True):
> +        """See urllib2.AbstractHTTPHandler.do_open for the general idea.
> +
> +        The request will be retried once if it fails.
> +
> +        urllib2 raises exception of application level kind, we
> +        just have to translate them.
> +
> +        httplib can raise exceptions of transport level (badly
> +        formatted dialog, loss of connexion or socket level
> +        problems). In that case we should issue the request again
> +        (httplib will close and reopen a new connection if
> +        needed).        
> +        """

^- You have a lot of trailing whitespace at the end.

> +        connection = request.connection
> +        assert connection is not None, \
> +            'Cannot process a request without a connection'
> +
> +        # Get all the headers
> +        headers = {}
> +        headers.update(request.header_items())
> +        headers.update(request.unredirected_hdrs)
> +
> +        try:
> +            connection._send_request(request.get_method(),
> +                                     request.get_selector(),
> +                                     # FIXME: implements 100-continue
> +                                     #None, # We don't send the body yet
> +                                     request.get_data(),
> +                                     headers)

^- I think the comments need to be cleaned up; at least the:
#None, # We don't send the body yet
should be removed, because it makes it look like we don't send the
body, when really we do.

> +            if self._debuglevel > 0:
> +                print 'Request sent: [%r]' % request
> +            response = connection.getresponse()
> +            convert_to_addinfourl = True
> +        except (httplib.BadStatusLine, httplib.UnknownProtocol), exception:
> +            # A bogus server was encountered
> +            raise InvalidHttpResponse(request.get_full_url(),
> +                                      'Bad status line received',
> +                                      orig_error=exception)
> +        except (socket.error, httplib.HTTPException), exception:
> +            # httplib.HTTPException should indicate a bug in the
> +            # urllib implementation, somewhow the httplib
> +            # pipeline is in an incorrect state, we retry in hope
> +            # that will correct the problem but that may need
> +            # investigation (note that no such bug is known as of
> +            # 20061005 --vila).
> +
> +            # socket errors generally occurs for reasons far
> +            # outside our scope, so closing the connection and
> +            # retrying is the best we can do.
> +

v- Fix the indenting here, otherwise it looks like we exit the scope,
but we actually don't.

> +        # FIXME: and then there is HTTPError raised by:
> +        # - HTTPDefaultErrorHandler (we define our own)
> +        # - HTTPRedirectHandler.redirect_request 
> +        # - AbstractDigestAuthHandler.http_error_auth_reqed
> +
> +        # When an exception occurs, give back the original
> +        # Traceback or the bugs are hard to diagnose.
> +            exc_type, exc_val, exc_tb = sys.exc_info()
> +            if first_try:
> +                if self._debuglevel > 0:
> +                    print 'Received exception: [%r]' % exception
> +                    print '  On connection: [%r]' % request.connection
> +                    method = request.get_method()
> +                    url = request.get_full_url()
> +                    print '  Will retry, %s %r' % (method, url)
> +                connection.close()
> +                response = self.do_open(http_class, request, False)
> +                convert_to_addinfourl = False
> +            else:
> +                my_exception = ConnectionError(msg= 'while sending %s %s:'
> +                                               % (request.get_method(),
> +                                                  request.get_selector()),
> +                                               orig_error=exception)
> +                if self._debuglevel > 0:
> +                    print 'On connection: [%r]' % request.connection
> +                    method = request.get_method()
> +                    url = request.get_full_url()
> +                    print '  Failed again, %s %r' % (method, url)
> +                    print '  Will raise: [%r]' % my_exception
> +                raise my_exception, None, exc_tb
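^- Concretely, the FIXME block above needs to move inside the except
clause, something like:

        except (socket.error, httplib.HTTPException), exception:
            ...
            # FIXME: and then there is HTTPError raised by:
            # - HTTPDefaultErrorHandler (we define our own)
            # - HTTPRedirectHandler.redirect_request
            # - AbstractDigestAuthHandler.http_error_auth_reqed

            # When an exception occurs, give back the original
            # traceback or the bugs are hard to diagnose.
            exc_type, exc_val, exc_tb = sys.exc_info()
            if first_try:
                ...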
> +
> +# FIXME: HTTPConnection does not fully support 100-continue (the
> +# server responses are just ignored)
> +
> +#        if code == 100:
> +#            mutter('Will send the body')
> +#            # We can send the body now
> +#            body = request.get_data()
> +#            if body is None:
> +#                raise URLError("No data given")
> +#            connection.send(body)
> +#            response = connection.getresponse()
> +
> +        if self._debuglevel > 0:
> +            print 'Receives response: %r' % response
> +            print '  For: %r(%r)' % (request.get_method(),
> +                                     request.get_full_url())
> +
> +        if convert_to_addinfourl:
> +            # Shamelessly copied from urllib2
> +            req = request
> +            r = response
> +            r.recv = r.read
> +            fp = socket._fileobject(r)
> +            resp = urllib2.addinfourl(fp, r.msg, req.get_full_url())
> +            resp.code = r.status
> +            resp.msg = r.reason
> +            if self._debuglevel > 0:
> +                print 'Create addinfourl: %r' % resp
> +                print '  For: %r(%r)' % (request.get_method(),
> +                                         request.get_full_url())
> +        else:
> +            resp = response
> +        return resp
> +

v- Is this comment block useful? It seems to be after a 'return' so it
isn't actually commenting on anything. It should probably be removed.

> +#       # we need titled headers in a dict but
> +#       # response.getheaders returns a list of (lower(header).
> +#       # Let's title that because most of bzr handle titled
> +#       # headers, but maybe we should switch to lowercased
> +#       # headers...
> +#        # jam 20060908: I think we actually expect the headers to
> +#        #       be similar to mimetools.Message object, which uses
> +#        #       case insensitive keys. It lowers() all requests.
> +#        #       My concern is that the code may not do perfect title case.
> +#        #       For example, it may use Content-type rather than Content-Type
> +#
> +#        # When we get rid of addinfourl, we must ensure that bzr
> +#        # always use titled headers and that any header received
> +#        # from server is also titled.
> +#
> +#        headers = {}
> +#        for header, value in (response.getheaders()):
> +#            headers[header.title()] = value
> +#        # FIXME: Implements a secured .read method
> +#        response.code = response.status
> +#        response.headers = headers
> +#        return response
> +
> +

...

> +class HTTPRedirectHandler(urllib2.HTTPRedirectHandler):
> +    """Handles redirect requests.
> +
> +    We have to implement our own scheme because we use a specific
> +    Request object and because we want to implement a specific
> +    policy.
> +    """
> +    _debuglevel = DEBUG
> +    # RFC2616 says that only read requests should be redirected
> +    # without interacting with the user. But bzr use some
> +    # shortcuts to optimize against roundtrips which can leads to
> +    # write requests being issued before read requests of
> +    # containing dirs can be redirected. So we redirect write
> +    # requests in the same way which seems to respect the spirit
> +    # of the RFC if not its letter.

^- Well, for plain HTTP we don't have any write requests, only for
webdav. And we plan on supporting redirects such that future requests
know they should go to the new location, so we won't have any write
requests to forward. We should always have made several requests before
we start doing any writing (because we have to make sure the target
really is a branch, etc.). But yes, we don't make a request on the file
we write to (though we would have made a request on the index file).

> +
> +    def redirect_request(self, req, fp, code, msg, headers, newurl):
> +        """See urllib2.HTTPRedirectHandler.redirect_request"""
> +        # We would have preferred to update the request instead
> +        # of creating a new one, but the urllib2.Request object
> +        # has a too complicated creation process to provide a
> +        # simple enough equivalent update process. Instead, when
> +        # redirecting, we only update the original request with a
> +        # reference to the following request in the redirect
> +        # chain.
> +
> +        # Some codes make no sense on out context and are treated
> +        # as errors:
> +
> +        # 300: Multiple choices for different representations of
> +        #      the URI. Using that mechanisn with bzr will violate the
> +        #      protocol neutrality of Transport.
> +
> +        # 304: Not modified (SHOULD only occurs with conditional
> +        #      GETs which are not used by our implementation)
> +
> +        # 305: Use proxy. I can't imagine this one occurring in
> +        #      our context-- vila/20060909
> +
> +        # 306: Unused (if the RFC says so...)
> +
> +        # FIXME: If the code is 302 and the request is HEAD, we
> +        # MAY avoid following the redirections if the intent is
> +        # to check the existence, we have a hint that the file
> +        # exist, now if we want to be sure, we must follow the
> +        # redirection. Let's do that for now.

^- I think it is best not to special-case this, because it may be a
generic redirect matching rule, which has no idea whether the target
exists or not. So I would guess this comment should be removed.

> +
> +        if code in (301, 302, 303, 307):
> +            return Request(req.get_method(),newurl,
> +                           headers = req.headers,
> +                           origin_req_host = req.get_origin_req_host(),
> +                           unverifiable = True,
> +                           # TODO: It will be nice to be able to
> +                           # detect virtual hosts sharing the same
> +                           # IP address, that will allow us to
> +                           # share the same connection...
> +                           connection = None,
> +                           parent = req,
> +                           )
> +        else:
> +            raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
> +
> +    def http_error_30x(self, req, fp, code, msg, headers):
> +        """Requests the redirected to URI.
> +
> +        Copied from urllib2 to be able to fake_close the
> +        associated connection, *before* issuing the redirected
> +        request but *after* having eventually raised an error.
> +        """
> +        # Some servers (incorrectly) return multiple Location headers
> +        # (so probably same goes for URI).  Use first header.
> +
> +        # TODO: Once we get rid of addinfourl objects, the
> +        # following will need to be updated to use correct case
> +        # for headers.
> +        if 'location' in headers:
> +            newurl = headers.getheaders('location')[0]
> +        elif 'uri' in headers:
> +            newurl = headers.getheaders('uri')[0]
> +        else:
> +            return
> +        if self._debuglevel > 0:
> +            print 'Redirected to: %s' % newurl
> +        newurl = urlparse.urljoin(req.get_full_url(), newurl)
> +
> +        # This call succeeds or raise an error. urllib2 returns
> +        # if redirect_request returns None, but our
> +        # redirect_request never returns None.
> +        redirected_req = self.redirect_request(req, fp, code, msg, headers,
> +                                               newurl)
> +
> +        # loop detection
> +        # .redirect_dict has a key url if url was previously visited.
> +        if hasattr(req, 'redirect_dict'):
> +            visited = redirected_req.redirect_dict = req.redirect_dict
> +            if (visited.get(newurl, 0) >= self.max_repeats or
> +                len(visited) >= self.max_redirections):
> +                raise urllib2.HTTPError(req.get_full_url(), code,
> +                                        self.inf_msg + msg, headers, fp)
> +        else:
> +            visited = redirected_req.redirect_dict = req.redirect_dict = {}
> +        visited[newurl] = visited.get(newurl, 0) + 1
> +
> +        # We can close the fp now that we are sure that we won't
> +        # use it with HTTPError.
> +        fp.close()
> +        # We have all we need already in the response
> +        req.connection.fake_close()
> +
> +        return self.parent.open(redirected_req)
> +
> +    http_error_302 = http_error_303 = http_error_307 = http_error_30x
> +
> +    def http_error_301(self, req, fp, code, msg, headers):
> +        response = self.http_error_30x(req, fp, code, msg, headers)
> +        # If one or several 301 response occur during the
> +        # redirection chain, we MUST update the original request
> +        # to indicate where the URI where finally found.
> +
> +        original_req = req
> +        while original_req.parent is not None:
> +            original_req = original_req.parent
> +            if original_req.redirected_to is None:
> +                # Only the last occurring 301 should be taken
> +                # into account i.e. the first occurring here when
> +                # redirected_to has not yet been set.
> +                original_req.redirected_to = redirected_url
> +        return response
> +
> +
> +class HTTPBasicAuthHandler(urllib2.HTTPBasicAuthHandler):
> +    """Custom basic authentification handler.
> +
> +    Send the authentification preventively to avoid the the
> +    roundtrip associated with the 401 error.
> +    """
> +
> +#    def http_request(self, request):
> +#        """Insert an authentification header if information is available"""
> +#        if request.auth == 'basic' and request.password is not None:
> +#            
> +#        return request

^- Can we just remove this rather than commenting it out? Though I don't
know what we gain if this isn't implemented (the class doesn't seem to
do anything else).
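If we do implement it, I'd expect something along these lines (only a
sketch; I'm assuming the patch's Request objects carry a user attribute
in addition to the auth and password attributes referenced above):

    import base64

    def http_request(self, request):
        """Insert a basic authentication header if credentials are known"""
        if request.auth == 'basic' and request.password is not None:
            raw = '%s:%s' % (request.user, request.password)
            auth = 'Basic ' + base64.encodestring(raw).strip()
            request.add_unredirected_header('Authorization', auth)
        return request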


> +
> +
> +class HTTPErrorProcessor(urllib2.HTTPErrorProcessor):
> +    """Process HTTP error responses.
> +
> +    We don't really process the errors, quite the contrary
> +    instead, we leave our Transport handle them.
> +    """
> +    handler_order = 1000  # after all other processing
> +
> +    def http_response(self, request, response):
> +        code, msg, hdrs = response.code, response.msg, response.info()
> +
> +        if code not in (200, # Ok
> +                        206, # Partial content
> +                        404, # Not found
> +                        ):
> +            response = self.parent.error('http', request, response,
> +                                         code, msg, hdrs)
> +        return response
> +
> +    https_response = http_response
> +
> +
> +class HTTPDefaultErrorHandler(urllib2.HTTPDefaultErrorHandler):
> +    """Translate common errors into bzr Exceptions"""
> +
> +    def http_error_default(self, req, fp, code, msg, hdrs):
> +        if code == 404:
> +            raise NoSuchFile(req.get_selector(),
> +                             extra=HTTPError(req.get_full_url(), code, msg,
> +                                             hdrs, fp))
> +        else:
> +            # TODO: A test is needed to exercise that code path
> +            raise InvalidHttpResponse(req.get_full_url(),
> +                                      'Unable to handle http code %d: %s'
> +                                      % (code, msg))
> +
> +class Opener(object):
> +    """A wrapper around urllib2.build_opener
> +
> +    Daughter classes can override to build their own specific opener
> +    """
> +    # TODO: Provides hooks for daugther classes.
> +
> +    def __init__(self,
> +                 connection=ConnectionHandler,
> +                 redirect=HTTPRedirectHandler,
> +                 error=HTTPErrorProcessor,):
> +        self.password_manager = PasswordManager()
> +        # TODO: Implements the necessary wrappers for the handlers
> +        # commented out below
> +        self._opener = urllib2.build_opener( \
> +            connection, redirect, error,
> +            #urllib2.ProxyHandler,
> +            urllib2.HTTPBasicAuthHandler(self.password_manager),
> +            #urllib2.HTTPDigestAuthHandler(self.password_manager),
> +            #urllib2.ProxyBasicAuthHandler,
> +            #urllib2.ProxyDigestAuthHandler,
> +            HTTPHandler,
> +            HTTPSHandler,
> +            HTTPDefaultErrorHandler,
> +            )
> +        self.open = self._opener.open
> +        if DEBUG > 0:
> +            # When dealing with handler order, it's easy to mess
> +            # things up, the following will help understand which
> +            # handler is used, when and for what.
> +            import pprint
> +            pprint.pprint(self._opener.__dict__)
> +
> 
> === modified file NEWS // last-changed:v.ladeuil+lp at free.fr-20061005093155-bb31
> ... 1bc6ebf11994
> --- NEWS
> +++ NEWS
> @@ -55,6 +55,9 @@
>  
>    IMPROVEMENTS:
>  
> +    * urllib implements keep-alive connections and share them
> +      (Vincent Ladeuil, #53654)
> +
>      * Knit files now wait to create their contents until the first data is
>        added. The old code used to create an empty .knit and a .kndx with just
>        the header. However, this caused a lot of extra round trips over sftp.
> @@ -95,6 +98,10 @@
>  
>    BUG FIXES:
>  
> +    * Handles user/passwords supplied in url from command
> +      line. Don't request already known passwords (Vincent
> +      Ladeuil, #42383, #44647, #48527)
> +

^- This reminds me, we probably want to actually warn about the security
implications of providing a password in a URL. Though we can't remove
the functionality until we provide some sort of alternative (supplying
it by hand is okay, but we also need a way to supply one in a config
file, or something like ~/.auth_info).



v- Is this something that you are actually using, or is it just
something that you want for webdav?

I'm fine bringing in stuff for webdav, but I'd probably rather bring it
in as a separate patch.

> === modified file bzrlib/tests/__init__.py // last-changed:v.ladeuil+lp at free.fr
> ... -20061005092948-585bd3d003d5db82
> --- bzrlib/tests/__init__.py
> +++ bzrlib/tests/__init__.py
> @@ -1162,6 +1162,13 @@
>          self.assertTrue(t.is_readonly())
>          return t
>  
> +    def create_transport_readonly_server(self):
> +        """Create a transport server from class defined at init.
> +
> +        This is mostly a hook for daugter classes.
> +        """
> +        return self.transport_readonly_server()
> +
>      def get_readonly_server(self):
>          """Get the server instance for the readonly transport
>  
> @@ -1175,7 +1182,7 @@
>                  self.__readonly_server = ReadonlyServer()
>                  self.__readonly_server.setUp(self.__server)
>              else:
> -                self.__readonly_server = self.transport_readonly_server()
> +                self.__readonly_server = self.create_transport_readonly_server()
>                  self.__readonly_server.setUp()
>              self.addCleanup(self.__readonly_server.tearDown)
>          return self.__readonly_server
> @@ -1434,6 +1441,13 @@
>      readwrite one must both define get_url() as resolving to os.getcwd().
>      """
>  
> +    def create_transport_server(self):
> +        """Create a transport server from class defined at init.
> +
> +        This is mostly a hook for daugter classes.
> +        """
> +        return self.transport_server()
> +
>      def get_server(self):
>          """See TestCaseWithMemoryTransport.
>  

...


v- If we are going to start getting fancy about how we do
'socket.bind()' then I want to move this up to a TestCase helper
function, because we use it in several places (ssh wants to connect to
an invalid socket, etc.).

It may actually be best as a helper function in 'osutils', since I'm not
sure which places need access to an unconnected socket. (Are they
TestCases or test Servers?)
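Something like this, say (just a sketch, and the name is only a
suggestion):

    def get_dead_socket_address():
        """Return (socket, (host, port)) that nobody will accept() on.

        The socket is returned too, so callers can keep it alive (and
        the port reserved) for the duration of the test.
        """
        s = socket.socket()
        s.bind(('localhost', 0))
        return s, s.getsockname()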


> +        # Get a free address and don't 'accept' on it, so that we
> +        # can be sure there is no http handler there, but set a
> +        # reasonable timeout to not slow down tests too much.
> +        default_timeout = socket.getdefaulttimeout()
> +        try:
> +            socket.setdefaulttimeout(2)
> +            s = socket.socket()
> +            s.bind(('localhost', 0))
> +            t = self._transport('http://%s:%s/' % s.getsockname())
> +            self.assertRaises(ConnectionError, t.has, 'foo/bar')
> +        finally:
> +            socket.setdefaulttimeout(default_timeout)

^- Also, is there any reason that we can't reset the default timeout
earlier? Like:

default = socket.getdefaulttimeout()
try:
  socket.setdefaulttimeout(1)
  s = socket.socket()
  s.bind(...)
finally:
  socket.setdefaulttimeout(default)

It may be that you need the default timeout during the self._transport()
portion. Though again, I don't see why the whole test suite couldn't be
run with a short timeout.

...

v- Is 'WallServer' the correct thing to call it? I assume you mean
something like 'it is like talking to a wall', since the service doesn't
exist. Maybe 'NoResponseServer' would be better?

> +
> +
> +class TestWallServer(object):
> +    """Tests exceptions during the connection phase"""
> +
> +    def create_transport_readonly_server(self):
> +        return bzrlib.transport.http.HttpServer(WallRequestHandler)
> +
> +    def test_http_has(self):
> +        server = self.get_readonly_server()
> +        t = self._transport(server.get_url())
> +        self.assertRaises(ConnectionError, t.has, 'foo/bar')
> +
> +    def test_http_get(self):
> +        server = self.get_readonly_server()
> +        t = self._transport(server.get_url())
> +        self.assertRaises(ConnectionError, t.get, 'foo/bar')
> +
> +

...

v- curl doesn't check the protocol version
> +# curl don't check the protocol version
> +#class TestBadProtocolServer_pycurl(TestWithTransport_pycurl,
> +#                                   TestBadProtocolServer,
> +#                                   TestCaseWithWebserver):
> +#    """Tests BadProtocolServer for pycurl implementation"""
> +
> +
> 
...

v- Need an extra blank line here. And as mentioned, we might want to
find a better name.

>  
> +class WallRequestHandler(TestingHTTPRequestHandler):
> +    """Whatever request comes in, close the connection"""
> +
> +    def handle_one_request(self):
> +        """Handle a single HTTP request, by abruptly closing the connection"""
> +        self.close_connection = 1
> +
> +

v- This may be something we need to be careful about for python2.5,
since I think they changed how e.args works.
But for now, it seems okay.

> +class BadStatusRequestHandler(TestingHTTPRequestHandler):
> +    """Whatever request comes in, returns a bad status"""
> +
> +    def parse_request(self):
> +        """Fakes handling a single HTTP request, returns a bad status"""
> +        ignored = TestingHTTPRequestHandler.parse_request(self)
> +        import socket
> +        try:
> +            self.send_response(0, "Bad status")
> +            self.end_headers()
> +        except socket.error, e:
> +            if (len(e.args) > 0) and (e.args[0] == errno.EPIPE):
> +                # We don't want to pollute the test reuslts with
> +                # spurious server errors while test succeed. In
> +                # our case, it may occur that the test have
> +                # already read the 'Bad Status' and closed the
> +                # socket while we are still trying to send some
> +                # headers... So the test is ok but if we raise
> +                # the exception the output is dirty. So we don't
> +                # raise, but we close the connection, just to be
> +                # safe :)
> +                self.close_connection = 1
> +                pass
> +            else:
> +                raise
> +        return False
> +

...

v- If we are going to have lots of these (Matthieu Moy is adding one in
his other patch) then I would change it into a list, with each entry
having a trailing ','. And it should be sorted lexicographically, so as
to reduce the amount of noise when the list is modified. For example:

if e[0] in (
    _pycurl_errors.CURLE_COULDNT_CONNECT,
    _pycurl_errors.CURLE_COULDNT_RESOLVE_HOST,
    _pycurl_errors.CURLE_GOT_NOTHING,
    ):

It just helps to understand the actual diff.

>              mutter('got pycurl error: %s, %s, %s, url: %s ',
>                      e[0], _pycurl_errors.errorcode[e[0]], e, url)
>              if e[0] in (_pycurl_errors.CURLE_COULDNT_RESOLVE_HOST,
> -                        _pycurl_errors.CURLE_COULDNT_CONNECT):
> +                        _pycurl_errors.CURLE_COULDNT_CONNECT,
> +                        _pycurl_errors.CURLE_GOT_NOTHING):
>                  self._raise_curl_connection_error(curl)
>              # jam 20060713 The code didn't use to re-raise the exception here
>              # but that seemed bogus
> 

So, I think there is a little bit of cleanup that could be done.

But overall, while it may not be 100% perfect, there are no regressions
from our current codebase, and it means we can reduce our dependency on
pycurl, which I think is a big benefit.

I'd like to see it land in 0.12, and it has my +1 if some small changes
are made (mostly just small comment fixes).

John
=:->




