[merge] http multirange support

Tue Jul 18 16:03:02 BST 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 13 Jul 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> 

...

> Well, that's a very nice performance improvement to see.  On the other
> hand failing to follow redirects is a substantial regression,
> particularly as people may still be trying to pull from bazaar-ng.org,
> which currently does a redirect.  (I suppose we could, and possibly
> should, change it so that /bzr just passes through, since they're still
> the same machine.)  
> 
> I presume there must be some reason why you had to turn off curl
> FOLLOWLOCATION but what was it?  If that won't work, perhaps we can
> raise a special exception class on the redirect and redo it in the
> Transport?

I haven't tested it much. This is going off of Michael's comment. We
left FOLLOWLOCATION on, but I believe it ends up returning 2 headers,
one for the initial Redirect, and a second for the following 'real'
header. And then when we parse the header contents, we get the wrong one.

I think the fix isn't very hard, just focused on other things.

If this might block an 0.9 release, I'll focus on it.

> 
> Anyhow I've read the patch and it looks reasonable, with a few comments.
> It looks like it would put us in a good position to do a fast pure
> python transport.  I'll check it out on the network and against thttpd.
> I do have some concerns about putting it in 0.9 with redirects broken.
> 

Yeah, the only thing we really need for a fast python one is to cache
the connections. urllib is already faster because of the changes, just
not quite as fast as pycurl.

...

>> +        # A regular non-range response, unfortunately the result from
>> +        # urllib doesn't support seek, so we wrap it in a StringIO
>> +        tell = getattr(data, 'tell', None)
>> +        if tell is None:
>> +            return StringIO(data.read())
>> +        return data
> 
> Why not just do 'if not hasattr'...?

Because hasattr() swallows all exceptions, getattr() only swallows
AttributeError exceptions. In general Robert has recommended that
hasattr() never be used.

> 
>> +    elif code == 404:
>> +        raise errors.NoSuchFile(url)
>> +
>> +    # TODO: jam 20060713 Properly handle redirects (302 Found, etc)
>> +    #       The '_get' code says to follow redirects, we probably 
>> +    #       should actually handle the return values
>> +
>> +    raise errors.InvalidHttpResponse(url, "Unknown response code %s" % (code,))
>> +
>>
> 
> Just for obviousness I would suggest to put the 'raise' in an else block
> immediately after the last elif...

Do you want it in an else, or just a comment:

# If we got this far, we've failed to handle the return, raise an error.

> 
>>      def has(self, relpath):
>> -        curl = pycurl.Curl()
>> +        """See Transport.has()"""
>> +        # We set NO BODY=0 in _get_full, so it should be safe
>> +        # to re-use the non-range curl object
>> +        curl = self._base_curl
>>          abspath = self._real_abspath(relpath)
>>          curl.setopt(pycurl.URL, abspath)
>> -        curl.setopt(pycurl.FOLLOWLOCATION, 1) # follow redirect responses
>>          self._set_curl_options(curl)
>>          # don't want the body - ie just do a HEAD request
>> +        # This means "NO BODY" not 'nobody'
>>          curl.setopt(pycurl.NOBODY, 1)
>>          self._curl_perform(curl)
>>          code = curl.getinfo(pycurl.HTTP_CODE)
>> @@ -92,38 +104,80 @@
>>              return False
>>          elif code in (200, 302): # "ok", "found"
>>              return True
>> -        elif code == 0:
>> -            self._raise_curl_connection_error(curl)
>>          else:
>>              self._raise_curl_http_error(curl)
> 
> I see you now call _raise_curl_connection_error from other places - is
> it really no longer required from here?  There are some other places
> too.

I believe we were getting 'code==0' because we were suppressing the
pycurl errors. Since I am now raising them, I removed these lines.
Also, there were 2 errors that mean unable to connect, and I handle
those explicitly, rather than just dealing with 'code==0'.

> 
>> +    def _get_full(self, relpath):
>> +        """Make a request for the entire file"""
>> +        curl = self._base_curl
>> +        abspath, data, header = self._setup_get_request(curl, relpath)
>>          self._curl_perform(curl)
>> +
>>          code = curl.getinfo(pycurl.HTTP_CODE)
>> +        data.seek(0)
>> +
>>          if code == 404:
>>              raise NoSuchFile(abspath)
>> -        elif code == 200:
>> -            sio.seek(0)
>> -            return code, sio
>> -        elif code == 206 and (ranges is not None):
>> -            sio.seek(0)
>> -            return code, sio
>> -        elif code == 0:
>> -            self._raise_curl_connection_error(curl)
>> -        else:
>> -            self._raise_curl_http_error(curl)
>> +        if code != 200:
>> +            raise errors.InvalidHttpResponse(abspath,
>> +                'Expected a 200 or 404 response, not: %s' % code)
>> +
>> +        return code, data
> 
> Raising directly from here seems like a regression from calling
> _raise_curl_http_error, which centralizes it and includes the eventual
> url, which is often useful.
> 

Well, 'abspath' is the full requested url. I suppose if we hit a
redirect, then _raise_curl_http_error would print out the final URL
rather than the initial redirect. And this is kind of a special case.
This code path is when we have a full request. Getting an HTTP 206
(partial content) would be wrong, but it isn't really an invalid http
response. So I'm clarifying that with *this* code path, we don't support
anything other than 200 or 404.

I could parameterize _raise_curl_http_error() so that it can take a
message to print, rather than just raising a 'bad http code:' error.

I'll do some cleanup and get back to you.

If I get redirect working properly, do I have a +1 to merge this in?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEvPgmJdeBCYSNAAMRAn0VAJ0SQdRcRYRaKUXHzVGgq5JULegWSACeOpWK
VT1DUf1sM2rB5KF9wJetfyg=
=JvgQ
-----END PGP SIGNATURE-----