[rfc] Handling redirects properly.

John Arbash Meinel john at arbash-meinel.com
Thu Jul 20 20:17:02 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 19 Jul 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
>> Basically, we would have something like:
>>
>> try:
>>   format = BzrDirFormat.find_format(a_transport)
>>
>>   ...
>> except RedirectRequested, e:
>>   note('URL %s has been redirected to %s', e.old_url, e.new_url)
>>   a_transport = get_transport(e.new_url)
>>
>> I think this puts the functionality right about where we want it. So
>> that when we go to open a new branch, we get redirected to the correct
>> location.
> 
> I agree with the basic idea of raising an exception on redirects, and of
> using them to switch entirely to a different directory.  If you would
> like to go into this that would be great, but there are some things to
> consider:
> 
> If we get a redirection at some later point the error will be uncaught
> and terminate bzr.  We don't really expect to get redirections, but
> still it would seem nice if the default behaviour is to just follow
> them.

I think we could enable direct redirect handling for everything but the
when we know that we will catch the exception.

> 
> Presumably the actual error will be complaining about some sub-url
> (e.g. .bzr/format) so you'll need to infer the right adjustment to the
> url from that.  Or you could look at the base directory first, but that
> might cause trouble if the server won't answer for directories.
> 
> This might relate to how we do http authentication?  It seems somewhat
> similar: when we first try to access the url, we get an http error and
> need to prompt for a username/password and retry.  (Not that you need to
> handle it now, but it just occurred to me.)
> 
> Does this totally belong in BzrDirFormat.open?  Perhaps the general
> approach of iterating until we actually find something should be put
> into the transport, in an operation similar to "get physical path of x",
> or "get canonical url".
> 

Well, the problem is that 'get_transport()' doesn't typically bind to
anything until you actually issue a 'get()' request.
So the first time we connect to the remote machine is typically when we
request the contents of .bzr/branch-format.
We do this primarily because requests for directories do weird things.
We don't strictly expect to be able to list contents of anything, or to
get attributes of directories. (has(directory) is undefined, especially
since 'ftp' can't do anything there).

I would be happy if we could change get_transport() to have some sort of
'get_canonical_transport()' which would follow redirects, and end up
with the final location. But that changes 'get_transport()' such that we
have to be able to contact the target, and have it do something for a
directory.
It is possible to do so, but I would rather do it higher up, where we
actually have something we are looking for.


I think the easiest way to retain the most functionality, while still
being nice for http redirects is to update 'Transport.get()' so that it
can take a 'follow_redirects' parameter, which would default to True.
The api can be defined so that it is a hint, rather than a required command.

And then for HTTP if that flag is set, it follows just like it does now.
And the only caller who sets it to false is
BzrDir.open/open_containing_from_transport

Now for handling the full url issue for redirects I was thinking to do:

try:
  find_format(a_transport)
except RedirectRequested, e:
  relpath = a_transport.relpath(e.old_url)
  assert e.new_url.endswith(relpath)
  new_base_url = e.new_url[:-len(relpath)]

I'll have to see if all my indexes are right. But that is the basic idea.

Now, instead of doing the assert, we *could* just retry the request with
'follow_redirects=True', and just keep trying. However, if someone is
redirecting '.bzr/branch-format' to 'index.html', we have a serious
error. I would rather raise some sort of InvalidRedirectRequested.

I could also see us not try to follow an invalid redirect, but not
actually raise an error. Since we might be in 'open_containing' and we
just haven't gotten to a real branch yet, and the client has some sort
of 'redirect all invalid requests to *this* page' setup.

(For example, right now you can do:

bzr log http://bazaar-vcs.org/bzr/bzr.dev/NEWS

Which will give you the log of changes on the remote branch for entries
which modified the NEWS file. It is pretty slow, but it works)

People seem generally positive about my idea, so I'll probably look into
doing it.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEv9auJdeBCYSNAAMRAvhtAJoCc6nm6jkKkluQWIPT/Lbf8MZd0ACgzM3L
JvB8pWdKqZRJT3wrk7ZqxeE=
=EIvy
-----END PGP SIGNATURE-----




More information about the bazaar mailing list