[rfc] http redirection and (BzrDir, BzrDirFormat) related questions
John Arbash Meinel
john at arbash-meinel.com
Mon Dec 18 21:26:45 GMT 2006
Vincent Ladeuil wrote:
> Hi all,
>
> Here is a first shot (rough) at handling http redirections (see
> https://blueprints.launchpad.net/products/bzr/+spec/http-redirection).
>
> The proposed implementation attached works (but it's really
> rough). Well, if you disabled pycurl that is*.
>
> But there are two points that may benefit from the mailing list
> inputs:
>
> - the probe_transport semantic evolution,
>
> - the new 'hints' parameter to the transport.get method.
I don't think you are using this quite like you meant to, but I'll
comment more on that in the code.
The problem with maintaining API compatibility is that there are 2
directions to maintain.
There is code that calls Transport.foo() and there are classes that
implement Transport.foo() that are being called.
If you add an optional parameter such as "hints=None". Then we have
satisfied compatibility in one direction. Old callers that place a call
on a new Transport will work fine.
However, a new caller that passes 'hints' to a Transport which does not
implement it will fail.
So do we need bidirectional compatibility? I would really like to say
no, but I want people to comment on it before we decide.
The way I can see maintaining bidirectional is to add a new function
"get_with_hints()" or something like that. And then the default
implementation just calls 'get()' with no hints.
Everyone else re-implements get() by calling get_with_hints() and
passing the empty parameter. And now 'get_with_hints()' is the "real" get().
So it is possible, but it is a little bit ugly, and it would be nice if
we didn't have to do it.
I guess another possibility would be to have 1 month where it was
deprecated, and then switched to the new interface.
At this point, I feel like we offer compatibility from the top down,
more than from the bottom up. So people who implement new transports,
branch formats, log formatters, etc need to stay in sync. But the code
that *calls* these things can stay compatible.
Either that, or we need to re-visit any class which is defined as a
template/interface, and change it to a double blind one. I'm not sure
the exact name, but basically rather than having children override
.get() (which is the user facing call), they override ._get() or
.get_impl() or something like that.
And then the default implementation of get() is:
def get(self, relpath):
return self.get_impl(relpath)
And then when we update our APIs, we can change the default
implementation for compatibility in both directions.
Do we really need to have all of our interfaces be double indirection?
I realize this doesn't have a lot to do with your specific work, but I
just wanted to discuss how we really want to handle compatibility.
Oh, and there are probably some interfaces where performance starts to
become an issue. So we may need to think about that as well.
...
>
> The problem is that at the point where the redirection is
> detected (probe_transport), there is no way to update the
> transport used by the BzrDir. So I catch the exception in
> BzrDir.open_from_transport.
To start with, thank you very much for being thorough about checking
other implementations.
>
> On IRC John raised concerns about deprecation, i.e. if I raise an
> exception, what will append for clients outside bzr core ?
>
...
>
> At that point I wonder if we really have to care about
> deprecation as the only branch concerned are the bzr ones and we
> want them to follow redirections.
>
So the question is whether there is a user-facing option that turns on
this functionality (probe_transport(raise_on_redirect=True)).
If we add this parameter, then it can be brought up the stack
(open(raise_on_redirect)) or something like that.
This brings out the double-direction compatibility problem. Which is
what I was originally talking about on IRC. If we want to be compatible
in one direction (a caller of probe_transport() without the new option
gets the old behavior), it inherently breaks in the other way (a caller
passing the new option, but calling an implementation that doesn't
implement it).
My personal feeling right now is that probe_transport is an
implementation facing interface, and open_from_transport is a
user-facing one. So if our implementation of probe_transport happens to
raise an exception that open_from_transport() can handle, then we don't
have a compatibility problem. User-level code won't generally call
probe_transport directly, so it doesn't matter that for a given
interface it has a slightly more advanced API, since that is all hidden
at the user-facing function.
> What do *you* think ?
>
> Second point, adding a 'hints' optional dict parameter to
> Transport.get. The patch attached modify the existing transports
> and make the testsuite happy.
>
...
> - format = BzrDirFormat.find_format(transport)
> + initial_base = transport.base
> + redirected = True # to enter the loop
> + redirections = 0
> + # If a loop occurs, there is little we can do. So we
> + # don't try to detect them, just getting out if too much
> + # redirections occurs. The solution is outside: where the
> + # loop is defined.
v- Magic numbers like this should generally be defined as a class or
instance constant. So you can do:
while redirected and redirections < self.MAX_REDIRECTIONS:
...
> + while redirected and redirections < 8:
> + try:
> + transport.follow_redirections = False
> + format = BzrDirFormat.find_format(transport)
> + redirected = False
> + except errors.RedirectRequested, e:
> + # In case someone above in the call stack want to
> + # continue using this transport
> + transport.follow_redirections = True
^- I thought you decided to switch away from
'transport.follow_redirections' in favor of get with hints?
> +
> + relpath = transport.relpath(e.source)
> + if not e.target.endswith(relpath):
> + # Not redirected to a branch-format, not a branch
> + raise errors.NotBranchError(path=e.target)
> +
> + target = e.target[:-len(relpath)]
> + note('%s has been%s redirected to %s',
> + transport.base,
> + e.permanently,
> + target)
> + # Let's try with a new transport
> + transport = get_transport(target)
> + redirections += 1
...
I think we actually want a 'hints' class rather than using a plain
dictionary, and then implement a "def create_get_hints(self, *kwargs):"
function.
I personally feel the nicest looking api is just:
def get(self, relpath, **hints):
And then you can do:
get(relpath, follow_redirection=False)
However, I think the discussion is that it was easy to type the wrong
thing, and you wouldn't get an error, because the parameter would just
be accepted and ignored.
So the alternative was:
def get(self, relpath, hints=None):
def create_get_hints(self, follow_redirections=None, hint2=None):
fp = transport.get(relpath,
hints=transport.create_get_hints(follow_redirections=True))
Note that we still have compatibility problems if we want to add more
hints. Though we can go the double indirection and have
'create_get_hints()' call:
return self._create_get_hints(follow_redirections=follow_redirections)
And then if we add more hints, the default implementation can check if:
self._create_get_hints_extra() is defined, and if not, throw away the
new hints and use self._create_get_hints().
Again, it starts getting *really* crufty, *really* fast. So I would
really like to have a good discussion about what we really want out of
API compatibility.
>
> - def get(self, relpath):
> + def get(self, relpath, hints={}):
> """Get the file at the given relative path.
>
> :param relpath: The relative path to the file
> @@ -1148,9 +1148,12 @@
> self._adapted = adapted
> self._calls = []
>
...
> === modified file bzrlib/transport/chroot.py
> --- bzrlib/transport/chroot.py
> +++ bzrlib/transport/chroot.py
> @@ -69,7 +69,7 @@
> self._ensure_relpath_is_child(relpath)
> return TransportDecorator.delete_tree(self, relpath)
>
> - def get(self, relpath):
> + def get(self, relpath, hints={}):
> self._ensure_relpath_is_child(relpath)
> return TransportDecorator.get(self, relpath)
Shouldn't the chroot decorator be passing the hints to it's wrapped class?
>
> @@ -77,10 +77,6 @@
> self._ensure_relpath_is_child(relpath)
> return TransportDecorator.get_bytes(self, relpath)
>
Nice catch that 'get()' was actually defined twice. This should be
merged regardless of anything else.
Also, you probably will want a hints parameter for get_bytes(), since at
least *I* would like to switch to it for some of the format probing.
> - def get(self, relpath):
> - self._ensure_relpath_is_child(relpath)
> - return TransportDecorator.get(self, relpath)
> -
> def has(self, relpath):
> self._ensure_relpath_is_child(relpath)
> return TransportDecorator.has(self, relpath)
>
v- You are using a dictionary, not an object, so you want 'dict.get()'
not getattr()
request.follow_redirections = hints.get('follow_redirections', True)
> request = Request('GET', abspath, None, headers)
> + request.follow_redirections = getattr(hints, 'follow_redirections', True)
> response = self._perform(request)
>
John
=:->
More information about the bazaar
mailing list