urlutils improvements from hpss

Sun Apr 15 16:56:03 BST 2007

I realise that these changes have already been merged, but I wanted to address
John's concerns about the name of "urlutils.joinpath".

John Arbash Meinel wrote:
> Andrew Bennetts wrote:
> > John Arbash Meinel wrote:
[...]
> >>>
> >> No, I mean we have 2 functions. "osutils.joinpath" and
> >> "osutils.pathjoin". Where "pathjoin" is inherited from os.path.join, but
> >> then "joinpath" seems to be doing the same thing, only disallowing '..',
> >> '', and None in the paths.
> > 
> > Also, osutils.pathjoin varies behaviour depending on the host platform, which is
> > not something we want to do with URLs.  Joining URL paths should work
> > identically on all platforms.
> 
> I agree, though I thought that osutils.pathjoin() should function
> approximately the same, regardless of the platform. (Since internally we
> use all '/' separated paths)

"Approximately" doesn't sound very safe to me :)

I think it's clearer for code to say "urlutils.pathjoin" than "osutils.pathjoin"
when joining URLs.  We've had code in the past that was confused about the kinds
of paths they dealt with.

> >> The naming is close enough to make it seem like they should be the same
> >> function. And they do almost the same thing, but one has a subtle
> >> difference.
> >>
> >> For urlutils I simplified it to urlutils.join().
> >>
> >> But now there is urlutils.joinpath(), and from a naive glance, I don't
> >> see a big difference between the two.
> > 
> > urlutils.join joins a full URL, and one or more path parts of a URL, returning a new
> > URL.
> > 
> > urlutils.joinpath joins a path part of a URL, and one or more path parts of a
> > URL, returning a new path part of a URL.
> 
> maybe 'urlutils.join_partial'? Or "join_fragments"?

Neither of those is as precise or clear as "joinpath".  "join_partial" is too
vague, and "join_fragments" sounds like it has something to do with fragment
identifiers (the "#foo" bit of URLs).  I regret the confusion with filesystem
paths, but "path" is the term that STD 66, the URI standard, uses for these
things, and it was also the obvious first name that I thought of when naming the
function.

I personally don't find that "urlutils.join" vs. "urlutils.joinpath" is
particulary confusing.  Maybe that's because I always do "bzrlib import
urlutils" not "from bzrlib.urlutils import joinpath"?

[...]
> >> What is also weird to me is that urlutils.join() does this in the interior:
> >>
> >>             path = '/'.join(path)
> >>             path = joinpath(path, arg)
> >>             path = path.split('/')
> > 
> > It makes sense for urlutils.join to delegate part of its task to
> > urlutils.pathjoin, rather than duplicating the logic.
> 
> I agree, but it seems like both of these should be using a helper which
> works in the same way that they internally do (on lists), rather than
> munging to get a string, to pass to a command which converts to a list,
> and then converts back to a string, so that we can split it back into a
> list.

This sounds like a good idea.  File a bug and assign it to me if you like,
although of course my immediate priority is landing hpss.

[...]
> 
> Probably if it was called something other than "joinpath" or "pathjoin"
> I would feel better about it. Simply because those names are already
> overloaded (too much, IMO).

In my mind, its name is really "urlutils.joinpath", not just "joinpath".  And
HACKING says that imports of the form "from foo import bar" should not be used
(except for importing submodules), so that really is the way it should be appear
in code.  Is that not enough to ease your mind?

Otherwise, the best alternative name I can think of is "join_url_path", but
"urlutils.join_url_path" seems like an ugly name to have to type and read to me.

-Andrew.