"Using Saved Location: foo"

Sun Apr 30 01:34:11 BST 2006

On 28/04/2006, at 11:42 PM, John Arbash Meinel wrote:
>
> I chose to try and split it up per hierarchy component. So my algorithm
> is something like:
>
> split on '/'
> 	split chunk on '%'
> 		expand safe escapes (all but the unsafe ones)
> 	test_chunk = join unescaped hunks
> 	try
> 		chunk = test_chunk.decode(utf-8)
> 	except UnicodeDecodeError:
> 		# leave chunk alone
>
> join chunks with '/'
>
> This way, if one portion of your URL is not UTF-8, the rest of the URL
> is still decoded properly.
>
> I did this, because files under bzr control should have utf-8
> representation, but we don't control above that.

OK that looks good.
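
For illustration, here is a minimal Python 3 sketch of the per-component
algorithm quoted above. It is not the bzr implementation. The function name
`unescape_for_display` and the `_UNSAFE` set are assumptions (the thread does
not spell out which escapes count as unsafe), and the input is assumed to be
an ASCII, percent-encoded URL.

```python
import re

# Assumption: bytes that stay percent-encoded because unescaping them could
# change how the URL parses. The exact set is a guess, not taken from bzr.
_UNSAFE = frozenset(b"/%?#;&=+ ")

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def _expand_safe(match):
    """Expand one %XX escape unless it is unsafe or a control character."""
    value = int(match.group(1), 16)
    if value in _UNSAFE or value < 0x20 or value == 0x7F:
        return match.group(0)  # keep the escape as written
    return bytes([value])


def unescape_for_display(url):
    """Decode each '/'-separated chunk of an escaped URL independently."""
    chunks = []
    for chunk in url.split("/"):
        raw = _ESCAPE.sub(_expand_safe, chunk.encode("ascii"))
        try:
            chunks.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Not valid UTF-8: leave this chunk alone, still fully escaped.
            chunks.append(chunk)
    return "/".join(chunks)
```

With this sketch, a chunk such as `%E5` (not valid UTF-8 on its own) stays
escaped, while a neighbouring `%C3%A5` chunk is still decoded for display.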

We might eventually want IDNA display of domain names to be handled  
separately, but that can be done later.

>> Can I suggest having a:
>> class url(string):
>> 	...
>> ?
>
> *shudder*
>
> I think I can see your point. But I've seen mostly pain when people try
> to inherit from string.

Perhaps URL(object), but with a __str__ method that allows it to be  
used in place of a string?

>> Then various places in the code can assert that they really have a
>> URL (I'd include relative URL references in that).
>
> The other problem is that user input isn't a URL (yet). It may actually
> be a URL, but before it hits Transport, it is just a Unicode string.

I think we would want separate factory methods.  One takes a strictly  
ASCII, properly encoded URL, and raises if it's not correct.  Another  
accepts Unicode input from the user and applies the heuristics in this  
thread, raising only if it's impossible to work out what they mean.  
A third turns local paths into file URLs.
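
A rough Python 3 sketch of that shape, assuming a plain URL(object) wrapper
with a __str__ method. The class and the method names `from_escaped`,
`from_user_input` and `from_local_path` are all hypothetical. The lenient
factory only percent-encodes non-ASCII text as UTF-8, which is a stand-in for
the heuristics discussed in this thread, not a full version of them.

```python
import pathlib
import re
from urllib.parse import quote


class URL(object):
    """Hypothetical wrapper around an ASCII, percent-encoded URL."""

    def __init__(self, escaped):
        self._escaped = escaped

    def __str__(self):
        # Lets callers pass str(url) wherever a plain string is expected.
        return self._escaped

    @classmethod
    def from_escaped(cls, text):
        """Strict factory: reject non-ASCII input or malformed escapes."""
        try:
            text.encode("ascii")
        except UnicodeEncodeError:
            raise ValueError("URL is not ASCII: %r" % (text,))
        if re.search(r"%(?![0-9A-Fa-f]{2})", text):
            raise ValueError("malformed percent escape in %r" % (text,))
        return cls(text)

    @classmethod
    def from_user_input(cls, text):
        """Lenient factory: escape Unicode input as UTF-8 (a stand-in)."""
        # '%' is kept safe so escapes the user already typed are preserved.
        return cls(quote(text, safe="/:%?#@&=+$,;~"))

    @classmethod
    def from_local_path(cls, path):
        """Turn a local filesystem path into a file URL."""
        return cls(pathlib.Path(path).absolute().as_uri())
```

Code that only needs a string can call str() on the result, so the wrapper
does not have to inherit from the string type.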

Robert has expressed interest in writing a properly standards-compliant  
URL module for Python -- apparently none of the existing ones are  
strictly correct.

> Maybe if we inherited from string, but didn't actually try to add any
> members, it might be okay as a debugging tool. It seems like it would
> make the Transport API harder to use, since it would then not only expect
> valid URL fragments but also require them to be "url()" instances.

True - so would methods accept both, or would we take strings for  
fragments, but whole URLs when they are needed?

-- 
Martin