"Using Saved Location: foo"

John Arbash Meinel john at arbash-meinel.com
Fri Apr 28 14:42:40 BST 2006


Jan Hudec wrote:
> On Fri, Apr 28, 2006 at 17:02:40 +1000, Martin Pool wrote:
>> On 28/04/2006, at 12:55 AM, John Arbash Meinel wrote:
>>
>>> Aaron Bentley wrote:
>>>> John Arbash Meinel wrote:
>>>>> But for local file URLs, it seems like it would be nicer to  
>>>>> display the
>>>>> actual Unicode path.
>>>> Makes sense to me.
>> I think we want a method that gives the "best display form" for a URL  
>> in the current user encoding.
>>
>> For we will *try* to decode them as utf-8 and then put them into the  
>> user encoding.  If that's not possible, we just use the regular URL  
>> form.  In more detail: scan for escape sequences, and work out what  
>> byte they represent.  If it's a byte that has special meaning in URLs  
>> [/?,;#&%+ ] then leave it alone, otherwise unescape it.  Now try to  
>> decode as UTF-8, and then check that the result is representable in  
>> the user encoding - if either of those fail, leave it as an escaped  
>> URL.  Obviously this needs to be symmetric with the algorithm for  
>> turning url-like unicode strings into real URLs, and I think it is.

I chose to try and split it up per hierarchy component. So my algorithm
is something like:

for each chunk in url.split('/'):
	hunks = chunk.split('%')
	expand the safe escapes in hunks (all but the unsafe ones)
	test_chunk = join the unescaped hunks
	try:
		chunk = test_chunk.decode('utf-8')
	except UnicodeDecodeError:
		pass  # leave chunk alone

join chunks with '/'

This means that if one portion of your URL is not utf-8, the rest of
the URL is still decoded properly.
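
As a rough illustration of the per-component idea, here is a minimal
Python sketch. The names url_for_display, _expand_safe_escapes and
UNSAFE are made up for this example and are not bzr API. The unsafe set
simply copies the list quoted above; whether the space belongs in it is
still being discussed in this thread.

```python
import re

# Bytes left escaped because they have special meaning in URLs.
# Assumption: copied from the list quoted above, including the space.
UNSAFE = frozenset(b"/?,;#&%+ ")

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def _expand_safe_escapes(chunk):
    """Unescape every %XX in a bytes chunk except the unsafe bytes."""
    def replace(match):
        byte = int(match.group(1), 16)
        if byte in UNSAFE:
            return match.group(0)  # keep the escape as written
        return bytes([byte])
    return _ESCAPE.sub(replace, chunk)


def url_for_display(url):
    """Return url with each '/'-separated chunk decoded where possible."""
    chunks = []
    for chunk in url.split("/"):
        try:
            raw = _expand_safe_escapes(chunk.encode("ascii"))
            chunks.append(raw.decode("utf-8"))
        except UnicodeError:
            # Not ASCII on input or not utf-8 after unescaping:
            # leave this chunk alone, fully escaped.
            chunks.append(chunk)
    return "/".join(chunks)
```

With this sketch, a chunk that is not valid utf-8 stays escaped while
its neighbours are still decoded, which is the property described
above.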

I did this because files under bzr control should have a utf-8
representation, but we don't control the path components above that.
			
> 
> Space should not be in that list -- if we get raw space in url-like
> string, we encode it (while we mustn't do it with the other, truly
> special ones) so we should decode it as well IMHO.

Well, if a given chunk doesn't decode, then I leave it as a fully
escaped chunk. I suppose I could give it one more pass for things that
are known to be safe. But I think I'd rather leave it alone.

> 
>> [...] 
>> John, Robert and I talked about this on irc.
>>
>> A related issue is that if you do a pull while cwd is in a  
>> subdirectory of a branch, the relative path stored should not be the  
>> one you typed, but rather the relative path from the root of your  
>> branch.
>>
>> There may be some slightly tricky cases caused by symlinks.
>>
>> Branch APIs that deal with locations (e.g. Branch.base) should return  
>> urls, even for local branches.
>>
>> The parent is also a URL or relative reference.  That means that it  
>> can be e.g. "../bzr.dev", but it is escaped as a URL.  Setting the  
>> parent will work like this:
> 
> Can I suggest having a:
> class url(string):
> 	...
> ?

*shudder*

I think I can see your point. But I've seen mostly pain when people try
to inherit from string.

> 
> Then various places in the code can assert that they really have a
> url (I'd include relative-url-reference) in it.
> 

The other problem is that user input isn't a URL (yet). It may actually
be a URL, but before it hits Transport, it is just a Unicode string.

Maybe if we inherited from string, but didn't actually try to add any
members, it might be okay as a debugging tool. It seems like it would
make the Transport API harder to use, though, since it would then not
only expect valid URL fragments but also require them to be "url()"
instances.
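
To make the trade-off concrete, here is a minimal sketch of a
member-less subclass used purely as a marker. Url and require_url are
hypothetical names for illustration, not anything that exists in bzr.

```python
class Url(str):
    """Marker type: a str known to hold an escaped URL or relative reference."""
    __slots__ = ()  # no extra members, just a tag on the type


def require_url(location):
    """Debugging check that a caller passed a Url, not raw user input."""
    assert isinstance(location, Url), "expected Url, got %r" % (location,)
    return location


parent = Url("../bzr.dev")
require_url(parent)          # passes
joined = parent + "/subdir"  # str operations hand back a plain str
```

The last line shows one source of the pain: ordinary string operations
return plain strings, so the marker is lost unless every result is
wrapped again.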

John
=:->
