[PERFORMANCE] url encoding

Robert Collins robertc at robertcollins.net
Fri Jun 9 21:33:00 BST 2006


On Fri, 2006-06-09 at 14:40 -0500, John Arbash Meinel wrote:
> Robert Collins wrote:
> > I have a thought about a good compromise for performance in our
> > transport layer.
> > 
> > Each Transport is a handle to a directory... and URLs from users are
> > broken into two parts:
> > 
> > - the directory we connect to
> > - the file the user specified
> > 
> > After that split, all the file operations we perform, including
> > subdirs-from-the-user-dir, are generated by bzr, and are strictly ascii.
> > 
> > It seems reasonable to me that we could encode this in the api of
> > Transport - that is, 'transport.put(foo)' would expect an ascii-only,
> > non-unicode, non-escaped, relative reference in foo, with no '..'. This
> > would allow LocalTransport to fast-path all local operations with no
> > escaping, without running into a mess with predicting encoding on remote
> > servers.
> > 
> > Aaron and I chatted about this and thought it was reasonable. What does
> > the larger community think?
> > 
> > Rob
> 
> Well, do we want to declare that everything under .bzr/ is always going
> to be 7-bit ascii with no illegal filesystem characters?

Yup - AFAIK we have already done this, with one exception: the contents
of checkout/limbo.

> It seems reasonable to do, since that helps integrity. It would mean
> escaping needs to still happen at higher levels, though. 

This is already in place.
> And probably
> unescaping would also need to happen (at the higher level). But you are
> right, it would mean that LocalTransport wouldn't need to escape the
> results from 'listdir'.

Much more than that:
put, append, get, put_multi, rename, copy_to, etc. all do a URL
escape/unescape *and* (for local transport) a unicode decode operation.
This adds quite a lot of overhead.
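To make the overhead concrete, here is a rough sketch (Python 3, not actual bzrlib code) of the two styles of local path handling. The names `abspath_url_style` and `abspath_fast` are hypothetical. The first unescapes a URL-escaped relpath and decodes it to unicode on every call. The second assumes the caller already guarantees a plain ASCII, downward-only relpath, so a string join is enough.

```python
import urllib.parse

# Hypothetical base directory of a local transport.
BASE = "/home/user/project/.bzr"


def abspath_url_style(base, relpath):
    # Hypothetical "per-call" style: relpath arrives URL-escaped, so every
    # operation pays for an unescape plus a decode to unicode.
    raw = urllib.parse.unquote_to_bytes(relpath)
    return base + "/" + raw.decode("utf-8")


def abspath_fast(base, relpath):
    # Hypothetical fast path: relpath is already plain ASCII with no
    # escapes and no '..', so no unescaping or decoding is needed.
    return base + "/" + relpath
```

For bzr-generated names such as "branch-format" both functions return the same path; only the first one does extra work to get there.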

> Except for the fact that Transport.mkdir() can be called on user
> supplied paths, for stuff like 'bzr branch' and 'bzr push', and I think
> we want to allow for having unicode paths outside of .bzr.

So my proposal is that all 'dir' operations (mkdir, clone, rmdir)
assume they are given URL relative references, and that everything
else assumes it is given an ascii, downward-only relative path (no '..').
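A minimal sketch of how the proposed split could be checked, assuming Python 3. `check_file_relpath` and `dir_relpath_to_unicode` are hypothetical helpers, not part of the Transport API. Rejecting a literal '%' is an extra assumption: bzr-generated names are not expected to contain one, so its presence most likely means the caller passed an escaped path by mistake.

```python
import urllib.parse


def check_file_relpath(relpath):
    """Hypothetical check for file operations (put, get, append, ...):
    ascii-only str, not URL-escaped, relative, and never going up ('..')."""
    if not isinstance(relpath, str) or not relpath.isascii():
        raise ValueError("relpath must be an ascii str: %r" % (relpath,))
    # Assumption: a literal '%' signals an accidentally escaped path.
    if "%" in relpath:
        raise ValueError("relpath must not be URL-escaped: %r" % (relpath,))
    if relpath.startswith("/"):
        raise ValueError("relpath must be relative: %r" % (relpath,))
    if ".." in relpath.split("/"):
        raise ValueError("relpath must not contain '..': %r" % (relpath,))
    return relpath


def dir_relpath_to_unicode(url_relpath):
    """Hypothetical handling for 'dir' operations (mkdir, clone, rmdir):
    they take URL relative references, which may encode unicode names."""
    return urllib.parse.unquote(url_relpath, encoding="utf-8")
```

Under this split only the dir operations, which may receive user-supplied paths from 'bzr branch' or 'bzr push', pay for unescaping; the file operations just validate (or, in a production fast path, could skip even that).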

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.