[MERGE] Fix handling of unicode paths in smart server requests

Mon Oct 16 06:14:44 BST 2006

John Arbash Meinel wrote:
> Andrew Bennetts wrote:
[...]
> 
> v- These tests look like they will fail with a UnicodeError when running
> 'make check' which runs LANG=C python ./bzr selftest.
> 
> They also are semi-likely to fail on specific platforms. But that is the
> easiest way to flush it out. Have you tested 'make check'?.
> 	LANG=C ./bzr selftest SmartServer.\*Unicode
> should flush it out as well.

I haven't tried make check, but you're right that LANG=C triggers a problem.

> v- This seems like it would look better without the newline.
> 
> >              bytes = self._in.read(bytes_to_read)
> >              if bytes == '':
> > +
> >                  # Connection has been closed.
> >                  self.finished = True
> >                  self._out.flush()

Oops, yes.  That was an accident.  Fixed.

> v- What about URL escaping? It seems like you need to be passing your
> backing_transport a url_escaped utf-8 string. We generally do:
> 
>  encode('utf8') => urllib.quote() => url_string
> 
> Not
>  urllib.quote() => encode('utf8') => url_string
> 
> So you may need to update you tests to include things like ' ' or '+',
> or '%' which all need to be escaped for urls. (There is the helper
> urlutils.escape() for stuff like this).
[...]
> >      def do_has(self, relpath):
> > +        relpath = relpath.encode('utf-8')
> >          r = self._backing_transport.has(relpath) and 'yes' or 'no'
> >          return SmartServerResponse((r,))
> >  
> >      def do_get(self, relpath):
> > +        relpath = relpath.encode('utf-8')
> >          backing_bytes = self._backing_transport.get_bytes(relpath)
> >          return SmartServerResponse(('ok',), backing_bytes)
> >  
> 
> Now, truth be told, we rarely need unicode over a transport. Working
> trees are accessed directly, and file-ids and revision ids avoid having
> unicode in their actual characters (they would be escaped anyway for
> filesystem safety).
> 
> That said, we Transport wants to be unicode safe. But I want to check
> that you are also url safe.

Yeah, I was thinking about that.  My immediate problem was that the smart
protocol was decoding UTF-8 off the wire, and then passing the resulting unicode
objects to methods that only accepted str objects (the WSGI code triggered this,
because the ChrootTransportDecorator it uses passes all relpaths through
self.abspath(), which for at least some transports involves _combine_paths,
which checks for non-str objects).  

I think I've probably solved the wrong problem; as you say, url-quoting needs to
happen somewhere.  The existing situation where unicode objects are passed as
relpaths to transports isn't right, though.

I'll take another shot at this with url-quoting.  Thanks for the review.

-Andrew.