[MERGE][BUG 43731][BUG 112173][BUG 111702] Transport connection sharing

Fri Jul 20 19:15:22 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:

...

> 
> Above comment added in the code as a TODO.
> 
> Well comment removed. I changed from ftp to sftp. The only
> problem is that commands.test_cat.TestCat.test_cat now fails.
> The good news are that the 25 command tests pass in 2s instead of
> 25s, woot !
> 
> I had to update most of the tests to properly use
> self.get_url(relpath) instead of self.get_url() + '[/]relpath'.
> 
> I put a FIXME explaining why I skip the test now. 'cat' was
> issuing only one connection so there is no urgency is fixing that
> now, but I think that reveals a deeper problem in the sftp
> transport: detailless exceptions from paramiko will have to be
> dealt with if we want to use it as our test server. In that
> specific case, I think SFTPTransport.get() should be split in two
> try blocks, one that will raise NoSuchFile and the other that
> will raise ReadError.

I'll have to look closer at that. It does seem to indicate a problem with our
SFTPTransport code.

> 
>     john> I would prefer it if you gave a custom protocol, like
>     john> hooked+ftp://, rather than re-using the ftp:// scheme,
>     john> and assuming that your handler will win (because it was
>     john> the last one registered). It *is* likely, I don't know
>     john> if it will be always guaranteed. But it also means that
>     john> you can end up with weird failures. Where something
>     john> changes, and it causes seemingly unrelated tests to
>     john> start failing.
> 
> I can't get it working right now, can you give me a +1
> conditional if I do that this week-end ?

Well, it depends if your work is +1 conditional worthy :).

But yeah, I won't strictly block on this.

...

>     john> These don't really seem like 'bzrlib/tests/commands' maybe
>     john> 'bzrlib/tests/command_connections'. Though it is getting a bit long.
> 
> The idea was that 'tests/commands' will contain all tests about
> command behavior of which the connection related ones are the
> first instance.
> 
> So I leave it as is.

I do understand your point, I was just thinking that it is something that we
should discuss as a project. But who knows if any other devs will be reading
the middle of such a long conversation.

> 
>     john> For ftp and sftp connection caches, do we need to
>     john> deprecate them first?
> 
> I don't think so, they were private and even a bug have been recorded.

I wouldn't be surprised if there is code out there that will now start
connecting multiple times because they though the connection would be cached.
But, as long as we make it clear how to do it 'right', I'm okay with it.

...

>     john> I would actually prefer it if
>     john> 'read_mergable_from_transport' could take the actual
>     john> transport and filename, and do its own cloning up a
>     john> parent directory, rather than having you do all of the
>     john> work to normalize and split, and then clone, etc.
> 
> I don't get it. 'read_mergable_from_transport' *takes* the
> transport and filenames as parameters. But the *user* supplies a
> full url that we have to normalize, split, etc. I'm not
> particularly happy with the early attempt to detect the
> directories, but that was there and I respected it.
>

If someone types 'bzr log path/to/foo/' then urlutils.split will return
('path/to/foo', ''), and we know that '' cannot be a file.

I don't remember why I specifically chose that workaround. I know
read_bundle_from_url (at one point) caused 'bzr merge sftp://foo/branch' to
fail. But 'bzr merge sftp://foo/branch/' could still work.

I'm mostly commenting that it would be nice if we didn't have to have
'cmd_merge' know all the details about the path that was supplied by the user.
And instead 'read_mergable_*' could have that knowledge.

It is a bit of a contention with wanting to hang on to Transport objects. And
that Transport objects don't have a
'_change_to_parent_dir_and_give_me_the_path()' which could be used to:

parent_transport, relpath = t.pop_to_parent()
f = parent_transport.get(relpath)

So I'm okay with what you have done, it just feels like our api is exposing
information that we would rather not.

...

>     john> === modified file 'bzrlib/bzrdir.py'
>     john> -        return format.initialize(safe_unicode(base))
>     john> +        return format.initialize(base, possible_transports)
> 
>     john> ^- A parameter like possible_transports really seems
>     john> like it should be passed in as a keyword, not as a
>     john> positional argument.
> 
> Hmm, I see two use cases for it: 
> 
> - bzr code that cares about connection sharing (should be the
>   norm) will use a proper value and in that case, it *is* a
>   positional argument,
> 
> - code that doesn't care (tests) and that will not use it. The
>   default value of None is then appropriate,
> 
> I consider it an *important* argument for which using None should
> be carefully thought. Making it a keyword argument will present
> it as something optional which I think is wrong. Keep in mind
> that each time someone *forget* to provide the appropriate value,
> we will have additional connections.
> 
> In one of my first attempts, I even made it a mandatory
> positional parameter where [] was a legal value, but then I
> realized that offering a default value of None was more
> developer-friendly.
> 
> <snip/>
> 

You can't make it mandatory without breaking api compatibility. So yeah, to
start with you have to make it optional.

It feels like a keyword argument / hint to me. But I can understand your desire
to make it more mandatory, since not using it has some (potentially strong)
repercussions.

One thing you could consider doing is to allow possible_transports to be None,
but make that deprecated. Something like:

def get_transport(base, possible_transports=None):
  if possible_transports is None:
    symbol_versioning.warn(
	'get_transport requires possible_transports as of 0.19, defaulting'
	' to the empty list')
    possible_transports = []

(don't forget the appropriate tests that use this in a deprecated manner, etc.)

> 
>     john> -        # TODO: jam 20060426 we probably need a test in here in the
>     john> -        #       case that the newly sprouted branch is a remote one
>     john> -        if result_repo is None or result_repo.make_working_trees():
>     john> +        if isinstance(target_transport, LocalTransport) and (
>     john> +            result_repo is None or result_repo.make_working_trees()):
>     john>              wt = result.create_workingtree()
>     john>              wt.lock_write()
>     john>              try:
> 
>     john> ^- This doesn't seem like the time/place to be fixing
>     john> this.
> 
> Why ? I added failing tests for remote branch creation and
> diagnosing both of them leads me here, where I find (with a
> smile) your comment.
> 
> May be sprout tries to do too much, but it is used for branch
> creation, so that seems the perfect place to fix it.

This place is a correct fix.
Tangling it up with your patch to remove connection caches is not. It should be
a separate clean patch to apply to bzr.dev.

I'm happy to see it, you seem to provide tests for it, etc. Just that it should
be 2 separate submissions.

> 
>     john> But did you at least add tests that 'bzr branch local
>     john> sftp://remote' doesn't try to create a working tree?
> 
> I added tests that ensures that 'bzr branch remote remote' and
> 'bzr branch local remote' issues 2 and 1 connections only.

If 'bzr branch remote remote' is on the same remote machine, shouldn't it only
issue 1 connection?

> 
> I will add blackbox tests to ensure that no working trees is
> created. (Done).

Well, this seems like it should be done at the 'Branch.sprout()' testing layer,
it doesn't really have to be tested at the blackbox layer.

> 
>     john> -class PathNotChild(BzrError):
>     john> +class PathNotChild(PathError):
> 
>     john> ^- Is there a specific reason for this? It seems
>     john> correct, but I didn't see an explanation.
> 
> No specific reason other than I came across it during debugging
> and just wanted to clean it up (I have yet to setup an appropriate
> workflow to submit such trivial changes, but now that I have PQM
> access, that should be easier).
> 
>     john> +        # vila--20070607 if the following are commented out the test suite
>     john> +        # still pass. Is this really still needed or was it a forgotten
>     john> +        # temporary fix ?
>     john>          if not isinstance(t, self.transport_class):
>     john>              # we did not get the correct transport class type. Override the
>     john>              # regular connection behaviour by direct construction.
> 
>     john> Can you elaborate a bit on this?
> 
> I suppose that when this code was added, some test classes were
> returning bogus transports and the above hack had helped fixing
> them.
> 
> No tests fail *today* if this code is commented out.
> 
>     john> It looks like something Robrert wrote, so I don't know
>     john> what is going on myself.
> 
>     john> ...
> 
>     john>          self.assertEquals(t.base, 'file://HOST/')
> 
>     john> +class TestConnectedTransport(TestCase):
>     john> +    """Tests for connected to remote server transports"""
>     john> +
> 
>     john> ^- 2 spaces please
> 
> Huh ? Why ?

class TestOne:

  def test_x(self):
    end of test one

class TestTwo:

  def ...

I probably didn't make it clear that the extra space is before 'class
TestConnectedTransport' not at the docstring.

> 
> <snip/>
> 
>     john> +        if base[-1] != '/':
>     john> +            base += '/'
> 
>     john> ^- This should always be:
> 
>     john> if not base.endswith('/'):
>     john>   base += '/'
> 
> Thanks. Inherited code, I didn't understand why both were used. Done.

(Most likely bogus code written by yours truly)

> 
>     john> +        (self._scheme,
>     john> +         self._user, self._password,
>     john> +         self._host, self._port,
>     john> +         self._path) = self._split_url(base)
> 
>     john> ^- It seems a little odd that you will split the
>     john> scheme, etc from 'base' but use the connection from
>     john> 'from_transport'. At a minimum, it seems like they
>     john> should be verified to match.
> 
> Well, from_transport is used in only two cases AFAIK:
> 
> - by clone which use 'self' as value and just change the path
>   part (so nothing to control here),
> 
> - by _reuse_for which indeed checks that the connection can be
>   reused.
> 
> Now, I can transfer the control made by _reuse_for in __init__
> but that will be useless for clone.
> 
> I fail to imagine that verification helping to detect a bug though.
> 
> I'll make from_transport a private parameter instead. Thoughts ?

making it private seems better.

> 
>     john> +        if from_transport is not None:
>     john> +            # Copy the password as it does not appear in base and will be lost
>     john> +            # otherwise. It can appear in the _initial_split_url above if the
>     john> +            # user provided it on the command line. Otherwise, daughter classes
>     john> +            # will prompt the user for one when appropriate.
>     john> +            self._password = from_transport._password
>     john> +
>     john> +        base = self._unsplit_url(self._scheme,
>     john> +                                 self._user, self._password,
>     john> +                                 self._host, self._port,
>     john> +                                 self._path)
> 
>     john> ^- I also thought we tended to remove 'self._password'
>     john> from the official URL, so that errors/etc won't expose
>     john> passwords to the terminal or to the log.
> 
> It was not done everywhere, this patch makes it for all transports.

Someone mentioned on IRC (bug report?) that this causes:

bzr push ftp://user:password@host/path/on/host

To forget the password as part of the saved location. So you have to enter your
password. (editing ~/.bazaar/locations.conf can overcome this).

So there is a bit of a tension. I guess I could argue that we should have
'self._display_base' which is used when reporting errors, and another one that
is used for saving parent and push locations.

> 
>     john> This also seems a good place to use keywords:
> 
>     john> base = self._unsplit_url(scheme=self._scheme,
>     john> 			 user=self._user,
>     john> 			 ...
>     john> 			)
> 
>     john> Also, if _unsplit_url is only being called with self._*
>     john> parameters, it should probably just use them directly,
>     john> rather than having them be passed in.
> 
> 
>     john> I do notice that _unsplit_url is ignoring the password
>     john> parameter. So it would probably be best to just not
>     john> pass it in.
> 
> 
> Well I have several counter arguments here:
> 
> 1 - I used it with different parameters during the refactoring
>     and even if none of them has survived, I can easily see the
>     need come back, with a different path or a different scheme
>     (think decorators here),
> 
> 2 - it mimicks urlparse.urlunsplit and seems more readable that way,
> 
> 3 - it mirrors _split_url
> 
> 
> I understand your concerns here, but my preference would be to be
> conservative ((1) may well come back).
> 
> (2) and (3) helps in finding and understanding the methods
> quickly. I'm not really concerned about performance here (we are
> about to make I/Os anyway).
> 
> But I promise that if in two months from now, no usage have been
> found, I'll simplify it. Deal ? :)

Good enough I guess.

> 
> <snip/>
> 
>     john> -# TODO: The should be marked as deprecated
>     john> +# TODO: They should be marked as deprecated
>     john> +# vila-20070606: How should we do that ?
>     john>  urlescape = urlutils.escape
>     john>  urlunescape = urlutils.unescape
> 
>     john> ^- Probably:
> 
>     john> urlescape = deprecated_function(zero_XXXX, urlutils.escape)
> 
> No. deprecated_function is a decorator. Can I postponed that ?

Specifically, if you do:

@foo
def bar():
  return X

That is the same as doing:

def bar():
  return X
bar = foo(bar)

You have to do it a bit differently in this case, because 'deprecated_function'
is a function which returns a wrapper which then gets applied. So it is actually:

urlescape = deprecated_function(zero_nineteen)(urlutils.escape)

It follows straightforward from:

@deprecated_function(zero_nineteen)
def urlescape
...

Though the syntax isn't quite as nice.

...

> See above for some pending (but I think not blocking) questions.
> 
> merge-directive --diff (full one is 2MO) *and* diff of the
> updates attached for easier review.
> 
>     Vincent

+    def test_branch_local_remote(self):
+        self.run_bzr(['branch', 'branch', self.get_url('remote')])
+        t = self.get_transport()
+        # Ensures that no working tree what created remotely
+        self.assertFalse(t.has('remote/file'))
+
+    def test_branch_remote_remote(self):
+        # Light cheat: we access the branch remotely
+        self.run_bzr(['branch', self.get_url('branch'),
+                      self.get_url('remote')])
+        t = self.get_transport()
+        # Ensures that no working tree what created remotely
+        self.assertFalse(t.has('remote/file'))
+

^- 'ensure that no working tree was created remotely'

     def test_cat(self):
+        # FIXME: sftp raises ReadError instead of NoSuchFile when probing for
+        # branch/foo/.bzr/branch-format when used with the paramiko test
+        # server.
+        from bzrlib.tests import TestSkipped
+        raise TestSkipped

You should pass a string to TestSkipped. Perhaps

raise TestSkipped('SFTPTransport raises incorrect exception'
		  ' when reading from paramiko server')

...

- -        :param from_transport: optional transport to build from. The built
+        :param _from_transport: optional transport to build from. The built
             transport will share the connection with this transport.
         """
^- Maybe add the comment:
The caller is required to ensure that _from_transport points at the same host
as the new base.

- -    def __init__(self, url, from_transport=None, medium=None, _client=None):
+    def __init__(self, url, _from_transport=None, medium=None, _client=None):
         """Constructor.

^- Having 2 private parameters and a public one inbetween seems a little weird.
Probably we can't change it. It just doesn't seem correct.

I believe you've satisfied the bulk of my concerns. (+1 Cond)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGoPujJdeBCYSNAAMRAhNPAKCPaXYrw+yYMyjAkRs8m7WC5XbiZwCgqUma
fgjI9J3yivz3vM5TtkRdbDE=
=FXw8
-----END PGP SIGNATURE-----