[MERGE] Transport.non_atomic_put

Martin Pool mbp at canonical.com
Fri Aug 18 23:28:56 BST 2006


On 18 Aug 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> Attached is a patch that creates a Transport.non_atomic_put() command.
> Along with a default implementation, and custom implementations for
> Local and SFTP transports. (Ultimately we should add one for FTP as well).
> 
> This is in response to my earlier discovery about why 'push' is slow,
> and I'm trying to get the updates written in small chunks.

Good idea; +1 with comments.

> === modified file bzrlib/tests/test_transport_implementations.py // last-change
> ... d:john at arbash-meinel.com-20060818193050-9da47e496bc0166f
> --- bzrlib/tests/test_transport_implementations.py
> +++ bzrlib/tests/test_transport_implementations.py
> @@ -144,6 +144,29 @@
>          self.assertRaises(NoSuchFile,
>                            t.put, 'path/doesnt/exist/c', 'contents')
>  

I think tests are easier for people to maintain and understand if each
case just tests one thing.  In particular if something new is failing
it's much easier if the test name tells you what it is.  So even though
this test is quite clear as it is, perhaps it can be pushed a bit
further.

> +    def test_non_atomic_put(self):
> +        t = self.get_transport()
> +
> +        if t.is_readonly():
> +            self.assertRaises(TransportNotPossible,
> +                    t.non_atomic_put, 'a', StringIO('some text for a\n'))
> +            return

So perhaps this should be split out into a separate
test_nonatomic_put_on_readonly_fails, since it is a separate
requirement.  Maybe that's taking it too far.

> +    def non_atomic_put(self, relpath, f, mode=None):
> +        """Copy the file-like object into the target location.
> +
> +        This function is not strictly safe to use. It is only meant to
> +        be used when you already know that the target does not exist.
> +        It is not safe, because it will open and truncate the remote
> +        file. So there may be a time when the file has invalid contents.
> +
> +        :param relpath: The remote location to put the contents.
> +        :param f:       File-like object.
> +        :param mode:    Possible access permissions for new file.
> +                        None means do not set remote permissions.
> +        """
> +        # Default implementation just does an atomic put.
> +        # TODO: jam 20060818 Frequently, this command will be called
> +        #       when the parent directory may not exist. It might be possible
> +        #       to optimize by adding another parameter requesting that
> +        #       we create the parent dir if it does not exist.
> +        #       For now, I will leave the api more simple, since I'm not
> +        #       sure what sftp or local transport could do that would
> +        #       be better than raising NoSuchFile and having the caller
> +        #       call mkdir() and then call us again.
> +        return self.put(relpath, f, mode=mode)
> +

Well, if you think it usually will not exist, it might be better to
always try the mkdir and ignore EEXISTS.

Will the caller usually have a file-like object already?  Perhaps we
should have non_atomic_put_bytes, so as to avoid creating and unwrapping
a StringIO object.  Perhaps that should be deferred and done to all
methods at once.

> === modified file bzrlib/transport/sftp.py
> --- bzrlib/transport/sftp.py
> +++ bzrlib/transport/sftp.py
> @@ -604,6 +604,58 @@
>              # raise the original with its traceback if we can.
>              raise
>  
> +    def non_atomic_put(self, relpath, f, mode=None):
> +        """Copy the file-like object into the target location.
> +
> +        This function is not strictly safe to use. It is only meant to
> +        be used when you already know that the target does not exist.
> +        It is not safe, because it will open and truncate the remote
> +        file. So there may be a time when the file has invalid contents.
> +
> +        :param relpath: The remote location to put the contents.
> +        :param f:       File-like object.
> +        :param mode:    Possible access permissions for new file.
> +                        None means do not set remote permissions.
> +        """
> +        abspath = self._remote_path(relpath)
> +        path = self._sftp._adjust_cwd(abspath)
> +        attr = SFTPAttributes()
> +        if mode is not None:
> +            attr.st_mode = mode
> +        omode = (SFTP_FLAG_WRITE | SFTP_FLAG_CREATE | SFTP_FLAG_TRUNC)
> +
> +        fout = None
> +        try:
> +            t, msg = self._sftp._request(CMD_OPEN, path, omode, attr)
> +            if t != CMD_HANDLE:
> +                raise TransportError('Expected an SFTP handle')
> +            handle = msg.get_string()
> +            fout = SFTPFile(self._sftp, handle, 'wb', -1)
> +        except (paramiko.SSHException, IOError), e:
> +            self._translate_io_exception(e, abspath, ': unable to open',
> +                failure_exc=FileExists)
> +
> +        try:
> +            fout.set_pipelined(True)
> +            self._pump(f, fout)
> +        except (IOError, paramiko.SSHException), e:
> +            self._translate_io_exception(e, tmp_abspath)

> +        # XXX: This doesn't truly help like we would like it to.
> +        #      The problem is that openssh strips sticky bits. So while we
> +        #      can properly set group write permission, we lose the group
> +        #      sticky bit. So it is probably best to stop chmodding, and
> +        #      just tell users that they need to set the umask correctly.
> +        #      The attr.st_mode = mode, will handle when the user wants the
> +        #      final mode to be more restrictive. And then we avoid a round 
> +        #      trip. Unless paramiko decides to expose an async chmod()
> +        #
> +        #      This is designed to chmod() right before we close.
> +        #      Because we set_pipelined() earlier, theoretically we might avoid
> +        #      The round trip for fout.close()
> +        if mode is not None:
> +            self._sftp.chmod(tmp_abspath, mode)
> +        fout.close()
> +

What do you mean by 'sticky bit' here?  To me it is the 01000 bit, and
there's no such thing as the 'group sticky bit', and anyhow the umask
never affects them.  I don't see why we'd be wanting to set either the
sticky bit or the setgid bit on plain files...

  http://www.uwsg.iu.edu/UAU/files/sticky.html 

Telling users to set the umask properly might be OK locally but it may
not be easy for an sftp server - they'll need to work out what shell
script, if any, gets run before the sftp server runs.  I think in many
cases they would need the sftp server's sysadmin to change it globally.

Should the close() be in a finally block?

-- 
Martin




More information about the bazaar mailing list