[MERGE] Faster LocalTransport.put() and append() methods.
John Arbash Meinel
john at arbash-meinel.com
Fri Jun 9 20:27:59 BST 2006
Robert Collins wrote:
> Seeking a +1.
>
> This patch changes AtomicFile to be more targeted at the way we use it,
> rather than a general safe-for-everything tool. It also changes
> LocalTransport.append() to have less syscalls while being no longer.
>
> The combination takes making a kernel_like_tree down from 7.3 seconds to
> 1.5, which makes some benchmarks much more pleasant to run.
>
> This is in my add() branch, where i'm currently fiddling stuff.
>
> Rob
>
Where is AtomicFiles being used? By 'build_tree'? (Which I'm guessing is
going through Transport.put() or somesuch)
>
> === modified file 'bzrlib/atomicfile.py'
> --- bzrlib/atomicfile.py 2005-12-18 04:08:46 +0000
> +++ bzrlib/atomicfile.py 2006-06-09 10:38:59 +0000
> @@ -15,10 +15,17 @@
> # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>
>
> -
> +import codecs
> +import errno
> +import os
> +import socket
> +import sys
> from warnings import warn
> +
> from bzrlib.osutils import rename
> -import errno
> +
> +# not forksafe - but we dont fork.
> +_pid = os.getpid()
>
^- PEP8 needs extra blank lines. Also 'not forksafe' seems like
something we need a FIXME. While 'bzr' doesn't fork, I don't think we
can be that absolute for 'bzrlib'.
> class AtomicFile(object):
> """A file that does an atomic-rename to move into place.
> @@ -31,74 +38,67 @@
> An encoding can be specified; otherwise the default is ascii.
> """
>
> - def __init__(self, filename, mode='wb', encoding=None, new_mode=None):
> - if mode != 'wb' and mode != 'wt':
> - raise ValueError("invalid AtomicFile mode %r" % mode)
> -
> - import os, socket
> - self.tmpfilename = '%s.%d.%s.tmp' % (filename, os.getpid(),
> - socket.gethostname())
> + __slots__ = ['f', 'tmpfilename', 'realfilename', 'write']
> +
> + def __init__(self, filename, mode='wb', new_mode=0666):
> + self.f = None
> + assert mode in ('wb', 'wt'), \
> + "invalid AtomicFile mode %r" % mode
> +
> + # old version:
> + #self.tmpfilename = '%s.%d.%s.tmp' % (filename, os.getpid(),
> + # socket.gethostname())
> + # new version:
> + # This is 'broken' on NFS: it wmay collide with another NFS client.
> + # however, we use this to write files within a directory that we have
> + # locked, so it being racy on NFS is not a concern. The only other
> + # files we use this for are .bzr.ignore, which can race anyhow.
> + self.tmpfilename = '%s.%d.tmp' % (filename, _pid)
> self.realfilename = filename
>
> - if encoding:
> - import codecs
> - self.f = codecs.open(self.tmpfilename, mode, encoding)
> - else:
> - self.f = open(self.tmpfilename, mode)
> -
> + # Use a low level fd operation to avoid chmodding later.
> + fd = os.open(self.tmpfilename, os.O_EXCL | os.O_CREAT | os.O_WRONLY,
> + new_mode)
> + # open a normal python file to get the text vs binary support needed
> + # for windows.
> + try:
> + self.f = os.fdopen(fd, mode)
> + except:
> + os.close(fd)
> + raise
> self.write = self.f.write
> - self.closed = False
> - self._new_mode = new_mode
> -
new_mode may be None, and under certain circumstances, we actually want
to set the mode outside of the umask. The big reason for the chmod was
so that on shared branches we would chmod the proper group bits, so that
files would stay group readable, even if you had your umask at 0022
(which is fairly common).
Your changes break this.
Also, AtomicFile used to preserve the mode bits of the target file if
you did not pass 'new_mode'. So in the working tree, if we were
replacing a file with the execute bit set, the new file would also have
the execute bit set.
I don't think we use AtomicFile for the working directories, though I
think we might think about it.
Anyway, I like the speed increase, but you seem to have done it by
sacrificing the actual effect of the class.
...
> - def append(self, relpath, f, mode=None):
> + def append(self, relpath, f, mode=0666):
> """Append the text in the file-like object into the final location."""
> abspath = self._abspath(relpath)
> try:
> - fp = open(abspath, 'ab')
> - # FIXME should we really be chmodding every time ? RBC 20060523
> - if mode is not None:
> - os.chmod(abspath, mode)
> + fd = os.open(abspath, os.O_CREAT | os.O_APPEND | os.O_WRONLY, mode)
> except (IOError, OSError),e:
> self._translate_error(e, relpath)
> - # win32 workaround (tell on an unwritten file returns 0)
> - fp.seek(0, 2)
> - result = fp.tell()
> - self._pump(f, fp)
> + try:
> + result = os.lseek(fd, 0, 2)
> + # TODO: make a raw FD version of _pump ?
> + self._pump_to_fd(f, fd)
> + finally:
> + os.close(fd)
> return result
We should at least be 'chmod' ing if we create the file, see my earlier
discussion about shared branches. And we need 'chmod' because of umask.
Either that, or we always run with umask '0000', which I think is worse.
John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060609/bf1f0305/attachment.pgp
More information about the bazaar
mailing list