sys.stderr very slow

John Arbash Meinel john at arbash-meinel.com
Thu Nov 2 17:06:53 GMT 2006


It seems that writing to sys.stderr is much slower than at least I
thought it would be. I was doing some profiling of 'bzr branch linux
test' to see about improving our branch time for a kernel sized tree.
The #1 function was 'file.write()' which had the numbers:

 7986212  306.4769  306.4769   <method 'write' of 'file' objects>

Which means that it was called almost 8 million times. Now, 7.7 million
of those times were because of TreeTransform.create_file() which expects
an "iterator" of file contents, and so does:

for segment in contents:
  f.write(segment)

We are passing in a list of lines, and there are approx 7.7 million
lines of code in the kernel tree. So I thought one fix would be to
either use: f.writelines(contents) or map(f.write, contents)

However, looking closer, that isn't the bulk of the time spent writing.

   20027  131.5512  80.2390   bzrlib.transform:253(create_file)
+7702169  28.0185   28.0185   +<method 'write' of 'file' objects>
  +20027  21.6706   21.6706   +<method 'close' of 'file' objects>

So it seems that it only is taking 30s to write out the contents of all
those files. What *is* taking a lot of time is:

 252712  267.7009    2.6912   bzrlib.progress:242(update)
+252722  265.0097  265.0097   +<method 'write' of 'file' objects>

Now, this was using the "DotsProgressBar" which effectively calls
sys.stderr.write('.') for every update() call. I thought the penalty is
because sys.stderr is unbuffered, so it forces a flush on every write,
which updates the screen, etc. (Though in this case it was re-directed
to a file).

So I wrote a patch which changes to buffer the output (attached), and
then I tried calling self.to_file.writelines(), but it turns out that
writelines() was still the number #1 single expense with:
  3165  259.9056  259.9056  <method 'writelines' of 'file' objects>

So even though it was being called 100x less, it still cost almost as
much. So I thought maybe writelines was just a loop around .write(), so
I switched to write(''.join(buffer)). This helped, but far less than I
thought it would. The specific expense was:
 +3500  192.1253  192.1253   +<method 'write' of 'file' objects>

So I shaved off a whole lot of calls, but I only shaved off 20% of the
overhead time.

John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: buffer_dots.patch
Type: text/x-patch
Size: 2887 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20061102/debe5c25/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20061102/debe5c25/attachment.pgp 


More information about the bazaar mailing list