sys.stderr very slow
John Arbash Meinel
john at arbash-meinel.com
Thu Nov 2 17:06:53 GMT 2006
It seems that writing to sys.stderr is much slower than at least I
thought it would be. I was doing some profiling of 'bzr branch linux
test' to see about improving our branch time for a kernel sized tree.
The #1 function was 'file.write()' which had the numbers:
7986212 306.4769 306.4769 <method 'write' of 'file' objects>
Which means that it was called almost 8 million times. Now, 7.7 million
of those times were because of TreeTransform.create_file() which expects
an "iterator" of file contents, and so does:
for segment in contents:
f.write(segment)
We are passing in a list of lines, and there are approx 7.7 million
lines of code in the kernel tree. So I thought one fix would be to
either use: f.writelines(contents) or map(f.write, contents)
However, looking closer, that isn't the bulk of the time spent writing.
20027 131.5512 80.2390 bzrlib.transform:253(create_file)
+7702169 28.0185 28.0185 +<method 'write' of 'file' objects>
+20027 21.6706 21.6706 +<method 'close' of 'file' objects>
So it seems that it only is taking 30s to write out the contents of all
those files. What *is* taking a lot of time is:
252712 267.7009 2.6912 bzrlib.progress:242(update)
+252722 265.0097 265.0097 +<method 'write' of 'file' objects>
Now, this was using the "DotsProgressBar" which effectively calls
sys.stderr.write('.') for every update() call. I thought the penalty is
because sys.stderr is unbuffered, so it forces a flush on every write,
which updates the screen, etc. (Though in this case it was re-directed
to a file).
So I wrote a patch which changes to buffer the output (attached), and
then I tried calling self.to_file.writelines(), but it turns out that
writelines() was still the number #1 single expense with:
3165 259.9056 259.9056 <method 'writelines' of 'file' objects>
So even though it was being called 100x less, it still cost almost as
much. So I thought maybe writelines was just a loop around .write(), so
I switched to write(''.join(buffer)). This helped, but far less than I
thought it would. The specific expense was:
+3500 192.1253 192.1253 +<method 'write' of 'file' objects>
So I shaved off a whole lot of calls, but I only shaved off 20% of the
overhead time.
John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: buffer_dots.patch
Type: text/x-patch
Size: 2887 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20061102/debe5c25/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20061102/debe5c25/attachment.pgp
More information about the bazaar
mailing list