[MERGE] simple performance improvement

John Arbash Meinel john at arbash-meinel.com
Wed Jan 31 17:49:47 GMT 2007


It seems that the construct:

for segment in contents:
  f.write(segment)

is more expensive than just calling
f.writelines(contents)

The --lsprof difference is:
    484 3.080 2.255 bzrlib.transform:253(create_file)
+147651 0.511 0.511 +<method 'write' of 'file' objects>
   +484 0.276 0.276 +<method 'close' of 'file' objects>

versus

    484 2.486 2.327 bzrlib.transform:253(create_file)
   +484 0.078 0.078 +<method 'close' of 'file' objects>
   +484 0.042 0.042 +<method 'writelines' of 'file' objects>
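
As a minimal sketch of the two forms being compared (this is not the actual
bzrlib code; the helper names and the sample data are made up for
illustration), both produce identical output, but the loop makes one
Python-level write() call per segment while writelines() is called once per
file:

```python
import io


def write_with_loop(f, contents):
    # One Python-level call to f.write() for every segment.
    for segment in contents:
        f.write(segment)


def write_with_writelines(f, contents):
    # A single call; the iteration over contents happens inside writelines().
    f.writelines(contents)


# Hypothetical sample data standing in for a file's segments.
contents = [b"line %d\n" % i for i in range(1000)]

loop_buf = io.BytesIO()
write_with_loop(loop_buf, contents)

lines_buf = io.BytesIO()
write_with_writelines(lines_buf, contents)

# Both approaches write exactly the same bytes.
assert loop_buf.getvalue() == lines_buf.getvalue()
```

Note that writelines() does not add line separators, so it is a drop-in
replacement for the loop.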

In a real-world test of 'bzr checkout --lightweight bzr.dev test' it
changes the CPU time from 4.29s (+-0.05) down to 4.21s (+-0.07).

This isn't revolutionary, but it is an improvement, and the change to
the code is minor. So I'd like to get it merged.

I also think the improvement will scale with larger datasets, since it
saves one Python-level call per segment. And it helps keep 'write()' from
looking like a performance issue in the profile.

Counterintuitively, I have evidence that changing the line:

f = open(name, 'wb')

to

def do_open():
  return open(name, 'wb')
f = do_open()

actually shaves off another 20ms (down to 4.19s). This is averaged over
15 runs, so while it isn't a huge dataset, it isn't as if I just ran it a
couple of times.

I don't really know how to respond to that. I only did it because
--lsprof doesn't show the time spent in open(), but it does show the
time spent in a nested function like do_open().
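
To illustrate the wrapper trick, here is a small sketch that uses the
standard library's cProfile rather than bzr's --lsprof. The create_file
below is a simplified stand-in, not the real bzrlib.transform function, and
the file name and contents are assumptions for the example. It only shows
that the named inner function gets its own profile entry:

```python
import cProfile
import os
import pstats
import tempfile


def create_file(name, contents):
    # Simplified, hypothetical stand-in for bzrlib.transform's create_file.
    def do_open():
        # Wrapping open() in a Python function gives the profiler a
        # named frame to attribute the time to.
        return open(name, 'wb')

    f = do_open()
    try:
        f.writelines(contents)
    finally:
        f.close()


tmpdir = tempfile.mkdtemp()
name = os.path.join(tmpdir, 'example.txt')

profiler = cProfile.Profile()
profiler.runcall(create_file, name, [b'a\n', b'b\n'])

# pstats keys each entry by (filename, line number, function name).
stats = pstats.Stats(profiler)
profiled_names = [func_name for (_file, _line, func_name) in stats.stats]
```

After this runs, profiled_names should contain an entry for do_open
alongside the one for create_file.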

John
=:->
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: use_writelines.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070131/3f3ee08c/attachment-0001.diff 

