Python string concatenation memory and time measurements

Ian Clatworthy ian.clatworthy at internode.on.net
Mon May 12 08:04:56 BST 2008


Ian Clatworthy wrote:
> Forest Bond wrote:
> 
>> Maybe this is helpful:
>>
>> http://www.skymind.com/~ocrow/python_string/
> 
> Yes, thank you.
> 
> I probably need to do something similar in terms of testing but using a
> small number of really big strings instead of a large number of really
> small ones.

See the attachment for a test program that tries 3 methods of reading
in a file line by line and producing its content as one big string.
The 3 methods are:

1. put the lines into a list, then ''.join(list)
2. use result += line for each line
3. write the lines to a cStringIO object, then call getvalue()
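
For readers without the attachment, the three approaches can be sketched
roughly as below. This is not the attached stringtest.py; the function
names are hypothetical, and because cStringIO only exists in Python 2,
the sketch uses io.StringIO as its Python 3 counterpart.

```python
import io


def build_with_join(lines):
    """Method 1: collect the lines in a list, then join them once."""
    parts = []
    for line in lines:
        parts.append(line)
    return ''.join(parts)


def build_with_concat(lines):
    """Method 2: grow a single string with += for every line."""
    result = ''
    for line in lines:
        result += line
    return result


def build_with_stringio(lines):
    """Method 3: write every line to an in-memory buffer, then getvalue().

    io.StringIO stands in for Python 2's cStringIO here (an assumption,
    not the original script's code).
    """
    buf = io.StringIO()
    for line in lines:
        buf.write(line)
    return buf.getvalue()
```

Any of these can be handed an open text-mode file, since iterating a file
yields its lines. For a binary file on Python 3 the lines are bytes, so
method 2 would start from b'' and method 3 would use io.BytesIO.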

Maybe my code is wrong, but method 2 comes out best in terms of both
speed and memory consumption, time and time again, on both Python 2.5
and Python 2.4. Can someone please double-check the code and confirm
the results on their platform?
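
Anyone re-checking this on a current Python 3 might find a measurement
harness along these lines useful. It is a hypothetical sketch, not the
attached script: time_build() and peak_bytes() are made-up names, timing
uses time.perf_counter() and memory uses tracemalloc. tracemalloc only
sees Python-level allocations, so its figures are not comparable with
the "process size" numbers below, and tracing slows code down, which is
why time and memory are measured in separate runs here.

```python
import time
import tracemalloc


def time_build(build, lines, repeat=5):
    """Return the best wall-clock time of build(lines) over `repeat` runs.

    tracemalloc is not active here, so tracing overhead doesn't skew timing.
    """
    best = float('inf')
    for _ in range(repeat):
        start = time.perf_counter()
        build(lines)
        best = min(best, time.perf_counter() - start)
    return best


def peak_bytes(build, lines):
    """Return the peak traced allocation (bytes) while build(lines) runs.

    Only allocations made after tracing starts are counted, so the input
    lines themselves are excluded.
    """
    tracemalloc.start()
    try:
        build(lines)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Example input: 10,000 lines of 80 characters each (about 0.8 MB of text).
lines = ['x' * 79 + '\n'] * 10000
print('join: %.4f s, peak %d bytes'
      % (time_build(''.join, lines), peak_bytes(''.join, lines)))
```

Running each method in its own process, as the runs below do (one method
number per invocation), also keeps one method's allocations from
inflating another's numbers.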

Here are some sample results for:

* a large text file: bzr.dev/NEWS (237 KB)
* a large binary file: producingoss-1.753.tar.gz (10.9 MB)

$ python2.4 stringtest.py ~/bzr/repo/bzr.dev/NEWS 1
method 1
time 0.010 s
output size 0.231 MB
process size 4.820 MB

$ python2.4 stringtest.py ~/bzr/repo/bzr.dev/NEWS 2
method 2
time 0.010 s
output size 0.231 MB
process size 4.312 MB

$ python2.4 stringtest.py ~/bzr/repo/bzr.dev/NEWS 3
method 3
time 0.010 s
output size 0.231 MB
process size 4.566 MB

$ python stringtest.py ~/bzr/repo/bzr.dev/NEWS 1
method 1
time 0.010 s
output size 0.231 MB
process size 5.031 MB

$ python stringtest.py ~/bzr/repo/bzr.dev/NEWS 2
method 2
time 0.010 s
output size 0.231 MB
process size 4.402 MB

$ python stringtest.py ~/bzr/repo/bzr.dev/NEWS 3
method 3
time 0.010 s
output size 0.231 MB
process size 4.730 MB

$ python stringtest.py /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 1
method 1
time 0.200 s
output size 10.859 MB
process size 27.824 MB

$ python stringtest.py /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 2
method 2
time 0.170 s
output size 10.859 MB
process size 15.109 MB

$ python stringtest.py /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 3
method 3
time 0.210 s
output size 10.859 MB
process size 32.977 MB

$ python2.4 stringtest.py /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 1
method 1
time 0.250 s
output size 10.859 MB
process size 27.613 MB

$ python2.4 stringtest.py /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 2
method 2
time 0.160 s
output size 10.859 MB
process size 14.938 MB

$ python2.4 stringtest.py /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 3
method 3
time 0.210 s
output size 10.859 MB
process size 32.691 MB

*If* my code is correct, then my recent change to fastimport (replacing
''.join(lines) with result += line) is indeed the right thing to do.
Maybe it also implies we ought to rethink how we handle "lines" in
really large files inside Bazaar?
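
One caveat worth weighing before relying on method 2: its speed comes
from an optimization added in CPython 2.4, which can resize a string in
place for s += t when nothing else references s. PEP 8 advises against
depending on it, since it is fragile and absent from other Python
implementations. Below is a sketch of an alternative that grows a
mutable bytearray instead. It was not part of the tests above, and the
function name is hypothetical.

```python
def build_with_bytearray(lines):
    """Accumulate byte lines in a mutable buffer, then return one bytes object.

    bytearray += extends the buffer in place with amortized growth, so this
    does not depend on the CPython-only str += optimization.
    """
    buf = bytearray()
    for line in lines:
        buf += line
    # bytes() makes a copy: the output briefly exists twice in memory.
    return bytes(buf)
```

It can be given a file opened in binary mode (open(path, 'rb')). If the
caller can work with a bytearray directly, returning buf instead avoids
the final copy and its extra peak memory.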

Ian C.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringtest.py
Type: text/x-python
Size: 2291 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080512/e438ee5b/attachment.py 
