Python string concatenation memory and time measurements
James Henstridge
james at jamesh.id.au
Mon May 12 08:53:50 BST 2008
2008/5/12 Ian Clatworthy <ian.clatworthy at internode.on.net>:
> Ian Clatworthy wrote:
>> Forest Bond wrote:
>>
>>> Maybe this is helpful:
>>>
>>> http://www.skymind.com/~ocrow/python_string/
>>
>> Yes, thank you.
>>
>> I probably need to do something similar in terms of testing but using a
>> small number of really big strings instead of a large number of really
>> small ones.
>
> See attached for a test program that tries 3 methods of reading a file
> line by line and producing the content as one big string. The 3
> methods are:
>
> 1. put lines into a list & ''.join(list)
> 2. using result += line
> 3. using cStringIO and getvalue
>
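(The attached stringtest.py is not reproduced in the archive, so purely as a point of reference, the three methods presumably look roughly like the sketch below; the function names are mine, not necessarily those used in the attachment.)

    # Sketch of the three approaches being compared (Python 2.x,
    # since method 3 relies on the cStringIO module).

    import cStringIO

    def method1_join(path):
        # Collect every line in a list, then join once at the end.
        chunks = []
        for line in open(path, 'rb'):
            chunks.append(line)
        return ''.join(chunks)

    def method2_concat(path):
        # Repeatedly concatenate each line onto a single result string.
        result = ''
        for line in open(path, 'rb'):
            result += line
        return result

    def method3_cstringio(path):
        # Accumulate the lines in a cStringIO buffer, then fetch the value.
        buf = cStringIO.StringIO()
        for line in open(path, 'rb'):
            buf.write(line)
        return buf.getvalue()
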
> Maybe my code is wrong but I get method 2 coming out as the best in
> terms of speed and memory consumption, time and time again, for both
> Python 2.5 and Python 2.4. Can someone please double check the code and
> confirm the results on their platform?
>
> Here are some sample results on:
>
> * a large text file - bzr.dev/NEWS (237K)
> * a large binary file - producingoss-1.753.tar.gz (10.9M)
>
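(Again a sketch rather than the actual attachment: the "time", "output size" and "process size" figures in the listings below could plausibly be gathered with something like the following, where the process size is read from VmSize in /proc/self/status, a Linux-specific source.)

    import time

    def process_size_mb():
        # Linux-specific: /proc/self/status reports VmSize in kB.
        for line in open('/proc/self/status'):
            if line.startswith('VmSize:'):
                return int(line.split()[1]) / 1024.0
        return 0.0

    def measure(func, path):
        # Time one of the methods above and report the same three figures.
        start = time.time()
        data = func(path)
        elapsed = time.time() - start
        print "time         %.3f s" % elapsed
        print "output size  %.3f MB" % (len(data) / (1024.0 * 1024))
        print "process size %.3f MB" % process_size_mb()
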
> $ python2.4 stringtest.py ~/bzr/repo/bzr.dev/NEWS 1
> method 1
> time 0.010 s
> output size 0.231 MB
> process size 4.820 MB
>
> $ python2.4 stringtest.py ~/bzr/repo/bzr.dev/NEWS 2
> method 2
> time 0.010 s
> output size 0.231 MB
> process size 4.312 MB
>
> $ python2.4 stringtest.py ~/bzr/repo/bzr.dev/NEWS 3
> method 3
> time 0.010 s
> output size 0.231 MB
> process size 4.566 MB
>
> $ python stringtest.py ~/bzr/repo/bzr.dev/NEWS 1
> method 1
> time 0.010 s
> output size 0.231 MB
> process size 5.031 MB
>
> $ python stringtest.py ~/bzr/repo/bzr.dev/NEWS 2
> method 2
> time 0.010 s
> output size 0.231 MB
> process size 4.402 MB
>
> $ python stringtest.py ~/bzr/repo/bzr.dev/NEWS 3
> method 3
> time 0.010 s
> output size 0.231 MB
> process size 4.730 MB
>
> $ python stringtest.py
> /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 1
> method 1
> time 0.200 s
> output size 10.859 MB
> process size 27.824 MB
>
> $ python stringtest.py
> /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 2
> method 2
> time 0.170 s
> output size 10.859 MB
> process size 15.109 MB
>
> $ python stringtest.py
> /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 3
> method 3
> time 0.210 s
> output size 10.859 MB
> process size 32.977 MB
>
> $ python2.4 stringtest.py
> /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 1
> method 1
> time 0.250 s
> output size 10.859 MB
> process size 27.613 MB
>
> $ python2.4 stringtest.py
> /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 2
> method 2
> time 0.160 s
> output size 10.859 MB
> process size 14.938 MB
>
> $ python2.4 stringtest.py
> /home/ian/Desktop/Downloads/producingoss-1.753.tar.gz 3
> method 3
> time 0.210 s
> output size 10.859 MB
> process size 32.691 MB
>
> *If* my code is correct, then that implies my recent change to
> fastimport - replacing ''.join(lines) with result += line - is indeed
> the right thing to do. Maybe it also implies we ought to rethink how we
> handle "lines" in really large files inside Bazaar?
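
For what it's worth, CPython 2.4 and later special-case "result += line" (and "result = result + line") when the target string has no other references, resizing the existing string in place instead of copying it on every iteration, which would explain method 2 doing well here. That optimisation is CPython-specific, though, so it can't be relied on across implementations. A quick, purely illustrative way to compare the two idioms on any given interpreter:

    import timeit

    # Hypothetical micro-benchmark: ''.join() vs. += over 20000 short lines.
    setup = "lines = ['x' * 60] * 20000"
    concat_stmt = (
        "result = ''\n"
        "for line in lines:\n"
        "    result += line\n")
    join_stmt = "result = ''.join(lines)"

    print "+= loop  %.3f s" % min(timeit.Timer(concat_stmt, setup).repeat(3, 10))
    print "''.join  %.3f s" % min(timeit.Timer(join_stmt, setup).repeat(3, 10))

The absolute numbers depend heavily on the interpreter version and on the string keeping a single reference, so results like the ones above should be re-checked on each platform of interest.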