RFC: startup time - again

John Arbash Meinel john at arbash-meinel.com
Thu Sep 11 22:03:07 BST 2008


Russel Winder wrote:
> Alexander,
> 
> On Thu, 2008-09-11 at 19:37 +0300, Alexander Belchenko wrote:
> 
>> Here are the numbers for 1.6.1 with and without tests.
>>
>> Full bzrlib with tests:
>>
>> C:\Temp\bzr-1.6.1\win32_bzr.exe>timeit bzr.exe --no-plugins --no-aliases rocks
>> It sure does!
>>
>> time: 0.297
>>
>> Without tests and benchmarks:
>>
>> C:\Temp\bzr-1.6.1-wo-test\win32_bzr.exe>timeit bzr.exe --no-plugins --no-aliases rocks
>> It sure does!
>>
>> time: 0.281
>>
>> So I get about a 16 ms win, i.e. >5% speedup. Is that big or small? For me it's big enough.
>> There are another 3-5 places where I can get an additional 16 ms speedup each.
>> So in total I could get about an 80 ms speedup. Is that big or small?
> 
> I would be strongly tempted to not make any deductions at all from two
> data points.  I find that I easily get ±10% variation for the same job
> on the same machine just running at different times.   Even if you run
> the same job on the same machine just one after the other, I bet the
> times are easily going to be ±5%.
> 

I believe he is using Python's built-in "timeit" module, which runs the
statement N times per batch, repeats the batch 3 times, and picks the
fastest batch. So in this case, it should run 3 batches of 10 runs each.
The time reported is the fastest batch's total time divided by N.
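
As a rough illustration of that batching scheme, here is a minimal sketch
using the standard library's timeit.repeat. The helper name
best_time_per_call and the sample statement are made up for this example,
and it times a Python statement in-process rather than an external bzr.exe
run, so it only shows the arithmetic, not Alexander's actual setup.

```python
import timeit


def best_time_per_call(stmt, setup="pass", number=10, repeat=3):
    """Return the per-call time of the fastest batch.

    Runs `stmt` `number` times per batch, for `repeat` batches.
    (Hypothetical helper, named only for this sketch.)
    """
    # timeit.repeat returns one total time (in seconds) per batch.
    batches = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    # Take the fastest batch and divide by the runs per batch.
    return min(batches) / number


if __name__ == "__main__":
    per_call = best_time_per_call("sum(range(1000))")
    print("best time per call: %.9f s" % per_call)
```

Taking the minimum rather than the mean is what makes the figure less
sensitive to other activity on the machine: slow batches are simply dropped.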

>> At the same time, library.zip without tests/benchmarks is smaller: 11MB -> 7MB
>> (of the 9MB of bzrlib Python sources, about 4.9MB is tests and benchmarks).
>>
>> PS: It seems like my timeit utility has the same bug as time.time() on Windows:
>> it has a precision of 16 ms. I need to rewrite it to get more precise results.
>> I'll try to find the implementation of time.clock() in the Python sources. Could
>> anybody give me a hint?
> 
> On Ubuntu with Parallel Python or using the processing package (renamed
> and made standard as multiprocessing in Python 2.6) I find the
> time.clock function generally returns the same value in almost all cases
> whereas time.time returns a reasonable value.  So despite the comments
> in the manual that time.clock should be used for benchmarking, this is
> fundamentally not the case for Python on parallel systems.
> 

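time.clock on POSIX is CPU time (the processor time consumed by the
process itself), while time.time is wall-clock time. That would explain why
time.clock barely advances when the work happens in other processes.
Under Windows, however, time.clock is a high-precision wall-clock timer,
and time.time is a wall-clock timer with roughly 15ms resolution.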

I generally use:

import sys
import time
# Default to time.time; on Windows, time.clock has much finer resolution.
_timer = time.time
if sys.platform == 'win32':
    _timer = time.clock

And this comes up over and over again.

There is a claim that time.clock() should be used on Linux because it is
"user" time, i.e. time you can control. Often, though, you really do want
wall-clock time, because you are concerned about the impact of filesystems
and networking. But if you are timing a pure function, time.clock is
what you want, because you don't want an accidental system blip to mess
up your timings.
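
To make the wall-clock versus CPU-time distinction concrete, here is a small
sketch. It assumes a modern Python (3.3 or later), where time.perf_counter
is a high-resolution wall-clock timer and time.process_time is the CPU time
of the current process; time.clock itself was removed in Python 3.8. The
function names measure, waits and computes are invented for this example.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent in one call to func."""
    wall_start = time.perf_counter()   # wall-clock, high resolution
    cpu_start = time.process_time()    # CPU time of this process only
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def waits():
    # Stands in for waiting on disk or network: wall time passes,
    # but almost no CPU time is consumed.
    time.sleep(0.2)


def computes():
    # Pure computation: CPU time and wall time should be close
    # on an otherwise idle machine.
    total = 0
    for i in range(200000):
        total += i * i
    return total


if __name__ == "__main__":
    for func in (waits, computes):
        wall, cpu = measure(func)
        print("%-8s wall=%.4fs cpu=%.4fs" % (func.__name__, wall, cpu))
```

The waits case is the one where the two clocks disagree most: the CPU-time
figure stays near zero, which is the behaviour you want when timing a pure
function and exactly what you do not want when I/O is the thing being
measured.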

John
=:->