[MERGE] sha_file_by_name using raw os files; -Dhashcache
John Arbash Meinel
john at arbash-meinel.com
Fri Oct 5 15:46:38 BST 2007
Martin Pool wrote:
> While profiling towards https://bugs.edge.launchpad.net/bzr/+bug/146176
> it seemed that we were double-buffering files while hashing them. This
> seems about 10% faster but it's somewhat unstable to measure. If someone
> else would like to confirm or deny it that would be useful.
>
>
If you were going to do this, why not just mmap the file?
Also, would this be a case for using O_DIRECT? (I'm guessing not, but it's just
a thought.)
Otherwise, you probably still have another buffer in there anyway (disk => OS
=> userspace).
So you could do:
import mmap
import os
import sha

def sha_file_by_name(fname):
    """Calculate the SHA1 of a file by reading the full text"""
    mem = None
    f = os.open(fname, os.O_RDONLY)
    try:
        # Map the whole file read-only (a length of 0 means the entire file)
        mem = mmap.mmap(f, 0, access=mmap.ACCESS_READ)
        return sha.new(mem).hexdigest()
    finally:
        if mem is not None:
            mem.close()
        os.close(f)
I don't know if we would want to loop around sections of the mmap'd string, but
I thought the above construct would be an overall good thing (avoiding any
user-space buffers).
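For reference, looping over sections of the mapping might look roughly like the sketch below. It is written against current Python, so it assumes `hashlib.sha1` in place of the old `sha` module, and the function name `sha1_file_mmap_chunked` and the 1 MiB chunk size are illustrative choices that have not been tuned or measured.

```python
import hashlib
import mmap
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per update; an arbitrary, untuned assumption


def sha1_file_mmap_chunked(fname, chunk_size=CHUNK_SIZE):
    """Return the hex SHA-1 of fname, feeding the mapping to the hash in slices."""
    digest = hashlib.sha1()
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    fd = os.open(fname, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        if os.fstat(fd).st_size == 0:
            # An empty file cannot be mapped, so return the hash of no data.
            return digest.hexdigest()
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mem:
            with memoryview(mem) as view:
                for offset in range(0, len(view), chunk_size):
                    # Slicing a memoryview does not copy the underlying bytes.
                    with view[offset:offset + chunk_size] as part:
                        digest.update(part)
    finally:
        os.close(fd)
    return digest.hexdigest()
```

Whether slicing helps at all compared with handing the whole mapping to the hash in one call is something only profiling would show.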
I'm curious whether the mmap approach will be better than something like:
import subprocess

_have_sha1sum = True # Assume that we have it at first

def sha_file_by_name(fname):
    global _have_sha1sum
    if _have_sha1sum:
        try:
            p = subprocess.Popen(['sha1sum', fname], stdout=subprocess.PIPE)
        except (IOError, OSError):
            # Check for ENOENT?
            _have_sha1sum = False
            return sha_file_by_name(fname)
        else:
            # communicate() returns (stdout, stderr); the hex digest is the
            # first 40 characters of stdout
            out, _ = p.communicate()
            return out[:40]
    else:
        ... # Pick your favorite in-process method
I believe 'sha' got a lot faster in Python 2.5 because it now uses the OpenSSL
libraries. But I don't know how that compares to 'sha1sum'. Note that I don't
think Windows has sha1sum, and I don't have it (by default) on my Mac.
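Before timing the two against each other, it is worth checking that they agree on a platform that does have sha1sum. The sketch below is one way to do that in current Python. The helper names `sha1_in_process` and `sha1_via_sha1sum` are hypothetical, `hashlib.sha1` stands in for the old `sha` module, and the external helper simply returns None when the tool is missing or fails.

```python
import hashlib
import shutil
import subprocess


def sha1_in_process(fname):
    """Hex SHA-1 computed with hashlib, reading the file in 64 KiB blocks."""
    digest = hashlib.sha1()
    with open(fname, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def sha1_via_sha1sum(fname):
    """Hex SHA-1 from the external sha1sum tool, or None if it cannot be used."""
    exe = shutil.which("sha1sum")
    if exe is None:
        return None  # e.g. a stock Windows or Mac install
    result = subprocess.run(
        [exe, fname], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    if result.returncode != 0 or not result.stdout:
        return None
    # Output looks like "<40 hex chars>  <name>". GNU sha1sum prefixes the
    # line with a backslash when the file name needs escaping, so strip it.
    token = result.stdout.split()[0].lstrip(b"\\")
    return token.decode("ascii")
```

Wrapping each function in `timeit` on a large file would then give a rough comparison, bearing in mind that the subprocess version pays a process-spawn cost per file.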
John
=:->