[MERGE][1.0] update C PatienceDiff to allow for non-strings

John Arbash Meinel john at arbash-meinel.com
Wed Dec 5 15:28:09 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> John Arbash Meinel wrote:
> [...]
>> I don't know if the strings have already cached their hashes, or just if the
>> python hash function is just that much faster to compute. I'm wondering if it
>> computes it by long, rather than by character...
> 
> I believe string objects in Python cache their hash value.
> 
> -Andrew.
> 
> 

True, but for most of these this should be the first time PyObject_Hash is
called. hmm... but maybe not for my benchmark code... which makes it biased...
Dang. I'll rewrite it to not re-use the same text strings.

Thanks for the reminder. I do believe that in real-world situations we should
expect that the strings have not had their hashes calculated already.
(sometimes they will, but not always.)

If I change the test script so that it does:
text = [l[:10] + l[10:] for l in text]

it forces new strings to be generated, and gives these results:

  bzr.dev:	3.7s -> 4.1s
  hash(obj):	4.9s -> 5.1s

So djb2 is faster to compute than hash(str). Now, when extracting 2 texts from
the repository, it is likely that it will get them with common strings. Which
makes "hash()" better again. However, if one is a disk file (probably the most
common) you won't have common lines.

If I just switch it to use PyObject_Compare() and use djb2 for the hash of
strings, I get:	3.6s -> 3.9s

So if we expect that most strings will be unique, it is reasonable to use djb2
as the hash function. The big change is that it isn't much faster, but it does
support arbitrary objects.

So anyway, I have an updated patch, but it is certainly less critical now. It
will allow arbitrary objects, but the performance is pretty equal. (I do wonder
if PyObject_Hash isn't the right way to go though. As there will be times when
the hash can already be computed, or help in future comparisons.)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHVsOJJdeBCYSNAAMRAvisAJ4xNbeU3NiPy/MtBhv5wIcl+otbkgCeOTP3
ckrjQza6MDxUriwrpEiUAUo=
=hfW+
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patience_tuples.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20071205/47c9e741/attachment-0001.diff 


More information about the bazaar mailing list