Rev 3896: Reverted back to the same hash width, and bumped EXTRA_NULLS to 3. in http://bzr.arbash-meinel.com/branches/bzr/brisbane/gc_delta_index_room

John Arbash Meinel john at arbash-meinel.com
Wed Mar 18 22:45:30 GMT 2009


At http://bzr.arbash-meinel.com/branches/bzr/brisbane/gc_delta_index_room

------------------------------------------------------------
revno: 3896
revision-id: john at arbash-meinel.com-20090318224524-ve32it3ddqfzvi6q
parent: john at arbash-meinel.com-20090318224055-hs7nxyq5fjz0pr19
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: gc_delta_index_room
timestamp: Wed 2009-03-18 17:45:24 -0500
message:
  Reverted back to the same hash width, and bumped EXTRA_NULLS to 3.
  Most entries in a hash bucket are genuinely random, so they don't trigger
  extra comparisons. So walking 4-7 nodes is fairly cheap at that level.
  My guess is that bumping EXTRA_NULL has a bigger effect when you get the
  occassional non-random data, that forces expansion because it gets a
  collision.
  Data with repetition a multiple of 16 (but not 16) will cause this, as
  you can get a large insertion, with lots of dupes.
  We filter out when the dupe is exactly a multiple of 16, we may want to
  do something similar at larger ranges (or use limit_hash_table on the data
  possibly with a much smaller value than 64.)
  Most important (next) is to handle the large update case.
-------------- next part --------------
=== modified file 'bzrlib/diff-delta.c'
--- a/bzrlib/diff-delta.c	2009-03-18 22:40:55 +0000
+++ b/bzrlib/diff-delta.c	2009-03-18 22:45:24 +0000
@@ -28,7 +28,7 @@
 /* The hash map is sized to put 4 entries per bucket, this gives us 2 blank
  * spaces.
  */
-#define EXTRA_NULLS 1
+#define EXTRA_NULLS 3
 
 static const unsigned int T[256] = {
     0x00000000, 0xab59b4d1, 0x56b369a2, 0xfdeadd73, 0x063f6795, 0xad66d344,
@@ -447,7 +447,7 @@
         total_num_entries = num_entries + old->num_entries;
     else
         total_num_entries = num_entries;
-    hsize = total_num_entries / 2;
+    hsize = total_num_entries / 4;
     for (i = 4; (1u << i) < hsize && i < 31; i++);
     hsize = 1 << i;
     hmask = hsize - 1;
@@ -531,7 +531,7 @@
        first byte to allow for optimizing the Rabin's polynomial
        initialization in create_delta(). */
     total_num_entries = num_entries + old_index->num_entries;
-    hsize = total_num_entries / 2;
+    hsize = total_num_entries / 4;
     for (i = 4; (1u << i) < hsize && i < 31; i++);
     hsize = 1 << i;
     if (hsize < old_index->hash_mask) {



More information about the bazaar-commits mailing list