Rev 4178: Further understanding shows just how unlikely a collision is, at 8 bytes and 100M keys. in http://bzr.arbash-meinel.com/branches/bzr/jam-integration

John Arbash Meinel john at arbash-meinel.com
Fri Mar 20 21:19:59 GMT 2009


At http://bzr.arbash-meinel.com/branches/bzr/jam-integration

------------------------------------------------------------
revno: 4178
revision-id: john at arbash-meinel.com-20090320211941-680n5r03r6eb413b
parent: john at arbash-meinel.com-20090320211501-pvg206m44mwckrec
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: jam-integration
timestamp: Fri 2009-03-20 16:19:41 -0500
message:
  Further understanding shows just how unlikely a collision is, at 8 bytes and 100M keys.
  
  So next I need to revise the recommended size of various objects.
  The mini-index size should be tuned to be reasonable versus the
  expected number of bytes that need to be read to resolve the
  reference. And shrinking the keys changes the width of the bytes,
  and effects the chance of collision.
  Very likely we can write an index that is <10 bytes per entry.
-------------- next part --------------
=== modified file 'doc/developers/improved_chk_index.txt'
--- a/doc/developers/improved_chk_index.txt	2009-03-20 21:15:01 +0000
+++ b/doc/developers/improved_chk_index.txt	2009-03-20 21:19:41 +0000
@@ -258,5 +258,10 @@
 collision chance of 10^-3, and number of keys 10^6, means we need (12+3)*0.4 =
 6 bytes.
 
+Also taking one more look at ``H ~ 0.4 (2N + E)``, you can rearrange and
+consider that for every order of magnitude more keys you insert, your chance
+for collision goes up by 2 orders of magnitude. But for 100M keys, 8 bytes
+gives you a 1 in 10,000 chance of collision.
+
 .. 
   vim: ft=rst tw=78 ai



More information about the bazaar-commits mailing list