Rev 4178: Further understanding shows just how unlikely a collision is, at 8 bytes and 100M keys. in http://bzr.arbash-meinel.com/branches/bzr/jam-integration
John Arbash Meinel
john at arbash-meinel.com
Fri Mar 20 21:19:59 GMT 2009
At http://bzr.arbash-meinel.com/branches/bzr/jam-integration
------------------------------------------------------------
revno: 4178
revision-id: john at arbash-meinel.com-20090320211941-680n5r03r6eb413b
parent: john at arbash-meinel.com-20090320211501-pvg206m44mwckrec
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: jam-integration
timestamp: Fri 2009-03-20 16:19:41 -0500
message:
Further understanding shows just how unlikely a collision is, at 8 bytes and 100M keys.
So next I need to revise the recommended size of various objects.
The mini-index size should be tuned to be reasonable versus the
expected number of bytes that need to be read to resolve the
reference. And shrinking the keys changes the width of the bytes,
and effects the chance of collision.
Very likely we can write an index that is <10 bytes per entry.
-------------- next part --------------
=== modified file 'doc/developers/improved_chk_index.txt'
--- a/doc/developers/improved_chk_index.txt 2009-03-20 21:15:01 +0000
+++ b/doc/developers/improved_chk_index.txt 2009-03-20 21:19:41 +0000
@@ -258,5 +258,10 @@
collision chance of 10^-3, and number of keys 10^6, means we need (12+3)*0.4 =
6 bytes.
+Also taking one more look at ``H ~ 0.4 (2N + E)``, you can rearrange and
+consider that for every order of magnitude more keys you insert, your chance
+for collision goes up by 2 orders of magnitude. But for 100M keys, 8 bytes
+gives you a 1 in 10,000 chance of collision.
+
..
vim: ft=rst tw=78 ai
More information about the bazaar-commits
mailing list