John Arbash Meinel
john at arbash-meinel.com
Tue Mar 27 19:22:29 BST 2007
John Arbash Meinel wrote:
> John Arbash Meinel wrote:
>> Aaron Bentley wrote:
>>> A bloom filter could be used there. If you are only doing a "sampling",
>>> you may be able to get acceptable error rates with less than 500K.
Oh, and we had a discussion about whether you could 'invert' a bloom
filter, so that you got a false negative rate, rather than a false
positive rate. And it seems that Wikipedia provides a possibility with a
'generalized bloom filter':
Though I don't see it saying you can do the exact opposite (have 0 FPR
and some satisfactory FNR). You can at least increase your density if
you are willing to allow some FPR.
Basically, you just have some of the hashes *unset* bits, and some *set*
bits. I would guess that the tradeoff is a poor one for our use case,
though. (A FNR means you may never find a revision that you need to pull).
More information about the bazaar