Bloom Filters

John Arbash Meinel john at arbash-meinel.com
Tue Mar 27 19:22:29 BST 2007


John Arbash Meinel wrote:
> John Arbash Meinel wrote:
>> Aaron Bentley wrote:
>> ...
>>
>>> A bloom filter could be used there.  If you are only doing a "sampling",
>>> you may be able to get acceptable error rates with less than 500K.
>>>
>>> Aaron

Oh, and we had a discussion about whether you could 'invert' a bloom
filter, so that you got a false negative rate, rather than a false
positive rate. And it seems that Wikipedia provides a possibility with a
'generalized bloom filter':
http://en.wikipedia.org/wiki/Bloom_filter

Though I don't see it saying you can do the exact opposite (have 0 FPR
and some satisfactory FNR). You can at least increase your density if
you are willing to allow some FPR.

Basically, you just have some of the hashes *unset* bits, and some *set*
bits. I would guess that the tradeoff is a poor one for our use case,
though. (A FNR means you may never find a revision that you need to pull).

John
=:->




More information about the bazaar mailing list