sydney mini-sprint, kickstarting 0.16, roadmap for 0.16

Tue Mar 27 17:01:23 BST 2007

Aaron Bentley wrote:
...

> 
> A bloom filter could be used there.  If you are only doing a "sampling",
> you may be able to get acceptable error rates with less than 500K.
> 
> Aaron

Sure. I actually wrote a bloom filter plugin to play around with the idea.
It hooks into the bzr test suite, but doesn't do anything for bzr itself.
It is a re-implementation of a 'pybloom' package, but at this point I
don't think there is any common code.

It gave me some experience with them, and they work as advertised and
can give pretty good hit/miss rates. But ultimately they don't take
ancestry into account, which is good or bad depending on the use case.
If we have any sort of storage that doesn't include all of history, then
we don't want it to report a false hit. But while we are maintaining the
"if I have revision X, I have all of its ancestors" invariant, there is
a lot of information you can extract from the graph.
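
For anyone who hasn't played with one, here is a minimal sketch of a bloom
filter. It is not the plugin's code and shares nothing with 'pybloom'; the
class name, the use of SHA-1 with double hashing, and the "rev-N" style
keys are all assumptions made for illustration. The sizing uses the
standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hashes.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative bloom filter; names and hashing scheme are assumptions."""

    def __init__(self, expected_items, error_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits.
        self.num_bits = max(
            8, math.ceil(-expected_items * math.log(error_rate) / math.log(2) ** 2)
        )
        # Optimal number of hash functions: k = (m / n) * ln 2.
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two halves of one SHA-1 digest
        # (double hashing), rather than running k separate hashes.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )
```

With expected_items=1000 and error_rate=0.01 this comes out to a bit over
1 KB of bits. A miss is always reliable; a hit only means "maybe", and the
filter knows nothing about which revisions are ancestors of which.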

Ultimately I wanted to compare them to a trie for a potential "packfile"
format for bzr. Reading just a kilobyte or so of the beginning of a
bunch of packfiles for a quick "hit/miss" check could be useful.
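
As a rough sketch of that "hit/miss" check, assume each packfile starts
with a fixed 1024-byte bloom filter over the revision ids it holds. No such
format exists; HEADER_BYTES, NUM_HASHES, and every function name below are
hypothetical, and a trie-based header would look quite different.

```python
import hashlib

HEADER_BYTES = 1024  # assumed size of the filter at the start of each pack
NUM_HASHES = 4       # assumed; a real format would tune this to pack size


def _positions(rev_id):
    # Map a revision id to NUM_HASHES bit positions inside the header.
    digest = hashlib.sha1(rev_id.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % (HEADER_BYTES * 8) for i in range(NUM_HASHES)]


def build_header(rev_ids):
    # Build the header bytes that would be written when the pack is created.
    bits = bytearray(HEADER_BYTES)
    for rev_id in rev_ids:
        for pos in _positions(rev_id):
            bits[pos // 8] |= 1 << (pos % 8)
    return bytes(bits)


def may_contain(header, rev_id):
    # False is definitive; True still requires opening the pack to confirm.
    return all(header[pos // 8] & (1 << (pos % 8)) for pos in _positions(rev_id))


def candidate_packs(headers, rev_id):
    # headers maps a pack name to the first HEADER_BYTES read from that pack.
    return [name for name, header in headers.items() if may_contain(header, rev_id)]
```

The point is that only the small headers need to be read to rule most
packs out; any pack that comes back as a candidate still has to be opened
and checked properly, since a hit can be false.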

John
=:->