Optimizing tree building

Tue Jun 19 07:14:50 BST 2007

On 6/19/07, John Arbash Meinel <john at arbash-meinel.com> wrote:
> I think Robert and I have settled our differences. At least for now I
> think we are going with:
>
> 1) Have a foo_py.py file next to foo_c.pyx file. And the two of them
> should have the exact same functionality. Some of what I've done had the
> python functionality mixed with a different file, and the other way I
> tried had the .pyx in a completely different directory.

+1 and let me check if this is also correct - all code that uses these
modules should do

  try:
    import foo_c as foo
  except ImportError:
    import foo_py as foo

I wonder if we will ever have trouble where there is, say, an out of
date foo_c installed system wide and we should just be using the
foo_py version...

It might be good to add something about these points to HACKING.

> 2) I felt like most people agreed not to version the .c files, so that
> is how I would do it. I still don't feel like it was definitively
> answered, but that was my 'feeling' about the responses.

I think that was the consensus and I agree.

> > Right, and I guess we're already scaling badly at that point, so the
> > indices hardly make a difference.
> >
> > Aaron
>
> There is also the issue of "discoverability". To this date, we haven't
> needed a '.listing' file, because we have everything accessible by known
> paths. And we have done it in a way that all files are append-only or
> atomic replace. (The only current atomic replace file I'm aware of is
> .bzr/branch/(revision_history|last_revision_info)).
>
> Having an index of indicies seems like something that could quickly
> scale very poorly. I know one thing we had hoped for is a lock-free
> writers. So you can insert new data into the repository without locking.
> Or at least without locking for a long time. I think Robert made a
> comment about spin-locking on the master index. So you try to write,
> then read, and if it isn't what you wrote, you update it with whatever
> info, and write again, until it finally stabilizes. If you have 1000
> people committing at the same time, it gets a little bit ugly, but
> theoretically it does eventually converge.

It's actually: write it, read it back, then read the list of files in
the directory and make sure it's up to date.  This should converge and
be lock free, though it may be a bit more complex on Windows where we
can't rename the file if it's being written or read, so it needs some
careful thought about that.

Every filesystem we can write to lets us list directories so it's
reasonable for writers to update this.

> Oh, and one more thing to think about. If we have 1M keys in a file you
> need a bloom filter of approximately 1MB to have a ~2% false positive
> rate. Which in some ways seems pretty big. However, if you look at
> bzr.dev *right now*, inventory.kndx is 1.3MB for 14,000 revisions.
>
> And if you just look at a file which list only revision ids, you have
> about 60bytes per revid, or 60MB for 1M revisions.
>
> Bloom filters generally scale as O(N), their nice property is that their
> constant is really small (1byte per N). I don't know how to do it as
> O(log(N)) which is what you would really want for a master index.
>
> Also note that because blooms are O(N) splitting up 1M revisions into 10
> files with 100k revisions still gives you 1MB of bloom data.
>
> Which is something that a "tree" structure (trie, something else) helps
> with. Because the top node gives you part of the tree, and then you read
> further. We are doing a bit of a tree with bloom filters, in that the
> top node is a bloom filter, which helps you figure out what sub-node to
> read. But the top node is of size O(N), and I don't think you can change
> that with Blooms.

Right, and (I think) we're always going to have to read the whole
filter - there is not good locality within it, as we might hope to
have with a sorted index.

-- 
Martin