some more indexing thoughts..

Robert Collins robertc at robertcollins.net
Tue Jul 17 10:48:55 BST 2007


On Mon, 2007-07-16 at 13:39 -0400, Aaron Bentley wrote:
> Robert Collins wrote:
> >  - what's the win from having a topologically sorted revision graph
> > index, e.g. make finding a revid a linear scan, then have topological
> > data from there on in? This would give great locality of reference for
> > 'recent' data in any index for determining merge bases and the like.
> 
> One side-effect of such an index (compared with one that provides fast
> random access to any revision) is that it becomes fairly expensive to
> determine that a revision is not present in a repository.
> 
> Perhaps a bloom filter would help there?

Indeed. I think it is safe behind the API now; so you could branch my
index patch and try one if you wanted ;).
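
A minimal sketch of the idea, assuming revision ids are byte strings and
the index is a plain list kept in topological (newest-first) order. The
names BloomFilter and find_revision are hypothetical and are not part of
the index patch; the point is only that a negative answer from the filter
avoids the linear scan entirely.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-1 digest.
        for salt in range(self.num_hashes):
            digest = hashlib.sha1(b"%d:" % salt + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def find_revision(sorted_revids, bloom, revid):
    """Return the position of revid in the index, or None if absent."""
    if revid not in bloom:
        return None  # definitely absent, so skip the linear scan
    for position, candidate in enumerate(sorted_revids):
        if candidate == revid:
            return position
    return None  # the filter gave a false positive
```

The filter only makes the "not present" case cheap; a hit still pays for
the scan, and a false positive costs one full scan before returning None.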

As you know, I'm working on the changes to the repo to get rid of our
many independent indices; I think I can halve the IO on commit quite
safely this week. I'm introducing a new API break, though:

def abort_write_group(self):
    """Discard the contents accrued within the current write group.

    :seealso: start_write_group.
    """

def is_in_write_group(self):
    """Return True if there is an open write group.

    :seealso: start_write_group.
    """

def commit_write_group(self):
    """Commit the contents accrued within the current write group.

    :seealso: start_write_group.
    """

def start_write_group(self):
    """Start a write group in the repository.

    Write groups are used by repositories which do not have a 1:1 mapping
    between file ids and backend store to manage the insertion of data from
    both fetch and commit operations.

    A write lock is required around the start_write_group/commit_write_group
    for the support of lock-requiring repository formats.
    """

The idea is to provide semantically clear hooks for index serialisation.
This fixes bugs where unlock unexpectedly triggers large writes, allows
for progress bars or other things as desired, and allows for cleanup of
partially written indices on failure.
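
A rough sketch of how a caller might drive these hooks, assuming an
in-memory stand-in for the repository. ToyRepository, add_entry and
insert_all are hypothetical names used for illustration only, and the
write lock that the real API requires is left out for brevity.

```python
class ToyRepository:
    """Hypothetical in-memory stand-in, not the real repository class."""

    def __init__(self):
        self.index = {}        # entries that have been committed
        self._pending = None   # entries accrued in the open write group

    def start_write_group(self):
        if self._pending is not None:
            raise RuntimeError("already in a write group")
        self._pending = {}

    def is_in_write_group(self):
        return self._pending is not None

    def add_entry(self, key, value):
        # Hypothetical insertion method; data is only staged here.
        if self._pending is None:
            raise RuntimeError("not in a write group")
        self._pending[key] = value

    def commit_write_group(self):
        if self._pending is None:
            raise RuntimeError("not in a write group")
        # The single point where staged data becomes visible.
        self.index.update(self._pending)
        self._pending = None

    def abort_write_group(self):
        if self._pending is None:
            raise RuntimeError("not in a write group")
        # Drop the staged data; nothing partial reaches the index.
        self._pending = None


def insert_all(repo, entries):
    """Insert entries as one write group, aborting on any failure."""
    repo.start_write_group()
    try:
        for key, value in entries:
            repo.add_entry(key, value)
    except BaseException:
        repo.abort_write_group()
        raise
    else:
        repo.commit_write_group()
```

Because the write happens in commit_write_group rather than as a side
effect of unlock, the caller decides when the large write occurs and can
report progress or clean up around it.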

It's possible that start_write_group should return an object, like commit
builder, which allows insertion of data into the repository, but at the
moment my focus is on the deep refactoring. We can introduce such an
object later if desired; introducing it now won't help me reach
performance goals and will require work to deliver a good API for
plugins like bzr-svn.

Cheers,
Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.