some more indexing thoughts..
Robert Collins
robertc at robertcollins.net
Tue Jul 17 10:48:55 BST 2007
On Mon, 2007-07-16 at 13:39 -0400, Aaron Bentley wrote:
> Robert Collins wrote:
> > - whats the win by having a topologically sorted revision graph index
> > e.g. make finding a revid a linear scan then have topological data from
> > there on in. This would give great locality of reference for 'recent'
> > data in any index for determining merge bases and the like.
>
> One side-effect of such an index, (compared with one that provides fast
> random access to any revision) is that it becomes fairly expensive to
> determine that a revision is not present in a repository.
>
> Perhaps a bloom filter would help there?
Indeed. I think it is safe behind the API now; so you could branch my
index patch and try one if you wanted ;).
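
A minimal sketch of the sort of bloom filter that could sit in front of a topologically sorted index, so that "this revision is definitely not here" can be answered without a linear scan. Everything in it is an assumption for illustration: the class name, the bit-array size, the number of hashes, and the SHA-1 based double hashing are not taken from the index patch or from bzrlib.

```python
import hashlib


class RevisionBloomFilter:
    """Hypothetical bloom filter over revision ids (not a bzrlib class)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        # Sizes are arbitrary illustrative defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, revision_id):
        # Derive num_hashes bit positions from one SHA-1 digest
        # using double hashing: h1 + i * h2.
        digest = hashlib.sha1(revision_id.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, revision_id):
        for pos in self._positions(revision_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, revision_id):
        # False means definitely absent; True means "possibly present",
        # so the caller still has to consult the real index.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(revision_id)
        )
```

A False answer lets the caller skip the scan entirely; a True answer only means the real index still has to be consulted, since bloom filters can give false positives but never false negatives.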
As you know, I'm working on the changes to the repo to get rid of our
many independent indices; I think I can halve the IO on commit quite
safely this week. I'm introducing a new API break though:

    def abort_write_group(self):
        """Abort the current write group, discarding the contents accrued within it.

        :seealso: start_write_group.
        """

    def is_in_write_group(self):
        """Return True if there is an open write group.

        :seealso: start_write_group.
        """

    def commit_write_group(self):
        """Commit the contents accrued within the current write group.

        :seealso: start_write_group.
        """

    def start_write_group(self):
        """Start a write group in the repository.

        Write groups are used by repositories which do not have a 1:1 mapping
        between file ids and backend store to manage the insertion of data from
        both fetch and commit operations.

        A write lock is required around the start_write_group/commit_write_group
        for the support of lock-requiring repository formats.
        """

The idea is to provide semantically clear hooks for index serialisation.
This fixes bugs where unlock triggers large writes unexpectedly, allows
for progress bars or other things as desired, and allows for cleanup of
partially-written indices on failure.
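
A rough sketch of how a caller might drive these hooks, assuming a toy in-memory repository. ToyRepository, add_text, and insert_texts are hypothetical names made up for the example, and the write lock that a real repository would need around the group is left out.

```python
class ToyRepository:
    """Hypothetical in-memory stand-in; not bzrlib's Repository."""

    def __init__(self):
        self.index = {}        # the committed ("serialised") index
        self._pending = None   # data accrued in the open write group

    def start_write_group(self):
        if self._pending is not None:
            raise RuntimeError("already in a write group")
        self._pending = {}

    def is_in_write_group(self):
        return self._pending is not None

    def add_text(self, key, text):
        # Hypothetical insertion method, for this demo only.
        if self._pending is None:
            raise RuntimeError("not in a write group")
        self._pending[key] = text

    def commit_write_group(self):
        if self._pending is None:
            raise RuntimeError("not in a write group")
        # The one well-defined point where the index gets written.
        self.index.update(self._pending)
        self._pending = None

    def abort_write_group(self):
        # Discard everything accrued; nothing reaches the index.
        self._pending = None


def insert_texts(repo, texts):
    """Insert (key, text) pairs as one write group, aborting on failure."""
    repo.start_write_group()
    try:
        for key, text in texts:
            repo.add_text(key, text)
    except BaseException:
        repo.abort_write_group()
        raise
    repo.commit_write_group()
```

In this sketch the index is only touched inside commit_write_group, so a failure part-way through leaves it unchanged, and unlock never has to write anything.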
It's possible that start_write_group should return an object, like commit
builder, which allows insertion of data into the repository, but at the
moment my focus is on the deep refactoring - we can introduce such an
object later if desired; but introducing it now won't help me reach
performance goals and will require work to deliver a good api for
plugins like bzr-svn.
Cheers,
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.