revfile developments
Matt Mackall
mpm at selenic.com
Sun Apr 10 05:12:09 BST 2005
On Sun, Apr 10, 2005 at 09:55:01AM +1000, Martin Pool wrote:
> Hi Matt,
>
> Thanks for letting me use your revfile code.
>
> Here are the changes I made:
>
> http://bazaar-ng.org/bzr/bzr.revfile/bzrlib/mdiff.py
> http://bazaar-ng.org/bzr/bzr.revfile/bzrlib/revfile.py
>
> At first I tried doing a byte-by-byte diff, but that turns out to be too
> slow, as you probably know. I fixed a bug in the linesplit()
> function.
Yeah, I found that too. I checked out the ~500 Makefile revs from bkcvs,
checked them into revfile, and discovered a few things. Checking a revision
into revfile is about 100 times faster than checking a revision out of CVS.
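
The reason a line-based diff wins over byte-by-byte can be shown with a small delta encoder. This is a minimal sketch: it assumes a delta is a list of (start, end, replacement) byte ranges against the old text. It uses difflib as a stand-in matcher, and split_lines, textdiff and patch are hypothetical names. It is not the linesplit() from mdiff.py. Matching whole lines keeps the matcher's input small, while the hunks still address exact byte offsets.

```python
import difflib


def split_lines(text):
    """Split bytes into lines, keeping each newline so b"".join() restores the input."""
    lines = []
    start = 0
    while start < len(text):
        end = text.find(b"\n", start)
        if end == -1:               # last line has no trailing newline
            lines.append(text[start:])
            break
        lines.append(text[start:end + 1])
        start = end + 1
    return lines


def textdiff(old, new):
    """Return hunks (start, end, replacement) as byte offsets into `old`."""
    a, b = split_lines(old), split_lines(new)
    offsets = [0]                   # offsets[i] = byte position where line i of old starts
    for line in a:
        offsets.append(offsets[-1] + len(line))
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":          # replace, delete or insert
            hunks.append((offsets[i1], offsets[i2], b"".join(b[j1:j2])))
    return hunks


def patch(old, hunks):
    """Apply ascending, non-overlapping hunks to `old`."""
    out, pos = [], 0
    for start, end, data in hunks:
        out.append(old[pos:start])  # unchanged bytes before this hunk
        out.append(data)
        pos = end
    out.append(old[pos:])
    return b"".join(out)
```

The matcher only ever compares whole lines, so its cost scales with the number of lines rather than the number of bytes.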
> There are two small optimizations to avoid storing a diff or avoid doing
> gzip if they wouldn't win.
I was planning to replace factor with something that ensures the data
needed to reconstruct a revision is never more than, say, 2x the length
of the original file.
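
Martin's two shortcuts and the bound Matt proposes fit into one storage decision. This is a minimal sketch under several assumptions: zlib stands in for gzip (both use DEFLATE), and the caller already knows how many bytes it takes to rebuild the base revision. CHAIN_LIMIT, maybe_compress and choose_storage are hypothetical names, not the revfile.py API.

```python
import zlib

# Hypothetical bound from the "say 2x" idea: the bytes read to rebuild a
# revision should not exceed CHAIN_LIMIT times the length of its text.
CHAIN_LIMIT = 2


def maybe_compress(data):
    """Return (compressed?, payload); keep zlib output only if it is smaller."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return True, packed
    return False, data


def choose_storage(fulltext, delta, chain_bytes):
    """Return "delta" or "full" for a new revision.

    delta:       serialized delta against the chosen base revision
    chain_bytes: bytes already needed to rebuild that base (its last full
                 text plus every delta stored on top of it)
    """
    if len(delta) >= len(fulltext):
        return "full"               # the delta doesn't win
    if chain_bytes + len(delta) > CHAIN_LIMIT * len(fulltext):
        return "full"               # chain too long: start a fresh full text
    return "delta"
```

Storing a full text resets chain_bytes for later revisions, so reading back any revision stays bounded at about CHAIN_LIMIT times its size.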
> I think it's important to be able to have branching within the storage
> of a single file, so I added that.
OK, I'll look at that, though I don't think it's necessary.
> Although I index by SHA-1, I don't make the mistake of Monotone of
> assuming that two objects with the same content are the same thing.
> There is a higher-level inventory and revision object that just uses
> revfile as a content-addressable store. I need to use something more
> than just an integer to identify revisions because it's too hard to keep
> simple integers in sync in a distributed system. A nice side effect is
> that we can easily check we're getting out the text we meant to put in.
I haven't convinced myself of what's needed here; see the notes I just
sent out. My hope is to have everything but branch IDs and
changeset IDs be local.
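
The SHA-1 indexing Martin describes amounts to a content-addressable store that re-hashes on the way out. This is a minimal sketch that assumes an in-memory dict in place of the on-disk revfile. ContentStore and its methods are hypothetical names. In a real store, get() would rebuild the text from deltas, and that is exactly where the check pays off.

```python
import hashlib


class ContentStore:
    """Hypothetical content-addressable store keyed by the SHA-1 of each text."""

    def __init__(self):
        self._texts = {}

    def add(self, text):
        key = hashlib.sha1(text).hexdigest()
        self._texts.setdefault(key, text)   # identical texts share one entry
        return key

    def get(self, key):
        text = self._texts[key]
        # Re-hash on the way out: catches corruption or a reconstruction bug.
        if hashlib.sha1(text).hexdigest() != key:
            raise ValueError("stored text does not match its SHA-1 " + key)
        return text
```

Two revisions with identical text map to one key here. Following Martin's description, revision identity belongs to the higher-level inventory and revision objects, not to this store.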
Handling renames is a little annoying and suggests needing a UUID per
file (but not per revision). That might be dealt with by simply having
each changeset be (or point to) the top-level directory revision, which
recursively references all the changes.
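
One way to read this suggestion is as a tree of hashed directory objects. The sketch below rests on several assumptions. Every file gets a stable ID (a UUID in practice; short strings here). A directory revision maps names to (kind, file_id, key), where key points at file text or at a subdirectory. A changeset is just a pointer to the top-level directory plus a parent. The JSON encoding and the names put, put_dir, put_changeset and path_of are all hypothetical.

```python
import hashlib
import json


def put(store, obj):
    """Store obj as canonical JSON under its SHA-1; return the key."""
    data = json.dumps(obj, sort_keys=True).encode("utf-8")
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key


def put_dir(store, entries):
    """entries maps name -> (kind, file_id, key); kind is "file" or "dir".

    file_id survives renames; key points at file text or at a
    subdirectory revision, so the tree is recursive.
    """
    return put(store, {name: list(e) for name, e in entries.items()})


def put_changeset(store, root_key, parent_key, message):
    """A changeset is little more than a pointer to the top-level directory."""
    return put(store, {"root": root_key, "parent": parent_key, "message": message})


def path_of(store, dir_key, file_id, prefix=""):
    """Walk the tree from dir_key and return the current path of file_id."""
    for name, (kind, fid, key) in json.loads(store[dir_key]).items():
        path = prefix + name
        if fid == file_id:
            return path
        if kind == "dir":
            found = path_of(store, key, file_id, path + "/")
            if found is not None:
                return found
    return None
```

Under this scheme, renaming src/Makefile only writes new directory objects up to the root. The file's ID and its text key stay the same, so per-file history stays keyed by the stable ID rather than by path.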
> Each delta has a "base" pointer saying which previous text it's stored
> relative to. The base pointer doesn't have any meaning to the revision
> control layer; it's just for delta compression. This could be
> manipulated to do some kind of skip-deltas to avoid ever needing to
> store the full text, but I don't do that for now.
Yeah, there are a bunch of things that can be done here.
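
Rebuilding a text by following base pointers might look like the sketch below. It makes three assumptions: each record is (base, payload); a record whose base is its own index holds a full text; and deltas are (start, end, replacement) byte hunks. That layout is hypothetical and is not revfile.py's on-disk format. Because base can be any earlier index, the same walk handles skip-deltas or a base on another branch.

```python
def apply_delta(base, hunks):
    """Apply ascending, non-overlapping (start, end, replacement) hunks to base."""
    out, pos = [], 0
    for start, end, data in hunks:
        out.append(base[pos:start])
        out.append(data)
        pos = end
    out.append(base[pos:])
    return b"".join(out)


def reconstruct(records, rev):
    """Follow base pointers back to a full text, then replay deltas forward."""
    chain = []
    while records[rev][0] != rev:   # base == own index marks a full text
        chain.append(rev)
        rev = records[rev][0]
    text = records[rev][1]
    for r in reversed(chain):       # oldest delta first
        text = apply_delta(text, records[r][1])
    return text
```

Choosing bases is then purely a storage policy. Pointing a revision further back shortens the chain at the cost of a larger delta, which is the skip-delta trade-off.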
--
Mathematics is the supreme nostalgia of our time.