[MERGE] documentation from the london sprint
John Arbash Meinel
john at arbash-meinel.com
Mon Jun 4 15:33:29 BST 2007
Aaron Bentley wrote:
> Robert Collins wrote:
>> This is documentation from the london sprint on add, annotate, gc and
>> revert as well as some extra push-pull notes.
>
> Oh, did you think this stuff needs discussion? I considered it trivial.
> Maybe there are ways to improve the document, but as long as it's a true
> rendition of what we discussed in London, I'd consider it mergeable.
>
> One thing is that these texts seem a lot wider than our 79 char
> standard. Another is that you spelled siblings "siblins" in one place.
>
> Also, there are some RST formatting errors:
> $ rst2html performance-roadmap.txt > perf.html
> incremental-push-pull.txt:251: (ERROR/3) Unexpected indentation.
> incremental-push-pull.txt:262: (ERROR/3) Unexpected indentation.
> performance-roadmap.txt:580: (WARNING/2) Inline strong start-string
> without end-string.
> add.txt:32: (ERROR/3) Unexpected indentation.
> performance-roadmap.txt:628: (WARNING/2) Block quote ends without a
> blank line; unexpected unindent.
>
> In particular, your ordered list formatting isn't right. For ReST, it
> should be:
>
> #. First item
> #. Next item
>
> etc.
>
>> +If the annotation works by asking the storage layer for successive deltas
>> +within the history of the thing being annotated we believe we can make it scale
>> +broadly proportional to the depth of the tree of revisions of the annotated
>> +object.
>
>> + * Perhaps multiparent deltas would allow us to not store the cached
>> + annotations in each delta without losing performance or accuracy.
>
> I think multiparent does give you that. I think asking for
> non-multiparent deltas doesn't.
>
> The algorithm I have for reconstructing texts in multiparent is always
> aware what revision it's pulling lines from. But that means we continue
> to pun annotation storage with fulltext storage, which could be a loss.
>
> One gain though, is that you can potentially use deltas in the sequence
> matching used by annotate-merge. That would scale with the number of
> revisions and the size of the deltas, which is better than scaling with
> the square of the size of the text.
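(Just to make the delta-walking annotation idea concrete, here is a rough
sketch of how it could work. This is only an illustration, not the actual
knit or multiparent code: the delta format below -- a list of
(start, end, new_lines) hunks meaning "replace lines[start:end] with
new_lines" -- is an assumption made for the example.)

def annotate_via_deltas(base_revision, base_lines, delta_chain):
    """Annotate a text by replaying deltas from the oldest revision forward.

    base_lines  -- full text of the oldest revision, as a list of lines.
    delta_chain -- iterable of (revision_id, delta) pairs in ancestry order,
                   where each delta is a list of (start, end, new_lines)
                   hunks sorted by start, relative to the pre-delta text.
    Returns a list of (revision_id, line) pairs for the final text.
    """
    # Every line starts out attributed to the base revision.
    annotated = [(base_revision, line) for line in base_lines]
    for revision_id, delta in delta_chain:
        # Apply hunks from the bottom up so earlier offsets stay valid.
        for start, end, new_lines in reversed(delta):
            annotated[start:end] = [(revision_id, line) for line in new_lines]
    return annotated

The work here is proportional to the number of deltas walked and their
size, which is the "scales with the depth of the revision tree" property
the quoted text describes.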
Separately, I just want to make a quick comment here about my results from
testing xdelta. I'm still finishing up a summary document (distilling the
important information into a summary takes a surprising amount of effort).
But anyway, deltaing along the ancestry gives much better compression.
Ignoring all the other permutations I've tested, the results look roughly
like this:
Compressing parents against their children:
    280,676 bytes (1 fulltext = 40,956, 1288 deltas = 239,720)
Compressing children against their parents:
    384,440 bytes (4 fulltexts = 51,394, 1285 deltas = 333,046)
Sorting by size of text and compressing shorter texts against longer:
    540,134 bytes (1 fulltext = 40,956, 1288 deltas = 499,178)
Just compressing in the topological order already present in the knit:
    614,800 bytes (1 fulltext = 37,897, 1288 deltas = 576,903)
Notice that deltaing directly along the ancestry gives better than a 2:1
improvement in compression over the knit's topological ordering, and that
compressing parents against their children, rather than the other way
around, shaves off roughly another 27%.
Also, it is fun to note that this was 1289 texts with a total fulltext size of
120MB. (This is the builtins.py knit).
Looking at it, if we go by ancestry we can actually do better than git's
packed storage by around 2:1. (Though git has cross-file compression, which
we would need to think about.)
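For anyone who wants to play with the ordering effect themselves, a
boiled-down sketch of this sort of comparison might look like the
following. This is not my actual test harness: it uses zlib-compressed
unified diffs as a crude stand-in for xdelta, and the helper names are
made up, but the relative effect of the choice of delta basis should
still show up.

import difflib
import zlib

def fulltext_size(lines):
    """Size of a compressed fulltext, in bytes."""
    return len(zlib.compress(''.join(lines).encode('utf-8')))

def delta_size(basis_lines, target_lines):
    """Approximate size of a delta of target against basis, in bytes."""
    diff = ''.join(difflib.unified_diff(basis_lines, target_lines))
    return len(zlib.compress(diff.encode('utf-8')))

def total_storage(texts, basis_map):
    """Total bytes to store every text given a choice of delta basis.

    texts     -- dict of revision_id -> list of lines (newline-terminated).
    basis_map -- dict of revision_id -> revision_id to delta against,
                 or None to store that revision as a fulltext.
    """
    total = 0
    for revision_id, lines in texts.items():
        basis = basis_map.get(revision_id)
        if basis is None:
            total += fulltext_size(lines)
        else:
            total += delta_size(texts[basis], lines)
    return total

"Parents against children" then corresponds to a basis_map where each
parent points at one of its children (with the tip stored as a fulltext),
and "children against parents" to a map where each child points at its
first parent (with the root stored as a fulltext).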
>
>> + 1. get 'new enough' hash data for the siblins of the scope: it can be out of date as long as its not older than the last move or rename out of that siblings scope.
>
> I don't really understand this bit about "new enough" sibling data.
>
> Aaron
John
=:->