weave size very sensitive to insertion order

John A Meinel john at arbash-meinel.com
Tue Feb 28 02:37:29 GMT 2006


Robert Collins wrote:
> On Mon, 2006-02-27 at 20:01 -0600, John A Meinel wrote:
>> Robert Collins wrote:
> 
> 
>> It depends how topo_sort works. If you have an ancestry of:
>>
>> A   B
>> |   |
>> C   D
>> |   |
>> E   F
>>  \ / \
>>   G   H
>>
>> There are 2 obvious sortings:
>> A,B,C,D,E,F,G
>> and
>> A,C,E,B,D,F,G
> 
> The current bzr - the one producing a 30Mb weave - generates ABCDEFGH
> The one I found this with - the subsecond topo_sort - *can* generate
> ABCDEFGH but is more likely to generate something like ACEBDFGH or
> BDFHACEG or even ACBDFHEG depending on what nodes form the starting
> position.
> 
>> I would guess that the former would cause difficulties with weave, but I
>> couldn't guarantee that.
> 
> As it happens, the former generates the smaller weave :).
> 
>> I am a decent hand at reading weaves, so I might be interested in
>> comparing them.
> 
> ok. Shall see what can be done. For interest there is only one root
> here:
>>>> roots = [(name, i.parent_names(name)) for name in i.names() if
> i.parent_names(name) == []]
>>>> print roots
> [('Arch-1:rocketfuel at canonical.com%soyuz--devel--0--base-0', [])]
> 
>> Oh... And the other thing that would cause lots of problems is having an
>> ancestry like above, because A & B are both children of the 'null:'
>> revision, which means they can't share any lines. So even if the
>> contents are identical, you would end up with duplicated lines. And then
>> if you merge G, it would have to pick which set to delete. Which would
>> mean that H might be using the lines that G deleted, which means that
>> when it is merged (in the future), the lines would have to be deleted again.
> 
> Hmm, I don't see that as being a requirement of a weave per se - theres
> no reason that revisions with differing heritage cannot share lines, as
> long as appropriate control instructions are there. I know our inserter
> does not try for that, but its feasible.
> 
> Rob
> 

Codeville weave does share lines. And also allows resurrecting lines.
Basically, it has a line set, which it diffs against, and then a map of
actions (in rev 1 this line 2 was activated, in rev 2 it was deleted, in
rev 3 it was resurrected, etc).

Now, his set of lines wasn't perfect. Because he didn't handle moving of
lines, if you had the text:

A
B
C

Then you had the text

A
C
B

It would be stored as

A
B
C
B

And it further has the 'bug' that because it uses patience diff, even
though B is only active in 1 location in either of the texts, it will be
treated as a duplicate line.

In my mind, it is a basic problem with weaves that use the physical
location of the line to determine where it is relative to the other
lines. (As opposed to a format which would assign unique identifiers to
the line contents, and then say 'C follows B' in the first case and 'B
follows C' in the latter. Though it would also need a way to handle
genuinely duplicated lines).

But you are right, resurrecting or sharing lines is allowed by a
weave-like object. But Martin's implementation forbids them. (A line is
annotated with a single revision, and is active if that revision is in
the ancestry, and *none* of the ancestry deleted it.)

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 249 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060227/4f40456f/attachment.pgp 


More information about the bazaar mailing list