weave size very sensitive to insertion order

John A Meinel john at arbash-meinel.com
Tue Feb 28 02:01:52 GMT 2006


Robert Collins wrote:
> I've been playing with the new topo sort I wrote, and have found this
> with the internal launchpad code:
> 
>    779428 2006-02-18 06:04 inventory
>  36698698 2006-02-25 16:30 inventory.backup.weave
> 101447702 2006-02-25 18:24 inventory.weave
> 
> There are 5488 revisions of inventory in the weave, and no unreferenced
> inventories. 168 of them are missing parents that were ghosts when they
> were first added.
> 
> The weave blows up to 3 times its size after the conciliation - I've
> checked that the graph is right: there are only 168 different entries
> between the two graphs and a random sample of those does indeed have
> wrong parents in the old weave and correct ones in the new weave.
> 
> 
> So insertion order appears to have a dramatic effect on the weave size.
> 
> I'm not sure how to make a minimal test of this, but as its only
> inventory data I may be able to make a copy of these files if people
> wish to analyze this - I'm going to check with the lp team leader about
> that.
> 
> What I am seeing is a sort of toggle between inventory lines in the
> weave for a known file:
> 
> { x
> COPYING ... @ rev A
> ....
> } x
> { y
> COPYING ... @ rev B
> ....
> }
> { z
> COPYING ... @ rev A
> }
> 
> So its almost like the diff against the union-of-active-lines failed
> some number of times and gave inappropriate full-insertions or something
> like that.
> 
> Rob

It depends how topo_sort works. If you have an ancestry of:

A   B
|   |
C   D
|   |
E   F
 \ / \
  G   H

There are 2 obvious sortings:
A,B,C,D,E,F,G
and
A,C,E,B,D,F,G

I would guess that the former would cause difficulties with weave, but I
couldn't guarantee that.

I am a decent hand at reading weaves, so I might be interested in
comparing them.

Oh... And the other thing that would cause lots of problems is having an
ancestry like above, because A & B are both children of the 'null:'
revision, which means they can't share any lines. So even if the
contents are identical, you would end up with duplicated lines. And then
if you merge G, it would have to pick which set to delete. Which would
mean that H might be using the lines that G deleted, which means that
when it is merged (in the future), the lines would have to be deleted again.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 249 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060227/f8ee795c/attachment.pgp 


More information about the bazaar mailing list