deleting InterDifferingSerializer

John Arbash Meinel john at arbash-meinel.com
Tue May 5 04:31:35 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> On Mon, 2009-05-04 at 10:17 -0500, John Arbash Meinel wrote:
> 
>> There are many things that InterDifferingSerializer gives us that
>> streaming fetch doesn't yet. If we can get streaming fetch to do it,
>> then I'm fine getting rid of it.
>>
>> 1) Incremental updates. IDS converts batches of 100 revs at a time,
>> which also triggers autopacks at 1k revs. Streaming fetch is currently
>> an all-or-nothing, which isn't appropriate (IMO) for conversions.
>> Consider that conversion can take *days*, it is important to have
>> something that can be stopped and resumed.
> 
> OTOH the way we convert is:
> do many groups
> do a full repack at the end
> 
> If streaming issued data in the right order we could go straight into
> the final pack.

I don't think we can predict what the "right order" will be for the CHK
pages without actually creating them. You are probably correct for the
file texts case.
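
For reference, the batch-and-checkpoint behaviour described above (IDS converting batches of 100 revs, autopacking at 1k) is roughly the following shape. This is an illustrative sketch only; `convert_batch` and `autopack` are hypothetical stand-ins, not bzrlib APIs:

```python
# Sketch of IDS-style incremental conversion (hypothetical helpers, not
# actual bzrlib APIs): convert in small batches so an interrupted run can
# resume from the last committed batch instead of starting over.
BATCH_SIZE = 100
AUTOPACK_EVERY = 1000

def convert_incrementally(all_revs, already_converted, convert_batch, autopack):
    todo = [r for r in all_revs if r not in already_converted]
    done_since_pack = 0
    while todo:
        batch, todo = todo[:BATCH_SIZE], todo[BATCH_SIZE:]
        convert_batch(batch)          # commits this batch to the target repo
        done_since_pack += len(batch)
        if done_since_pack >= AUTOPACK_EVERY:
            autopack()                # consolidate small packs periodically
            done_since_pack = 0
```

The point is that each committed batch is a safe restart point, which an all-or-nothing stream does not give you.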

...

>> 3) "delta based", this is another rather big issue. IIRC Streaming Fetch
>> will send whole-inventories. 
> 
> Andrew and I have a patch to send inventory deltas, its incomplete
> pending fixing the rather severe stacking-streaming issues.

"incomplete pending fixing" is a reason not to delete IDS yet. I'm happy
to re-evaluate that when you land this.

> 
>> 4) Computing rich root information from the 'stream'. IIRC the
>> "Streaming" code does yet-another pass over the inventory data (I think
>> that makes 3) to determine the root-id and whether it has changed or not.
> 
> IDS does this too, and the delta based stream won't need to do that.
> 

IDS does not. The streaming code does:
    def _generate_root_texts(self, revs):
...
        if self._rich_root_upgrade():
            import bzrlib.fetch
            return bzrlib.fetch.Inter1and2Helper(
                self.from_repository).generate_root_texts(revs)

And inside of Inter1and2Helper there are things like:
        while revs:
            for tree in self.source.revision_trees(revs[:100]):
                if tree.inventory.revision_id is None:
                    tree.inventory.revision_id = tree.get_revision_id()
                yield tree
            revs = revs[100:]


Inter1and2Helper extracts *every* revision tree an extra time, and thus
extracts the Inventory information for every revision, which is already
the #1 bottleneck for conversions.

...

This is what I explicitly avoided in InterDifferingSerializer. Basically
by building up the root map while iterating the revision trees the first
time.
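
The single-pass idea is roughly this (a sketch, not the actual IDS code; `Tree` here is a toy stand-in for a bzrlib revision tree):

```python
from collections import namedtuple

# Illustrative stand-in for a revision tree; real bzrlib trees carry far more.
Tree = namedtuple('Tree', ['revision_id', 'root_id'])

def convert_with_root_map(trees, process):
    """Build the root-id map while iterating the trees once.

    No second extraction pass is needed: the root information is recorded
    as a side effect of the pass that does the conversion work anyway.
    """
    root_map = {}
    for tree in trees:                      # the one and only extraction pass
        root_map[tree.revision_id] = tree.root_id
        process(tree)                       # per-tree conversion work
    return root_map
```

Since extracting revision trees is the dominant cost, collecting the root ids on the same pass avoids doubling it.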


> I think the key one is being able to stop and restart. While you say its
> important, I don't think its at all obvious to users how to do this: bzr
> branch old new preserves format unless they init a repo first; if they
> have done this, hitting ctrl-C will abort with no indicator that it is
> partial.

Making it 2x slower because of Inter1and2Helper isn't particularly
advantageous either.

> 
> And our builtin 'upgrade' command is definitely not incremental.
> 
> Anyhow, I'll leave it alone at least until we have the deltas and other
> stuff in place; I'll bring up the issue again later.
> 
> -Rob

Sure. I honestly would like to have one code path. It's just that the
"stream" code is very inefficient for a local operation: it currently
passes over the inventory twice (three times with a rich-root
conversion), deals in full inventories rather than deltas, etc. I'm not
entirely sure that the 'optimal delta' generation in IDS is suitable for
the stream code. I suppose it would make the stream smaller over the
wire as well...
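
To illustrate the full-inventory vs. delta distinction: a full inventory resends every entry for every revision, while a delta carries only what changed. A toy version, with plain dicts standing in for real inventories (this is not bzrlib's delta format):

```python
def inventory_delta(old, new):
    """Compute (removed_paths, added_or_changed) between {path: entry} dicts."""
    removed = sorted(p for p in old if p not in new)
    changed = {p: e for p, e in new.items() if old.get(p) != e}
    return removed, changed

def apply_delta(old, delta):
    """Reconstruct the new inventory from the old one plus a delta."""
    removed, changed = delta
    result = {p: e for p, e in old.items() if p not in removed}
    result.update(changed)
    return result
```

For a large tree with a handful of changed files per revision, the delta is a tiny fraction of the full inventory, which is why it helps both locally and over the wire.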

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkn/sxcACgkQJdeBCYSNAANI1QCbB6iVPZPeKLVabIzwFbxkPTR1
2pUAnjKv9vUPJdmwymSVvamEexFoJZw2
=bP9u
-----END PGP SIGNATURE-----



More information about the bazaar mailing list